Dataset shift is a phenomenon in machine learning and statistics in which the joint distribution of input variables and target labels differs between the training phase and the deployment or test phase (i.e., $P_{\text{train}}(x, y) \neq P_{\text{test}}(x, y)$).[1][2][3] It occurs when the statistical properties of the data used to train a model are no longer representative of the data encountered in real-world use, often resulting in degraded predictive performance and diminished generalization ability.[4][5]
Dataset shift is an umbrella term for several specific types of distributional change. Covariate shift occurs when the distribution of the input features changes but the conditional relationship between inputs and outputs remains constant ($P_{\text{train}}(y \mid x) = P_{\text{test}}(y \mid x)$).[6][7] Prior probability shift (or label shift) occurs when the distribution of target labels changes but the conditional distribution of inputs given labels stays the same.[8][9] Concept shift (also known as concept drift) is a change in the conditional relationship between inputs and outputs, which renders previously learned patterns invalid over time.[10]
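The three cases can be read off the two standard factorizations of the joint distribution. The summary below is a sketch in assumed notation, with $P_{\text{train}}$ and $P_{\text{test}}$ denoting the training and deployment distributions, $x$ the inputs and $y$ the labels:

```latex
\begin{align*}
P(x, y) &= P(y \mid x)\,P(x) = P(x \mid y)\,P(y) \\[4pt]
\text{Covariate shift:}\quad
  & P_{\text{train}}(x) \neq P_{\text{test}}(x),
  && P_{\text{train}}(y \mid x) = P_{\text{test}}(y \mid x) \\
\text{Prior probability shift:}\quad
  & P_{\text{train}}(y) \neq P_{\text{test}}(y),
  && P_{\text{train}}(x \mid y) = P_{\text{test}}(x \mid y) \\
\text{Concept shift:}\quad
  & P_{\text{train}}(y \mid x) \neq P_{\text{test}}(y \mid x)
\end{align*}
```

In each case the joint distribution changes, but a different factor is responsible for the change.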
Dataset shift is a key challenge in deploying machine learning systems, particularly in dynamic environments where data distributions change over time. Detecting and mitigating such shifts is an active area of research that includes drift detection, domain adaptation, and continual learning.[11]
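As an illustration of drift detection, one common choice (not the only one) is a two-sample Kolmogorov-Smirnov test that compares a feature's values in the training data with values observed after deployment. The following is a minimal sketch under stated assumptions: a single numeric feature, the standard large-sample approximation for the critical value, and hypothetical function names `ks_statistic` and `detect_shift` introduced here for illustration.

```python
import math
import random


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    n, m = len(ref), len(cur)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(ref[i], cur[j])
        # Step past every sample equal to x in both lists so ties are handled together.
        while i < n and ref[i] == x:
            i += 1
        while j < m and cur[j] == x:
            j += 1
        # i / n and j / m are the empirical CDF values of each sample at x.
        d = max(d, abs(i / n - j / m))
    return d


def detect_shift(reference, current, alpha=0.05):
    """Flag a shift when the KS statistic exceeds the asymptotic critical value.

    Assumption: both samples are large enough for the large-sample
    approximation c(alpha) * sqrt((n + m) / (n * m)) to be reasonable.
    """
    d = ks_statistic(reference, current)
    n, m = len(reference), len(current)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return d > critical, d, critical


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: a training feature and a deployment feature whose mean has moved.
    train_feature = [rng.gauss(0.0, 1.0) for _ in range(500)]
    deployed_feature = [rng.gauss(1.0, 1.0) for _ in range(500)]
    shifted, d, critical = detect_shift(train_feature, deployed_feature)
    print(f"KS statistic={d:.3f}, critical value={critical:.3f}, shift detected={shifted}")
```

A test on one input feature can only indicate a change in the input distribution. Changes in the label distribution or in the input-output relationship generally require labelled data or monitoring of model performance.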
References
- Moreno-Torres, José G. (2012). "A unifying view on dataset shift in classification". Pattern Recognition. 45 (1): 521–530. doi:10.1016/j.patcog.2011.06.019.
- Quiñonero-Candela, Joaquin, ed. (2010). Dataset shift in machine learning. Neural information processing series. Cambridge, Mass: MIT Press. ISBN 978-0-262-17005-5.
- "Dataset shift". Neural Network Lexicon. Retrieved 28 April 2026.
- Kumar, Rajesh. "What is dataset shift?". AIOpsSchool. Retrieved 28 April 2026.
- Bayram, Firas; Ahmed, Bestoun S.; Kassler, Andreas (7 June 2022). "From concept drift to model degradation: An overview on performance-aware drift detectors". Knowledge-Based Systems. 245 108632. doi:10.1016/j.knosys.2022.108632. ISSN 0950-7051.
- Shimodaira, Hidetoshi (1 October 2000). "Improving predictive inference under covariate shift by weighting the log-likelihood function". Journal of Statistical Planning and Inference. 90 (2): 227–244. doi:10.1016/S0378-3758(00)00115-4. ISSN 0378-3758.
- Raitoharju, Jenni (1 January 2022). "Convolutional neural networks". Deep Learning for Robot Perception and Cognition. Academic Press. pp. 35–69. doi:10.1016/B978-0-32-385787-1.00008-7. ISBN 978-0-323-85787-1. Retrieved 28 April 2026.
- "Dataset shift explanation". Retrieved 28 April 2026.
- Huyen, Chip (2022). Designing machine learning systems: an iterative process for production-ready applications (1st ed.). Sebastopol, CA: O'Reilly Media, Inc. ISBN 978-1-0981-0796-3.
- Silva, Gabriel Ferreira dos Santos; Barcellos Filho, Fabiano Novaes; Wichmann, Roberta Moreira; da Silva Junior, Francisco Costa; Chiavegatto Filho, Alexandre Dias Porto (1 October 2025). "Strategies for detecting and mitigating dataset shift in machine learning for health predictions: A systematic review". Journal of Biomedical Informatics. 170 104902. doi:10.1016/j.jbi.2025.104902. ISSN 1532-0464. PMID 40876698.
- "Drift in machine learning". Retrieved 28 April 2026.