Performance estimation methods for diagnosing model quality degradation under covariate shift
VIQUEZ SEGURA, SANTIAGO
2022/2023
Abstract
This study explores the underexplored domain of performance degradation in machine learning models, a common challenge in dynamic data environments. While the academic community often prioritizes the development of new training methods that improve model performance on benchmark datasets, little attention has been given to sustaining that performance after deployment or to identifying when it starts to decay. In a rapidly evolving data landscape, where covariate shift can significantly impact model performance, understanding and mitigating performance degradation becomes essential. To tackle this issue, this study introduces three comprehensive tests. The Temporal Degradation Test examines how various models perform when trained on different samples of the same dataset, shedding light on degradation patterns. The Continuous Retraining Test simulates a production environment by assessing the impact of continuously retraining the model. Finally, the Performance Estimation Test explores the potential of performance estimation methods, such as Direct Loss Estimation (DLE), to identify degradation without access to ground truth data. Our findings reveal diverse degradation patterns influenced by the choice of machine learning method, with continuous retraining offering partial relief but not a complete resolution. Performance estimation methods emerge as vital early warning systems, enabling timely interventions to maintain model efficacy.
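The core idea behind DLE, as referenced in the abstract, is to train a secondary "nanny" model that predicts the monitored model's pointwise loss from its inputs and predictions; averaging the predicted losses over an unlabeled production batch then yields a performance estimate without ground truth. The following is a minimal sketch of that idea, assuming a regression setting; the synthetic data, the scikit-learn models, and the simulated covariate shift are illustrative assumptions rather than the thesis's exact setup (libraries such as NannyML provide production implementations of DLE).

```python
# Minimal sketch of Direct Loss Estimation (DLE) for a regression model.
# All data and model choices here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Labeled data available before deployment.
X, y = make_regression(n_samples=5000, n_features=10, noise=10.0, random_state=0)
X_train, X_ref, y_train, y_ref = train_test_split(X, y, test_size=0.5, random_state=0)

# 1) Monitored model: the model whose performance we want to track.
monitored = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# 2) Nanny model: trained on a held-out reference set to predict the
#    monitored model's pointwise loss (absolute error) from the inputs
#    and the monitored model's own prediction.
ref_pred = monitored.predict(X_ref)
ref_loss = np.abs(y_ref - ref_pred)
nanny = GradientBoostingRegressor(random_state=0).fit(
    np.column_stack([X_ref, ref_pred]), ref_loss
)

# 3) In production, labels are unavailable; estimate MAE by averaging the
#    nanny model's predicted losses on the unlabeled, shifted batch.
X_prod = X_ref + np.random.normal(0, 0.5, X_ref.shape)  # simulated covariate shift
prod_pred = monitored.predict(X_prod)
estimated_mae = nanny.predict(np.column_stack([X_prod, prod_pred])).mean()
print(f"Estimated MAE without ground truth: {estimated_mae:.2f}")
```

In this sketch the estimate is reliable only under pure covariate shift (the relationship between inputs and targets stays fixed); if the concept itself drifts, the nanny model's loss predictions are no longer trustworthy, which is why such estimators serve as early warning systems rather than replacements for eventual ground truth evaluation.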
File | Size | Format
---|---|---
viquez_segura.pdf (open access) | 12.79 MB | Adobe PDF
https://hdl.handle.net/20.500.12608/52281