ROBUST DATA SELECTION AND OVERFITTING FOR INTRADAY TRADING WITH MACHINE LEARNING

Overfitting remains one of the most important obstacles to applying machine learning techniques to algorithmic trading, especially with high-frequency data. While recent research shows that data selection mitigates this issue, empirical applications often lack robust statistical tools to quantify overfitting risk. This paper extends that analysis by combining a systematic data selection framework and several machine learning models with modern overfitting diagnostics, including Purged Cross-Validation, the Probability of Backtest Overfitting, and the Deflated Sharpe Ratio. Using one-minute foreign exchange data across multiple currency pairs and market regimes, we evaluate how the choice of data source, sampling frequency, machine learning model, and market instrument affects both predictive accuracy and robustness. The results show that apparent in-sample profitability collapses out-of-sample, even when data are carefully selected and validated with stringent statistical tests. None of the selected strategies remains profitable once robustness diagnostics and realistic trading costs are applied. We propose a reproducible methodological pipeline that researchers and practitioners can adopt to design more reliable trading strategies.
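
To make one of the diagnostics named in the abstract concrete, the following is a minimal sketch of purged K-fold splitting. It is an illustration only and is not taken from the thesis: the function name `purged_kfold_splits` and the parameters `horizon` and `embargo` are hypothetical, and the sketch assumes that the label of observation `i` is computed from bars `i` through `i + horizon`. Training observations whose label window overlaps the label windows of the test fold are dropped, and a further `embargo` observations after the test fold are also excluded.

```python
def purged_kfold_splits(n_samples, n_splits=5, horizon=1, embargo=0):
    """Yield (train_idx, test_idx) index lists for contiguous, purged K-fold splits.

    Assumption (illustrative): the label of observation i depends on
    observations i .. i + horizon, so nearby train/test samples share information.
    """
    if n_splits < 2 or n_splits > n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    if horizon < 0 or embargo < 0:
        raise ValueError("horizon and embargo must be non-negative")

    # Split 0..n_samples-1 into n_splits contiguous folds of near-equal size.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for k in range(n_splits):
        stop = start + base + (1 if k < extra else 0)
        test_idx = list(range(start, stop))

        # Test label windows cover [start, stop - 1 + horizon].
        # Keep a training index i only if its own window [i, i + horizon]
        # ends before the test fold starts, or if i lies beyond the last
        # test label window plus the embargo.
        left_cut = start - horizon
        right_cut = stop - 1 + horizon + embargo
        train_idx = [i for i in range(n_samples) if i < left_cut or i > right_cut]

        yield train_idx, test_idx
        start = stop


if __name__ == "__main__":
    for train_idx, test_idx in purged_kfold_splits(20, n_splits=4, horizon=2, embargo=1):
        print("test:", test_idx, "train:", train_idx)
```

With a longer `horizon` more training observations are purged around each test fold, which trades sample size for a cleaner separation between training and test information.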

BERTO, ENRICO

2024/2025

Record

	Faculty/Department
		Dipartimento di Matematica "Tullio Levi-Civita" - DM
	Degree programme
		COMPUTATIONAL FINANCE, Laurea Magistrale (D.M. 270/2004)
	Academic year
		2024
	Keywords
		Machine learning
		Algorithmic trading
		Foreign exchange
		Overfitting
		Trading profitability
	Supervisor
		CAPORIN, MASSIMILIANO
	Appears in collections:
		Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
Berto_Enrico.pdf (restricted access)	2.55 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/101977