Comparative Analysis of Machine Learning Models for Time Series Sales Forecasting in Supply Chain Management

BEIGNEZHAD, GHAZAL
2024/2025

Abstract

Accurate sales forecasting remains a central challenge in supply chain management, directly affecting inventory control, logistics efficiency, and strategic planning. This thesis compares how well classical statistical, hybrid, and machine learning models predict real, seasonal sales data from a large retail dataset: a panel of ninety-six products observed over thirty-three months, with features such as lagged sales, price, discount, and time-based indicators. Four forecasting models were considered: Pooled Ordinary Least Squares (OLS), Prophet (a modular trend-seasonality-holiday model), Random Forest, and XGBoost. All models were trained and evaluated under a unified preprocessing and validation framework, using standardized metrics including RMSE, MAE, MAPE, and R². Random Forest consistently outperformed all other models, achieving the highest accuracy and lowest error rates, followed by XGBoost. Prophet was competitive in capturing seasonality but underperformed during promotion-driven volatility. Pooled OLS, while the least accurate, provided interpretable coefficients and served as a robust linear benchmark. The findings suggest that tree-based machine learning models are highly effective for forecasting in retail settings characterized by seasonality and promotional effects, provided that structured features and cross-sectional variance are properly modelled. The study also highlights key trade-offs between model complexity and interpretability, emphasizing the importance of aligning forecasting tools with the operational context. Practical implications for model selection and deployment are discussed, along with limitations and recommendations for future research involving deep learning architectures, hierarchical forecasting, and explainable AI methods.
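
For readers unfamiliar with the four evaluation metrics named in the abstract, the sketch below shows one common way to compute RMSE, MAE, MAPE, and R² in plain Python. It is an illustration under stated assumptions, not the thesis's own code: the function name `forecast_metrics`, the choice to skip zero-valued actuals when computing MAPE, and the sample numbers are all hypothetical.

```python
import math


def forecast_metrics(actual, predicted):
    """Return RMSE, MAE, MAPE (in percent) and R^2 for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    # Root mean squared error and mean absolute error.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # MAPE is undefined when an actual value is zero; this sketch skips
    # those points (an assumption, other conventions exist).
    nonzero = [(a, e) for a, e in zip(actual, errors) if a != 0]
    if nonzero:
        mape = 100 * sum(abs(e / a) for a, e in nonzero) / len(nonzero)
    else:
        mape = float("nan")

    # R^2 = 1 - (residual sum of squares / total sum of squares).
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum(e * e for e in errors)
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")

    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2}


# Made-up numbers for illustration only; not taken from the thesis.
example = forecast_metrics([100, 200, 400], [110, 190, 400])
```

Lower RMSE, MAE, and MAPE and a higher R² indicate a better fit, which is the sense in which the models are ranked above. MAPE is scale-free, which makes it convenient for comparing products with very different sales volumes, but it becomes unstable when actual sales are close to zero.
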
2024
Keywords: Forecasting; Time Series; Supply Chain; Machine Learning; Sales Forecasting
Files in this item:
Ghazal Beignezhad.pdf (open access, 1.71 MB, Adobe PDF)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/89524