Modelling of annual turnover of “Società di persone” and “Ditte individuali”: a case study based on SIAM data set

DILDA, GIORGIO

2021/2022

Abstract

In this thesis I present the development and results of the project I carried out during my internship at Cerved Group. The aim of the project was to build a model that estimates the turnover of small Italian companies that are not required by law to prepare a financial report. Because this indicator is unavailable for such companies, tasks that involve them are difficult to automate. To obtain the target variable I used the SIAM data set, a database in which banks record financial report proxies for their clients that fall into these categories. For business reasons the model had to satisfy certain interpretability criteria, so I propose several variants of linear models and a more complex XGBoost model made interpretable through SHAP values. All the proposed models significantly outperform the model currently in production, with the XGBoost model achieving slightly higher performance than the others. The business side judged the proposed models very positively, and one of them will be deployed to production after further diagnostic analysis.

Faculty/Department: Dipartimento di Matematica "Tullio Levi-Civita" - DM
Degree programme: Data Science, Laurea Magistrale (D.M. 270/2004)
Academic year: 2021
Keywords: Turnover, Machine Learning, Prediction
Supervisor: GUIDOLIN, MARIANGELA
Appears in collections: Master's degree theses

Files in this item:

File	Size	Format
Tesi_giorgio_dilda_completa (1).pdf (open access)	1.23 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/29703