Modelling of annual turnover of “Società di persone” and “Ditte individuali”: a case study based on SIAM data set

DILDA, GIORGIO
2021/2022

Abstract

In this thesis I present the development and results of the project I carried out during my internship at Cerved Group. The aim of the project was to build a model that estimates the turnover of small Italian companies that are not required by law to file a financial report. Because this indicator is unavailable for these companies, automating tasks that involve them is a serious problem. To fit the model, I retrieved the target variable from the SIAM data set, a database in which banks record financial report proxies for their clients that fall into these categories. For business reasons the model had to satisfy some interpretability criteria, so I propose several variants of linear models and a more complex XGBoost model made interpretable via SHAP values. All the proposed models significantly outperform the model currently in production, with the XGBoost model achieving slightly higher performance. The business side judged the proposed models very positively, and one of them will be deployed to production after further diagnostic analysis.
2021
Turnover
Machine Learning
Prediction
Files in this item:
Tesi_giorgio_dilda_completa (1).pdf
open access
Size 1.23 MB
Format Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/29703