Modelling of annual turnover of “Società di persone” and “Ditte individuali”: a case study based on SIAM data set
DILDA, GIORGIO
2021/2022
Abstract
In this thesis I present the development and results of the project I carried out during my internship at Cerved Group. The project aimed to build a model that estimates the turnover of small Italian companies that are not legally required to prepare financial statements. Because this indicator is unavailable for these companies, automating tasks that involve them is seriously hampered. To obtain the target variable for estimation, I used the SIAM data set, a database in which banks record financial-report proxies for their clients that fall into these categories. For business reasons the model had to satisfy interpretability criteria, so I propose several variants of linear models and a more complex XGBoost model made interpretable via SHAP values. All the proposed models significantly outperform the model currently in production, with the XGBoost model achieving slightly higher performance than the linear variants. The business side judged the proposed models very positively, and one of them will be implemented in production after further diagnostic analysis.

File | Size | Format |
---|---|---|
Tesi_giorgio_dilda_completa (1).pdf (open access) | 1.23 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/29703