Predicting Bankruptcy in Industrial Sectors: A Study of Machine Learning Models Performance and Explainability
GISONNA, SIMONE
2024/2025
Abstract
The prediction of corporate defaults is a crucial aspect of the financial landscape, with consequences both for individual firms and for the economic system as a whole. This study analyses the evolution of default-classification models, highlighting the shift from classical expert-based models to data-intensive machine learning (ML). Although ML models systematically outperform classical methods in terms of both accuracy and flexibility, their adoption in high-stakes environments is limited by the "black-box" problem: the difficulty of interpreting the complex, non-linear interactions among numerous variables that these models learn. This opacity raises particular concerns in the context of default prediction, where errors are costly and stakeholders demand explanations. To address this problem, state-of-the-art techniques of explainable artificial intelligence (XAI), which aim to improve the transparency of ML models without sacrificing discriminatory power, are illustrated. Using data on European SMEs in the industrial sector, the performance of the most popular classification algorithms is compared, including logistic regression, support vector machines, ensemble models, and neural networks. Thereafter, the most widely recognised XAI techniques, namely the H-statistic, accumulated local effects (ALE), global surrogates, LIME, and SHAP, are used to look inside the black box, quantifying the impact of each variable on predictions and justifying individual decisions. The results demonstrate the superiority of gradient boosting algorithms, while the XAI methods provide a detailed and transparent view of the models' inner workings and reveal areas where the input data or model architecture could be refined. These findings underscore the potential of integrating XAI into decision-making frameworks, allowing stakeholders and policymakers to build trust in ML systems, mitigate risks, and make informed financial decisions with greater confidence.
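As a concrete illustration of the pipeline the abstract describes, the sketch below trains a logistic-regression baseline and a gradient boosting classifier, compares their AUC on held-out data, and then applies SHAP, one of the XAI techniques cited above, to obtain both a global variable ranking and a local explanation of a single prediction. This is a minimal sketch, not the thesis's code: the file name smes_industrial.csv, the default label column, and all settings are hypothetical placeholders; the actual data and models are in the full text linked below.

```python
# Minimal sketch of the abstract's workflow. File name, `default` column,
# and hyperparameters are hypothetical placeholders, not the thesis's setup.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical dataset: one row per SME, columns are financial ratios,
# `default` = 1 if the firm defaulted within the prediction horizon.
df = pd.read_csv("smes_industrial.csv")  # placeholder path
X = df.drop(columns=["default"])
y = df["default"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Interpretable baseline vs. the study's best performer, gradient boosting.
logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
gbm = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
for name, model in [("logistic regression", logit), ("gradient boosting", gbm)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: test AUC = {auc:.3f}")

# Open the black box: TreeExplainer computes exact SHAP values for tree
# ensembles. Each row attributes one prediction to the input variables.
shap_values = shap.TreeExplainer(gbm).shap_values(X_te)

# Global view: rank variables by mean absolute SHAP value ...
print(pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
      .sort_values(ascending=False).head())
# ... and local view: justify the decision for one individual firm.
print(pd.Series(shap_values[0], index=X.columns).sort_values())
```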
| File | Access | Size | Format |
|---|---|---|---|
| Tesi_Convertita_2.pdf | open access | 4.44 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/83083