Unified Approaches to Anti-Financial Crime: Data Integration and Machine Learning Models
BALDO, GABRIELE
2023/2024
Abstract
This thesis presents a machine learning approach to fighting financial crime in banking. It begins by introducing the data management framework of the banking system, known as the Data Mesh, which organizes and cleans the bank's data into well-structured dataframes referred to as Data Products. A synthetic transaction dataset, analogous to a typical banking Data Product, is then analyzed to detect money laundering. Because the dataset includes labels indicating whether each transaction is associated with money laundering, the analysis focuses on supervised machine learning methods. First, three models (Logistic Regression, Random Forest and XGBoost) are trained after standard preprocessing only. Next, the dataset is balanced with two techniques, random undersampling and SMOTE, and the models are evaluated again on the balanced data. In the final approach, the dataset is transformed into a directed graph, which makes it possible to identify money laundering patterns and to compute account-level statistics. These graph-based features enrich the dataset and improve the performance of the XGBoost model. After hyperparameter tuning, the optimized model shows an improved ability to identify money laundering transactions, achieving notable results.

File | Size | Format |
---|---|---|
Baldo_Gabriele.pdf (restricted access) | 5.82 MB | Adobe PDF |
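The graph-based enrichment described in the abstract (transactions turned into a directed graph, account-level statistics computed and joined back onto each transaction) can be illustrated with a minimal, standard-library Python sketch. It is not the thesis's implementation: the `Transaction` fields, the helper names `build_graph`, `account_stats` and `enrich`, and the four statistics chosen (distinct fan-out and fan-in counterparties, total sent and received) are hypothetical choices made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # Hypothetical schema: the real dataset's column names are not given in the abstract.
    sender: str
    receiver: str
    amount: float


def build_graph(transactions):
    """Store the transactions as a directed multigraph using two adjacency maps."""
    out_edges = defaultdict(list)  # account -> [(receiver, amount), ...]
    in_edges = defaultdict(list)   # account -> [(sender, amount), ...]
    for t in transactions:
        out_edges[t.sender].append((t.receiver, t.amount))
        in_edges[t.receiver].append((t.sender, t.amount))
    return out_edges, in_edges


def account_stats(out_edges, in_edges):
    """Compute illustrative (hypothetical) per-account statistics from the graph."""
    stats = {}
    for account in set(out_edges) | set(in_edges):
        outs = out_edges.get(account, [])  # .get does not insert missing keys
        ins = in_edges.get(account, [])
        stats[account] = {
            "out_degree": len({r for r, _ in outs}),  # distinct counterparties paid (fan-out)
            "in_degree": len({s for s, _ in ins}),    # distinct counterparties paying in (fan-in)
            "total_sent": sum(a for _, a in outs),
            "total_received": sum(a for _, a in ins),
        }
    return stats


def enrich(transactions):
    """Attach sender- and receiver-level graph statistics to every transaction row."""
    out_edges, in_edges = build_graph(transactions)
    stats = account_stats(out_edges, in_edges)
    rows = []
    for t in transactions:
        row = {"sender": t.sender, "receiver": t.receiver, "amount": t.amount}
        for prefix, account in (("sender_", t.sender), ("receiver_", t.receiver)):
            for name, value in stats[account].items():
                row[prefix + name] = value
        rows.append(row)
    return rows


# Toy usage: A pays B and C, B pays C, and C pays A back, closing short cycles.
txs = [
    Transaction("A", "B", 100.0),
    Transaction("A", "C", 50.0),
    Transaction("B", "C", 30.0),
    Transaction("C", "A", 20.0),
]
for row in enrich(txs):
    print(row)
```

Each enriched row carries the original transaction fields plus `sender_*` and `receiver_*` columns that a tabular model such as XGBoost could consume. One design point worth noting: computing these statistics over the whole dataset, including test transactions, can leak information into training, so computing them on the training window only is the safer choice.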
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/79855