Unified Approaches to Anti-Financial Crime: Data Integration and Machine Learning Models

BALDO, GABRIELE
2023/2024

Abstract

This thesis presents a machine learning approach to fighting financial crime in banking. It first introduces the banking system's data management framework, known as the Data Mesh, which organizes and cleans the bank's data into well-structured dataframes called Data Products. A synthetic transaction dataset, analogous to a typical banking Data Product, is then analyzed to detect money laundering. Because each transaction is labeled as associated with money laundering or not, the analysis focuses on supervised machine learning methods. First, three models (Logistic Regression, Random Forest, and XGBoost) are applied after only standard preprocessing. Next, the dataset is balanced with two techniques, random undersampling and SMOTE (Synthetic Minority Over-sampling Technique), and the same models are tested again. Finally, the dataset is transformed into a directed graph, which makes it possible to identify money laundering patterns and to compute account-level statistics. These graph-based features enrich the dataset and improve the performance of the XGBoost model. After hyperparameter tuning, the optimized model shows an improved ability to identify money laundering transactions.
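As a rough illustration of the graph step described in the abstract, the following Python sketch treats each transaction as a directed edge from sender to receiver and aggregates a few account-level statistics that are then attached to every transaction. The toy data, the field names, and the choice of statistics (degrees and total amounts) are assumptions made for this example only; the record does not state which features the thesis actually uses.

```python
from collections import defaultdict

# Hypothetical toy transactions: (sender, receiver, amount, is_laundering).
transactions = [
    ("A", "B", 100.0, 0),
    ("B", "C", 95.0, 1),
    ("C", "A", 90.0, 1),
    ("A", "D", 20.0, 0),
]


def account_stats(txs):
    """Read each transaction as a directed edge sender -> receiver and
    aggregate simple per-account statistics (an illustrative choice)."""
    stats = defaultdict(
        lambda: {"out_degree": 0, "in_degree": 0, "sent": 0.0, "received": 0.0}
    )
    for sender, receiver, amount, _label in txs:
        stats[sender]["out_degree"] += 1
        stats[sender]["sent"] += amount
        stats[receiver]["in_degree"] += 1
        stats[receiver]["received"] += amount
    return dict(stats)


def enrich(txs):
    """Attach the sender's and receiver's account statistics to each
    transaction, producing one feature row per transaction."""
    stats = account_stats(txs)
    rows = []
    for sender, receiver, amount, label in txs:
        rows.append({
            "amount": amount,
            "sender_out_degree": stats[sender]["out_degree"],
            "sender_in_degree": stats[sender]["in_degree"],
            "sender_sent": stats[sender]["sent"],
            "receiver_in_degree": stats[receiver]["in_degree"],
            "receiver_received": stats[receiver]["received"],
            "label": label,
        })
    return rows


features = enrich(transactions)
```

Rows built this way could be passed to any tabular classifier. In practice the statistics would need to be computed on the training data only, so that label or test-set information does not leak into the features.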
2023
Keywords: Anti-Financial Crime; Data Management; Analytical Models
Files in this item:
File: Baldo_Gabriele.pdf (restricted access)
Size: 5.82 MB
Format: Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/79855