Unified Approaches to Anti-Financial Crime: Data Integration and Machine Learning Models

BALDO, GABRIELE

2023/2024

Abstract

This thesis presents a machine learning approach to fighting financial crime in banking. It begins by introducing the bank's data management framework, the Data Mesh, which organizes and cleans the bank's data into well-structured dataframes called Data Products. A synthetic transaction dataset, analogous to a typical banking Data Product, is then analyzed to detect money laundering. Because the dataset labels each transaction as associated with money laundering or not, the analysis focuses on supervised machine learning methods. First, three models (Logistic Regression, Random Forest, and XGBoost) are applied after only standard preprocessing. Next, the dataset is balanced using two techniques, random undersampling and SMOTE, and the models are evaluated again. In the final approach, the dataset is transformed into a directed graph, which makes it possible to identify money laundering patterns and to compute account-level statistics. These graph-based features enrich the dataset and improve the performance of the XGBoost model. After hyperparameter tuning, the optimized model shows an improved ability to identify money laundering transactions.
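As an illustration only, the balancing and graph-enrichment steps described in the abstract could be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the toy transactions, the column layout, and the function names `random_undersample`, `account_features`, and `enrich` are all hypothetical, and only the Python standard library is used in place of the tools the thesis may have relied on.

```python
import random
from collections import defaultdict

# Hypothetical toy transactions: (sender, receiver, amount, is_laundering).
TRANSACTIONS = [
    ("A", "B", 100.0, 0),
    ("A", "C", 250.0, 0),
    ("B", "C", 90.0, 1),
    ("C", "A", 80.0, 1),
    ("D", "A", 40.0, 0),
    ("D", "B", 15.0, 0),
]


def random_undersample(rows, seed=0):
    """Keep every positive row plus an equally sized random sample of negatives.

    Assumes the positive (laundering) class is the minority class.
    """
    positives = [r for r in rows if r[3] == 1]
    negatives = [r for r in rows if r[3] == 0]
    rng = random.Random(seed)
    kept = rng.sample(negatives, min(len(positives), len(negatives)))
    return positives + kept


def account_features(rows):
    """Treat each transaction as a directed edge sender -> receiver and
    aggregate per-account degree and total amount, incoming and outgoing."""
    feats = defaultdict(lambda: {"out_degree": 0, "in_degree": 0,
                                 "out_amount": 0.0, "in_amount": 0.0})
    for sender, receiver, amount, _label in rows:
        feats[sender]["out_degree"] += 1
        feats[sender]["out_amount"] += amount
        feats[receiver]["in_degree"] += 1
        feats[receiver]["in_amount"] += amount
    return dict(feats)


def enrich(rows):
    """Attach sender and receiver graph features to each transaction."""
    feats = account_features(rows)
    enriched = []
    for sender, receiver, amount, label in rows:
        enriched.append({
            "amount": amount,
            "sender_out_degree": feats[sender]["out_degree"],
            "sender_in_degree": feats[sender]["in_degree"],
            "receiver_in_degree": feats[receiver]["in_degree"],
            "receiver_in_amount": feats[receiver]["in_amount"],
            "label": label,
        })
    return enriched
```

The enriched rows could then be passed to any tabular classifier. In practice the account statistics would presumably be computed on the training split only, so that no information from held-out transactions leaks into the features.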

Record

	Faculty/Department
	
				Dipartimento di Ingegneria Civile, Edile e Ambientale - ICEA
			
	Degree programme
	
				MATHEMATICAL ENGINEERING, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2023
			
	English title
	
				Unified Approaches to Anti-Financial Crime: Data Integration and Machine Learning Models
			
	Keywords
	
				Anti-Financial Crime
				Data Management
				Analytical Models
			
	Supervisor
	
				GRASSELLI, MARTINO
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Size	Format
Baldo_Gabriele.pdf (restricted access)	5.82 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/79855