Feature-Enhanced Graph Neural Networks for Automated Anti-Money Laundering

The rapid digitalization of financial services has increased both the scale and complexity of transaction data, amplifying the challenges financial institutions face in meeting evolving regulatory expectations. While Graph Neural Networks (GNNs) have recently emerged as a natural choice for modeling the relational structure of financial transactions, prior work has largely emphasized architectural sophistication while overlooking two critical dimensions: the methodological pitfalls embedded in financial fraud datasets and the potential role of domain-informed feature engineering (FE). This thesis addresses these gaps by developing a dataset-specific pitfall assessment framework and applying it to two widely used synthetic anti-money laundering (AML) benchmarks, IBM-AML and SAML-D, to evaluate how simulator design and dataset properties may affect the development and methodological soundness of state-of-the-art (SOTA) pipelines and the validity of their results. The insights derived from this simulator-aware and pitfall-aware exploratory data analysis guide the construction of account-centric behavioral features designed to enrich both node and edge attributes in transaction graphs. Building on these enriched representations, the thesis reimplements SOTA GIN-based models for directed multigraphs and integrates the engineered features to assess their impact on AML detection. The empirical analysis extends beyond predictive performance to include computational efficiency, revealing that domain-driven FE consistently improves performance across all architectures, with particularly strong gains for less complex models, which reach or surpass the performance of more sophisticated SOTA approaches. Moreover, FE-enhanced models exhibit a more favorable trade-off between detection quality and computational feasibility under different institutional priorities.
Overall, the results demonstrate that pitfall-aware feature engineering is an effective complement to GNN-based systems, leading to more reliable pipelines and more competitive models. These findings lay the groundwork for future research combining temporal modeling and hybrid sequential-graph architectures, as well as institution-specific deployments that further bridge the gap between synthetic benchmarks and real-world AML environments.

MAZZOLIN, FRANCESCO

2024/2025

Faculty/Department: Dipartimento di Matematica "Tullio Levi-Civita" - DM
Degree programme: COMPUTATIONAL FINANCE, Laurea Magistrale (D.M. 270/2004)
Academic year: 2024
English title: Feature-Enhanced Graph Neural Networks for Automated Anti-Money Laundering
Keywords: AML; Graph Neural Network; Pitfall Analysis; Fraud Detection; Deep Learning
Supervisor: PASA, LUCA
Appears in document types: Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
Mazzolin_Francesco.pdf (open access)	29.67 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/101981