Feature-Enhanced Graph Neural Networks for Automated Anti-Money Laundering

MAZZOLIN, FRANCESCO
2024/2025

Abstract

The rapid digitalization of financial services has increased both the scale and complexity of transaction data, amplifying the challenges financial institutions face in meeting evolving regulatory expectations. While Graph Neural Networks (GNNs) have recently emerged as a natural choice for modeling the relational structure of financial transactions, prior work has largely emphasized architectural sophistication while overlooking two critical dimensions: the methodological pitfalls embedded in financial fraud datasets and the potential role of domain-informed feature engineering (FE). This thesis addresses these gaps by developing a dataset-specific pitfall assessment framework and applying it to two widely used synthetic anti-money laundering (AML) benchmarks, IBM-AML and SAML-D, to evaluate how simulator design and dataset properties may affect the development, methodological soundness, and validity of results obtained by state-of-the-art (SOTA) pipelines. The insights derived from this simulator- and pitfall-aware exploratory data analysis guide the construction of account-centric behavioral features designed to enrich both node and edge attributes in transaction graphs. Building on these enriched representations, the thesis reimplements SOTA GIN-based models for directed multigraphs and integrates the engineered features to assess their impact on AML detection. The empirical analysis extends beyond predictive performance to include computational efficiency, revealing that domain-driven FE consistently improves performance across all architectures, with particularly strong gains for less complex models, which reach or surpass the performance of more sophisticated SOTA approaches. Moreover, FE-enhanced models exhibit a more favorable trade-off between detection quality and computational feasibility under different institutional priorities.
Overall, the results demonstrate that pitfall-aware feature engineering is an effective complement to GNN-based systems, leading to more reliable pipelines and more competitive models. These findings lay the groundwork for future research combining temporal modeling and hybrid sequential-graph architectures, as well as institution-specific deployments that further bridge the gap between synthetic benchmarks and real-world AML environments.
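The idea of account-centric behavioral features that enrich both node and edge attributes can be illustrated with a minimal sketch. Everything below is hypothetical and not taken from the thesis: the `Transaction` record, the helpers `account_features` and `enrich_edges`, and the feature names (`out_count`, `in_mean`, `n_counterparties`, `amount_vs_src_mean`, etc.) are assumptions chosen only to show the general pattern. Per-account statistics are aggregated into node attributes, then copied or combined onto each transaction edge of a directed multigraph.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Transaction:
    """Hypothetical minimal transaction record (one directed edge)."""
    src: str       # sending account
    dst: str       # receiving account
    amount: float  # transferred amount


def account_features(txs):
    """Aggregate per-account behavioral statistics (hypothetical node features)."""
    sent = defaultdict(list)
    received = defaultdict(list)
    counterparties = defaultdict(set)
    for t in txs:
        sent[t.src].append(t.amount)
        received[t.dst].append(t.amount)
        counterparties[t.src].add(t.dst)
        counterparties[t.dst].add(t.src)
    feats = {}
    for acct in set(sent) | set(received):
        out_amts = sent.get(acct, [])
        in_amts = received.get(acct, [])
        feats[acct] = {
            "out_count": len(out_amts),
            "in_count": len(in_amts),
            "out_mean": mean(out_amts) if out_amts else 0.0,
            "in_mean": mean(in_amts) if in_amts else 0.0,
            "n_counterparties": len(counterparties[acct]),
        }
    return feats


def enrich_edges(txs, feats):
    """Attach account-level context to every transaction (hypothetical edge features)."""
    edges = []
    for t in txs:  # one entry per transaction, so parallel edges stay distinct
        s, d = feats[t.src], feats[t.dst]
        edges.append({
            "src": t.src,
            "dst": t.dst,
            "amount": t.amount,
            # size of this payment relative to the sender's typical outgoing amount
            "amount_vs_src_mean": t.amount / s["out_mean"] if s["out_mean"] else 0.0,
            "src_out_count": s["out_count"],
            "dst_in_count": d["in_count"],
        })
    return edges


txs = [Transaction("A", "B", 100.0), Transaction("A", "C", 300.0), Transaction("B", "C", 50.0)]
node_feats = account_features(txs)
edge_feats = enrich_edges(txs, node_feats)
```

Because each transaction is kept as its own edge record, repeated transfers between the same pair of accounts remain separate edges, as a directed multigraph requires; the per-account dictionaries would then be vectorized into node and edge attribute matrices before being passed to a GNN.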
2024
AML
Graph Neural Network
Pitfall Analysis
Fraud Detection
Deep Learning
Files in this item:
Mazzolin_Francesco.pdf (open access, 29.67 MB, Adobe PDF)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/101981