Textual Clues of Crime: Comparing Financial Narratives in Mafia-Connected and Legitimate Firms

CISOLLA, MARCO

2024/2025

Abstract

This thesis investigates whether mafia-infiltrated firms can be systematically distinguished from legitimate firms by analyzing the text of their financial statement disclosures. Prior research has emphasized financial indicators and ownership structures, while the textual component of reporting has received limited attention. To address this gap, the study constructs an original dataset of Italian firms implicated in three major anti-mafia operations, alongside a control group of comparable firms. It applies Natural Language Processing (NLP) techniques, specifically the Bag-of-Words (BoW) and Term Frequency–Inverse Document Frequency (TF-IDF) representations, combined with logistic regression for classification. The results indicate that disclosure practices, particularly in relation to debt, differ systematically between mafia-connected and legitimate firms. Moreover, the BoW representation proves more effective than TF-IDF at capturing these patterns and yields better classification performance. The study provides one of the first systematic applications of NLP to the analysis of mafia-infiltrated firms and shows that even abbreviated financial statements contain meaningful signals of criminal influence. Beyond the methodological contribution, the findings underscore the potential of disclosure analysis to complement traditional forensic accounting tools, offering insights for regulators, policymakers, and scholars interested in developing automated early-warning systems against organized crime.
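
To make the pipeline named in the abstract concrete, the following is a minimal sketch of a BoW and a TF-IDF representation feeding a logistic regression classifier. It is not the thesis's implementation: the abstract does not describe preprocessing, tooling, or hyperparameters, so the whitespace tokenizer, the smoothed IDF formula, the learning rate, and all function names here are assumptions. The four example sentences and their labels are invented toy data for illustration only and do not come from the thesis dataset.

```python
import math
from collections import Counter

# Invented toy snippets (NOT thesis data). Label 1 = hypothetical "flagged"
# firm, label 0 = hypothetical control firm.
DOCS = [
    "short term bank debt increased sharply",
    "debt towards suppliers and banks restructured",
    "revenues grew and inventories remained stable",
    "investments in equipment financed by equity",
]
LABELS = [1, 1, 0, 0]

def tokenize(text):
    # Assumption: lowercase + whitespace split; real pipelines do more.
    return text.lower().split()

def build_vocab(docs):
    return sorted({tok for doc in docs for tok in tokenize(doc)})

def bow_vectors(docs, vocab):
    # Bag-of-Words: raw term counts per document.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([float(counts[term]) for term in vocab])
    return rows

def tfidf_vectors(docs, vocab):
    # TF-IDF with a smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by
    # L2 normalisation of each row (one common variant, assumed here).
    n = len(docs)
    token_sets = [set(tokenize(doc)) for doc in docs]
    idf = [
        math.log((1 + n) / (1 + sum(term in s for s in token_sets))) + 1.0
        for term in vocab
    ]
    rows = []
    for counts in bow_vectors(docs, vocab):
        weighted = [c * w for c, w in zip(counts, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        rows.append([v / norm for v in weighted])
    return rows

def train_logreg(X, y, lr=0.1, epochs=500):
    # Plain stochastic gradient descent on the logistic loss, no regularisation.
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(X, w, b):
    return [1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0 for xi in X]

if __name__ == "__main__":
    vocab = build_vocab(DOCS)
    for name, featurize in [("BoW", bow_vectors), ("TF-IDF", tfidf_vectors)]:
        X = featurize(DOCS, vocab)
        w, b = train_logreg(X, LABELS)
        top = sorted(zip(w, vocab), reverse=True)[:3]
        print(name, "train predictions:", predict(X, w, b))
        print(name, "highest-weight terms:", [term for _, term in top])
```

Inspecting the largest positive coefficients is one simple way to see which terms push a document towards the flagged class. On real data, the comparison between the two representations would need a held-out test set rather than training-set predictions.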

Faculty/Department: Dipartimento di Scienze Economiche e Aziendali "Marco Fanno" - DSEA
Degree programme: ACCOUNTING, FINANCE AND BUSINESS CONSULTING, Laurea Magistrale (D.M. 270/2004)
Academic year: 2024
English title: Textual Clues of Crime: Comparing Financial Narratives in Mafia-Connected and Legitimate Firms
Keywords: Mafia; Financial Statements; Text Analysis; Disclosure
Supervisor: AMBROSINI, FRANCESCO
Appears in collections: Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
Cisolla_Marco.pdf (restricted access)	3.21 MB	Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/94797