Automating the Maturity Model for Audit and Compliance: An NLP and Retrieval-Augmented Generation (RAG) Approach

PASCA, MARCO

2024/2025

Abstract

Over the past decade, the security of software development and of the software supply chain has become a critical concern for producers and consumers alike. The continuous shift from monolithic designs to distributed architectures has brought clear advantages, but it has also enlarged the attack surface. Consequently, audit processes have become central to evaluating security controls, preventing escalations, verifying compliance with internal policies and regulations, and catching misconfigurations during code development. Tight production schedules in modern development demand increasingly efficient, near real-time audit processes. To meet this challenge, we selected the MM4DSO maturity model, developed by the University of Padua. Maturity models are conceptual frameworks used to conduct audits and compliance checks in client companies. In particular, they measure an organization’s current state in a given discipline or process area and provide a roadmap for reaching higher capability. In the case of MM4DSO, experts must manually complete over 300 questions across six domains, which makes the process laborious and error-prone. This thesis therefore focuses on developing an AI engine that automates the compilation of the MM4DSO maturity model. Our solution seeks the most effective method to pre-compile the model’s target levels from the information that clients provide. The results then support the client interview, eliminating the need for extensive human supervision throughout the process. We analyze several Natural Language Processing (NLP) approaches, including summarization, fine-tuning of large language models, and Retrieval-Augmented Generation (RAG), to identify the most effective strategy for extracting and interpreting data from corporate policy documents.
By leveraging RAG, our engine takes policy files as input, retrieves the context relevant to each question, and generates a structured output file containing the compiled responses with their corresponding maturity grade. This approach streamlines the audit procedure and ensures greater accuracy and standardization. We obtained three main results: a compilation time 87.5% lower than that of manual interviews, at almost zero resource cost; a general-purpose, modular RAG pipeline that can also be adapted to other frameworks and regulations; and a precision of 82% and a recall of 86.2% for this pipeline.
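
The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation, which is not available in this record: the bag-of-words cosine similarity stands in for whatever retriever the thesis uses, and the names `retrieve`, `compile_answer`, and the `generate` callback (a placeholder for the language model that assigns the maturity grade) are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k policy chunks most similar to the question (assumed retriever)."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True
    )
    return ranked[:k]


def compile_answer(question, chunks, generate, k=2):
    """Retrieve context, then let the hypothetical `generate` callback grade it."""
    context = retrieve(question, chunks, k)
    return {
        "question": question,
        "context": context,
        "maturity": generate(question, context),
    }
```

A caller would pass the policy text split into chunks and a `generate` function wrapping the chosen model; the returned dictionary corresponds to one row of the structured output file.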

Record

	Faculty/Department
	
				Dipartimento di Matematica "Tullio Levi-Civita" - DM
			
	Degree programme
	
				CYBERSECURITY, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2024
			
	English title
	
				Automating the Maturity Model for Audit and Compliance: An NLP and Retrieval-Augmented Generation (RAG) Approach
			
	Keywords
	
				Maturity model
NLP
RAG
			
	Supervisor
	
				BRIGHENTE, ALESSANDRO
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Size	Format
Marco_Pasca_Thesis.pdf (restricted access)	677.18 kB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/89889