Automating the Maturity Model for Audit and Compliance: An NLP and Retrieval-Augmented Generation (RAG) Approach
PASCA, MARCO
2024/2025
Abstract
Over the past decade, the security of software development and of the software supply chain has become a critical concern for producers and consumers alike. The continuous evolution from monolithic designs to distributed architectures has brought clear advantages, but it has also enlarged attack surfaces. Consequently, audit processes have become central to evaluating security controls, preventing escalations, verifying compliance with internal policies and regulations, and catching misconfigurations in code development. The tight production schedules of modern development demand increasingly efficient, near real-time audit processes. To meet this challenge, we selected the MM4DSO maturity model, developed by the University of Padua. Maturity models are conceptual frameworks used to conduct audits and compliance checks in client companies. In particular, they measure an organization’s current state with respect to a given discipline or process area and provide a roadmap for reaching higher capability. In this specific case, experts must manually complete over 300 questions across six domains, which makes the process laborious and error-prone. This thesis therefore focuses on developing an AI engine that automates the compilation of the MM4DSO maturity model. Our solution consists of finding the most effective method to pre-compile the model’s target levels using the information that clients provide. In this way, we support the client interview with the results we obtain, eliminating the need for extensive human supervision throughout the process. In this work, we analyze several Natural Language Processing (NLP) approaches, including summarization, fine-tuning of large language models, and Retrieval-Augmented Generation (RAG), to identify the most effective strategy for extracting and interpreting data from corporate policy documents.
By leveraging the RAG technique, our engine takes policy files as input, retrieves the context relevant to each question, and generates a structured output file containing the compiled responses with the corresponding maturity grade. This approach streamlines the audit procedure and improves accuracy and standardization. Our work achieved three results: a compilation time reduced by 87.5% compared with manual interviews, at almost zero resource cost; a general-purpose, modular RAG pipeline that can also be adapted to other frameworks and regulations; and, for that pipeline, a precision of 82% and a recall of 86.2%.

| File | Size | Format | |
|---|---|---|---|
| Marco_Pasca_Thesis.pdf (restricted access) | 677.18 kB | Adobe PDF | |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/89889