Automating Security Advisory evaluation through Large Language Models.
KUMAR, VINAYAK
2024/2025
Abstract
Security advisories are critical for communicating vulnerability information, yet they often exist in unstructured formats that differ across vendors, which hinders automated analysis. This thesis presents a framework that parses and classifies advisory content into seven standardized root elements (Metadata, Asset, Vulnerability, Impact & Risk, Mitigation, References, and Reporting & Contact) aligned with NIST and ISO/IEC standards. Advisories from multiple sources, in PDF, HTML, and Markdown formats, are preprocessed into plain text and analyzed using Large Language Models (LLMs) guided by a structured dictionary of core concepts. Evaluation on manually annotated datasets demonstrates strong performance, with F1-scores of 83% for Atlassian and 79% for Espressif advisories. While some advisories are straightforward and easily classified, others exhibit complex, inconsistent structures that challenge automated extraction. By converting free-text advisories into structured semantic components, this work enhances reproducibility, interoperability, and efficiency in vulnerability analysis and threat intelligence workflows.

| File | Access | Size | Format |
|---|---|---|---|
| Vinayak_Kumar_thesis_Final_Version_pdfa_conversion.pdf | Open access | 394.64 kB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/93409