Automating Security Advisory evaluation through Large Language Models.

KUMAR, VINAYAK
2024/2025

Abstract

Security advisories are critical for communicating vulnerability information, yet they often exist in unstructured formats across diverse vendors, hindering automated analysis. This thesis presents a framework that parses and classifies advisory content into seven standardized root elements (Metadata, Asset, Vulnerability, Impact & Risk, Mitigation, References, and Reporting & Contact) aligned with NIST and ISO/IEC standards. Advisories from multiple sources, including PDF, HTML, and Markdown formats, are preprocessed into plain text and analyzed using Large Language Models (LLMs) guided by a structured dictionary of core concepts. Evaluation on manually annotated datasets demonstrates strong performance, achieving F1-scores of 83% for Atlassian and 79% for Espressif advisories. While some advisories are straightforward and easily classified, others exhibit complex, inconsistent structures that challenge automated extraction. By converting free-text advisories into structured semantic components, this work enhances reproducibility, interoperability, and efficiency in vulnerability analysis and threat intelligence workflows.
2024
Large Language Model
Cybersecurity
Security advisory
Files in this item:
File: Vinayak_Kumar_thesis_Final_Version_pdfa_conversion.pdf
Access: open access
Size: 394.64 kB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/93409