AGIR: Automatic Generation of Intelligence Reports

PERRINA, FILIPPO
2022/2023

Abstract

Natural Language Generation (NLG) tools are becoming increasingly important in today’s fast-paced business environment. By automating the generation of written content from structured data, they can save organizations significant time and resources. NLG is widely employed in many fields to produce computer-generated medical reports, weather forecasts, and newspaper articles; however, little work has been done so far in the cybersecurity field. Today, security analysts have to write reports manually, starting from structured data such as STIX (Structured Threat Information eXpression) graphs and network logs, which is a very time-consuming task. In this thesis, carried out in collaboration with Leonardo S.p.A., we implement AGIR (Automatic Generation of Intelligence Reports), an NLG tool that writes intelligence reports starting from the JSON representation of STIX graphs. The purpose of AGIR is to assist analysts in the report-writing process by providing them with significant information and with a starting report that is as close as possible to an ideal final version. AGIR produces the final report in a two-stage pipeline. In the first stage, it uses a template-based approach to build a baseline text; in the second stage, this text is further refined through the ChatGPT API. The generated reports are then evaluated with the syntactic log-odds ratio (SLOR), a referenceless, model-dependent metric for fluency, and with a questionnaire-based human evaluation along three dimensions: correctness, fluency, and utility. Overall, the generated reports reach good scores on all three dimensions, but there is room for improvement in the implementation of both stages. The first stage introduces maintainability issues that could be circumvented by using a neural approach to create the draft text. The second stage could be improved by using a free, locally run deep learning model.
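As a rough illustration of the two technical ideas named in the abstract, the sketch below shows (a) how a template-based first stage might turn STIX relationships into draft sentences and (b) how SLOR is commonly computed. It is not AGIR's implementation: the templates, the function names `draft_report` and `slor`, and the example bundle are hypothetical, and the input is assumed to be a STIX 2.x bundle (objects with `id`, `type`, `name`; relationships with `relationship_type`, `source_ref`, `target_ref`). The ChatGPT refinement stage is deliberately left out. SLOR is taken here in its usual form, the sentence log-probability under a language model minus the sum of the tokens' unigram log-probabilities, divided by the sentence length.

```python
import json

# Hypothetical sentence templates keyed by STIX relationship_type.
# The abstract does not give AGIR's actual templates.
TEMPLATES = {
    "uses": "The {src_type} {src} uses the {dst_type} {dst}.",
    "targets": "The {src_type} {src} targets the {dst_type} {dst}.",
}
FALLBACK = "The {src_type} {src} is related ({rel}) to the {dst_type} {dst}."


def draft_report(bundle_json: str) -> str:
    """Build a baseline text from the relationships in a STIX 2.x bundle."""
    bundle = json.loads(bundle_json)
    all_objects = bundle.get("objects", [])
    by_id = {obj["id"]: obj for obj in all_objects}
    sentences = []
    for obj in all_objects:
        if obj.get("type") != "relationship":
            continue
        src = by_id.get(obj["source_ref"])
        dst = by_id.get(obj["target_ref"])
        if src is None or dst is None:
            continue  # skip relationships that point outside the bundle
        template = TEMPLATES.get(obj["relationship_type"], FALLBACK)
        sentences.append(
            template.format(
                src_type=src["type"].replace("-", " "),
                src=src.get("name", src["id"]),
                dst_type=dst["type"].replace("-", " "),
                dst=dst.get("name", dst["id"]),
                rel=obj["relationship_type"],
            )
        )
    return " ".join(sentences)


def slor(sentence_logprob: float, unigram_logprobs: list[float]) -> float:
    """SLOR = (log p_model(S) - sum of unigram log-probs) / number of tokens.

    Both inputs are assumed to come from the caller's own language model
    and unigram frequency table; this function only does the arithmetic.
    """
    if not unigram_logprobs:
        raise ValueError("sentence must contain at least one token")
    return (sentence_logprob - sum(unigram_logprobs)) / len(unigram_logprobs)


# Made-up example bundle; names and identifiers are placeholders.
EXAMPLE_BUNDLE = json.dumps({
    "type": "bundle",
    "id": "bundle--00000000-0000-4000-8000-000000000000",
    "objects": [
        {"type": "threat-actor", "name": "Example Group",
         "id": "threat-actor--00000000-0000-4000-8000-000000000001"},
        {"type": "malware", "name": "ExampleRAT",
         "id": "malware--00000000-0000-4000-8000-000000000002"},
        {"type": "relationship", "relationship_type": "uses",
         "id": "relationship--00000000-0000-4000-8000-000000000003",
         "source_ref": "threat-actor--00000000-0000-4000-8000-000000000001",
         "target_ref": "malware--00000000-0000-4000-8000-000000000002"},
    ],
})

if __name__ == "__main__":
    print(draft_report(EXAMPLE_BUNDLE))
```

A fixed template per relationship type keeps the draft faithful to the graph, which is presumably why a template stage precedes the refinement step; it also hints at the maintainability cost mentioned in the abstract, since every new relationship type needs a new template.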
2022
Threat Intelligence
Language Generation
Cybersecurity
Files in this item:
File: FilippoPerrinaTesi.pdf (open access)
Size: 952.47 kB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/50202