
Penetration Testing and Large Language Models: A Study on LLM-based PT with Local and Remote Models

MIELE, RICCARDO
2024/2025

Abstract

In recent years, Artificial Intelligence (AI) has rapidly evolved from being integrated only into smart assistant devices (e.g., Siri or Alexa) to being embedded almost everywhere (e.g., in search engines and smartphones). This diffusion is due to the advent of Large Language Models (LLMs), which have also begun to be studied in scientific fields such as Computer Engineering and Cybersecurity. This thesis explores the application of LLMs to Penetration Testing (PT), a crucial activity for evaluating and improving the security of a computer system or network. In particular, it examines PentestGPT, an open-source tool built around LLMs and designed to assist penetration testers in their work by offering suggestions, explaining results, and keeping a history of the actions performed. The LLMs are accessed through Application Programming Interface (API) calls, which make it possible to run both remote and local models. Motivated by the goal of understanding the strengths and limitations of LLMs in PT, two studies were performed. The first evaluated the feasibility of using small local models as a replacement for remote ones, given their much smaller size (~10B vs. 1.5T parameters). The second analyzed the performance of remote models against Virtual Machines (VMs) from Hack The Box (HTB), a learning platform for penetration testers. The first experiment showed that small local models can be integrated into PentestGPT, but they introduce several limitations (e.g., lack of coherence and safety alignment) that require specific improvements. In contrast, the second analysis, with remote models, demonstrated that, while LLMs can provide meaningful guidance, they are not fully effective, since human intervention is still necessary to achieve successful outcomes.
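
Since the abstract notes that the tool reaches its models through API calls, the following minimal sketch illustrates how a single client interface could target either a remote hosted model or a small local one. It is an illustrative assumption, not code from the thesis or from PentestGPT: the local endpoint, model names, and helper function below are hypothetical, chosen only to show that swapping backends reduces to changing a base URL and a model identifier.

# Minimal sketch (illustrative, not PentestGPT's actual code): the same
# OpenAI-compatible client can address a remote hosted model or a small
# local one served by an OpenAI-compatible server such as Ollama.
from openai import OpenAI

# Remote backend: a hosted model behind the official API (key elided).
remote = OpenAI(api_key="sk-...")

# Local backend: base URL and placeholder key for a local server
# (the URL below is Ollama's default endpoint; an assumption here).
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

def suggest_next_step(client: OpenAI, model: str, tool_output: str) -> str:
    """Ask the model for the next step of an authorized penetration test."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You assist an authorized penetration test."},
            {"role": "user",
             "content": f"Tool output:\n{tool_output}\nSuggest the next step."},
        ],
    )
    return response.choices[0].message.content

# Swapping between remote and local models is a one-argument change, e.g.:
#   suggest_next_step(remote, "gpt-4", nmap_output)
#   suggest_next_step(local, "llama3:8b", nmap_output)  # small local model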
Keywords: LLM, Penetration Testing, PentestGPT
Files in this item:
  File: Miele_Riccardo.pdf
  Format: Adobe PDF
  Size: 1.3 MB
  Access: Restricted
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license; metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/91817