Penetration Testing and Large Language Models: A Study on LLM-based PT with Local and Remote Models
MIELE, RICCARDO
2024/2025
Abstract
In recent years, Artificial Intelligence (AI) has rapidly evolved from being integrated only into smart assistant devices (e.g., Siri or Alexa) to becoming widespread everywhere (e.g., search engines and smartphones). This diffusion is due to the advent of Large Language Models (LLMs), which have also started to be studied in scientific fields such as Computer Engineering and Cybersecurity. This thesis explores the use of LLMs in Penetration Testing (PT), a crucial activity for evaluating the security of a computer system or network in order to improve it. In particular, it examines PentestGPT, an open-source tool built on LLMs and designed to help penetration testers during their work by suggesting next steps, explaining results, and keeping a history of what has been done. The LLMs are accessed through Application Programming Interface (API) calls, which allow both remote and local models to be run. Motivated by the goal of understanding the strengths and limitations of LLMs in PT, two studies were performed. The first evaluated the feasibility of using small local models as a replacement for remote ones, considering that they are much smaller (~10B vs. 1.5T parameters). The second analyzed the performance of remote models against Virtual Machines (VMs) from Hack The Box (HTB), a learning platform for penetration testers. The results of the first experiment showed that small local models can be integrated into PentestGPT, but they introduce several limitations (e.g., lack of coherence and of safety alignment) that require specific improvements. In contrast, the second analysis demonstrated that, while remote LLMs can provide meaningful guidance, they are not fully effective, since human intervention is still necessary to achieve successful outcomes.
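As a minimal sketch of the remote/local API setup described above: many local model servers (e.g., Ollama) expose an OpenAI-compatible endpoint, so the same client can target either a hosted model or a locally run one by changing the base URL. The endpoint, model names, and prompt below are illustrative assumptions, not PentestGPT's actual configuration.

```python
# Sketch: one OpenAI-compatible client, two backends (remote vs. local).
# Model names, prompt, and the local endpoint are illustrative assumptions.
from openai import OpenAI

# Remote model: official hosted endpoint (requires an API key).
remote = OpenAI(api_key="sk-...")  # key elided

# Local model: Ollama serves an OpenAI-compatible API on localhost by default.
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

for client, model in [(remote, "gpt-4o"), (local, "llama3:8b")]:
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": "An nmap scan shows port 80 open. "
                              "Suggest a next penetration-testing step."}],
    )
    print(f"--- {model} ---")
    print(reply.choices[0].message.content)
```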
| File | Size | Format |
|---|---|---|
| Miele_Riccardo.pdf (restricted access) | 1.3 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/91817