Evaluation of LLM-Based Tools for the Automated Fixing of Security Vulnerabilities in Android Applications
AHMED, SAAD
2023/2024
Abstract
The ever-growing landscape of Android applications necessitates robust security mechanisms to mitigate potential vulnerabilities. This thesis presents a comprehensive evaluation of three Large Language Model (LLM)-based tools, ChatGPT, Google Bard, and Android Studio Bot, for automated Android security vulnerability repair. To ground the evaluation in real-world flaws, a dataset of 80 vulnerable code snippets sourced from the Google Android Security Bulletins is used. The fix generated by each LLM-based tool for each snippet is recorded, and two complementary evaluation techniques are applied: first, BLEU scores, which quantify how closely each generated repair matches the reference fix; and second, manual human evaluation, which provides a more nuanced assessment of syntactic and semantic correctness. Comparing the official fixes with those generated by the LLM-based tools highlights the tools' efficacy in addressing security vulnerabilities. The results provide insight into the strengths and limitations of each model, revealing cases where the models produce effective repairs and identifying areas for improvement; combining an automated metric with human assessment adds depth to the analysis and strengthens the reliability of the findings. In conclusion, this thesis contributes to the understanding of the capabilities of LLM-based tools in automating Android security vulnerability repair, and the resulting suggestions and results offer practical guidance for developers and researchers seeking to leverage these tools effectively, ultimately advancing the state of automated security practice in Android application development.
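As an illustration of the automated part of this methodology, the following is a minimal sketch of how a BLEU score between a reference fix and an LLM-generated fix might be computed, assuming Python with the NLTK library; the two code fragments being compared are hypothetical placeholders, not snippets from the thesis dataset.

```python
# Minimal sketch of the BLEU-based comparison between a ground-truth fix
# and an LLM-generated fix, assuming Python 3 with NLTK installed
# (pip install nltk). The two code fragments below are hypothetical
# placeholders, not drawn from the Android Security Bulletins dataset.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Reference fix (e.g. the patch published in a security bulletin),
# tokenized naively on whitespace.
reference_fix = "if ( uri == null ) { return ; } process ( uri ) ;".split()

# Candidate fix produced by one of the LLM-based tools for the same snippet.
generated_fix = "if ( uri == null ) return ; process ( uri ) ;".split()

# Smoothing prevents the score from collapsing to zero when some
# higher-order n-gram has no match, which is common for short code snippets.
smoothing = SmoothingFunction().method1

score = sentence_bleu(
    [reference_fix],                   # a list of one or more references
    generated_fix,
    weights=(0.25, 0.25, 0.25, 0.25),  # uniform 1- to 4-gram weights
    smoothing_function=smoothing,
)
print(f"BLEU: {score:.3f}")  # closer to 1.0 means closer to the reference
```

A high n-gram overlap does not by itself guarantee that a repair is semantically correct, which is why the thesis pairs this metric with manual human evaluation.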
| File | Access | Size | Format |
|---|---|---|---|
| Ahmed_Saad.pdf | Open access | 751.84 kB | Adobe PDF |
https://hdl.handle.net/20.500.12608/64049