LLM-Based Triaging of Vulnerabilities in Android Native Libraries

Android applications frequently embed native C/C++ libraries accessed via the Java Native Interface (JNI), introducing memory-safety risks that may compromise the entire application despite the sandbox. Large-scale fuzzing systems such as POIROT generate thousands of crashes for these libraries, but their triage remains manual, slow, and difficult to scale. To improve the scalability of crash triage for Android native libraries, this thesis quantifies whether a Large Language Model (LLM), equipped with the Model Context Protocol (MCP) for retrieving code context, can reliably classify fuzzing-generated crashes and distinguish benign faults from real vulnerabilities. A structured, automated triage workflow is developed by combining an LLM with reverse-engineering tools exposed through MCP. Crash artefacts generated by POIROT, a state-of-the-art system that automatically synthesises fuzzing harnesses for Android native libraries and produces crash outputs such as stack traces, are enriched with contextual evidence retrieved from Jadx (Java/DEX decompilation) and Ghidra (native decompilation). The LLM analyses each crash following a system prompt that enforces structured reasoning and a strict JSON schema, leveraging a filtered Java-to-native call graph and a map of crash-relevant native methods to produce a grounded vulnerability assessment including severity, CWE mapping, evidence items, and exploitability indicators. Across 137 crashes from 80 real-world applications, the system achieves an overall accuracy of 66%, with a low false-negative rate (3–5%). When Java-side context is available through a filtered Java Call Graph (JCG), accuracy increases to 77% and precision more than doubles, confirming the importance of cross-layer information in reducing over-approximation.
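As a reading aid for the figures quoted above, the following minimal sketch shows how accuracy, precision, and false-negative rate are conventionally derived from a confusion matrix. The `Confusion` class and the counts in the usage example are hypothetical and are not taken from the thesis; in particular, the false-negative rate is assumed here to be FN / (FN + TP), which may differ from the definition the thesis uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary 'vulnerable vs. benign' triage task."""
    tp: int  # real vulnerabilities flagged as vulnerabilities
    fp: int  # benign crashes flagged as vulnerabilities (over-approximation)
    tn: int  # benign crashes flagged as benign
    fn: int  # real vulnerabilities flagged as benign

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def accuracy(self) -> float:
        # Share of all crashes that were classified correctly.
        return (self.tp + self.tn) / self.total if self.total else 0.0

    def precision(self) -> float:
        # Share of "vulnerable" verdicts that were actually vulnerabilities.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def false_negative_rate(self) -> float:
        # ASSUMPTION: missed vulnerabilities over all real vulnerabilities.
        positives = self.tp + self.fn
        return self.fn / positives if positives else 0.0


if __name__ == "__main__":
    # Illustrative counts only; these are NOT results from the thesis.
    sample = Confusion(tp=8, fp=4, tn=7, fn=1)
    print(f"accuracy={sample.accuracy():.2f}")
    print(f"precision={sample.precision():.2f}")
    print(f"false_negative_rate={sample.false_negative_rate():.2f}")
```

Under these definitions, reducing false positives (for example by adding Java-side context) raises precision without necessarily changing the false-negative rate, which is consistent with the pattern the abstract reports.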
A detailed case study on TP-LINK’s tpCamera reproduces and correctly characterises the real vulnerability later assigned CVE-2023-30273, demonstrating that the system can recover expert-level reasoning patterns using structured evidence. The findings show that LLM-based crash triage, when grounded through MCP-mediated retrieval of Java and native context, provides a practical and scalable first-line vulnerability assessment mechanism for Android native libraries. While not a substitute for manual auditing, the workflow improves consistency, reduces analyst effort, and offers structured, evidence-driven starting points for further security investigation.
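The abstract states that each LLM verdict must conform to a strict JSON schema covering severity, CWE mapping, evidence items, and exploitability indicators. The sketch below illustrates one way such a reply could be checked before it is accepted, using only the Python standard library. The field names (`is_vulnerability`, `severity`, `cwe`, `evidence`, `exploitability`), the severity levels, and the function `parse_verdict` are assumptions made for illustration; the actual schema used in the thesis is not given in this record.

```python
import json
import re

# ASSUMPTION: hypothetical field names and severity levels, not the thesis schema.
EXPECTED_FIELDS = {"is_vulnerability", "severity", "cwe", "evidence", "exploitability"}
ALLOWED_SEVERITIES = {"none", "low", "medium", "high", "critical"}
CWE_PATTERN = re.compile(r"^CWE-\d+$")


def parse_verdict(raw: str) -> dict:
    """Parse a model reply and raise ValueError if it violates the assumed schema."""
    verdict = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(verdict, dict):
        raise ValueError("verdict must be a JSON object")
    if set(verdict) != EXPECTED_FIELDS:
        # Strict: no missing fields and no extra, unvalidated fields.
        raise ValueError(f"unexpected field set: {sorted(verdict)}")
    if not isinstance(verdict["is_vulnerability"], bool):
        raise ValueError("is_vulnerability must be a boolean")
    severity = verdict["severity"]
    if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
        raise ValueError("severity is not one of the allowed levels")
    cwe = verdict["cwe"]
    if cwe is not None and not (isinstance(cwe, str) and CWE_PATTERN.match(cwe)):
        raise ValueError("cwe must be null or look like 'CWE-<number>'")
    evidence = verdict["evidence"]
    if not isinstance(evidence, list) or not evidence:
        raise ValueError("evidence must be a non-empty list")
    if not all(isinstance(item, str) and item.strip() for item in evidence):
        raise ValueError("every evidence item must be a non-empty string")
    if not isinstance(verdict["exploitability"], str):
        raise ValueError("exploitability must be a string")
    return verdict


if __name__ == "__main__":
    # Placeholder values for illustration; not an actual verdict from the thesis.
    reply = json.dumps({
        "is_vulnerability": True,
        "severity": "high",
        "cwe": "CWE-787",
        "evidence": ["write past end of a stack buffer in the decompiled native method"],
        "exploitability": "attacker-controlled input reaches the native method",
    })
    print(parse_verdict(reply)["severity"])
```

Rejecting malformed replies at this point keeps downstream statistics from silently counting an unparseable answer as a benign verdict.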

BUSATO, NICOLA

2024/2025

Faculty/Department: Dipartimento di Matematica "Tullio Levi-Civita" - DM
Degree programme: Cybersecurity, Laurea Magistrale (master's degree, D.M. 270/2004)
Academic year: 2024
English title: LLM-Based Triaging of Vulnerabilities in Android Native Libraries
Keywords: Vulnerability Triage; LLM; Native Libraries; Android
Supervisor: LOSIOUK, ELEONORA
Appears in collections: Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
Busato_Nicola.pdf (open access)	1.26 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/101990