LLM-Based Triaging of Vulnerabilities in Android Native Libraries
BUSATO, NICOLA
2024/2025
Abstract
Android applications frequently embed native C/C++ libraries accessed via the Java Native Interface (JNI), introducing memory-safety risks that can compromise the entire application despite Android's sandbox. Large-scale fuzzing systems such as POIROT generate thousands of crashes in these libraries, but their triage remains manual, slow, and difficult to scale. To improve the scalability of crash triage for Android native libraries, this thesis quantifies whether a Large Language Model (LLM) that retrieves code context through the Model Context Protocol (MCP) can reliably classify fuzzing-generated crashes and distinguish benign faults from real vulnerabilities. A structured, automated triage workflow is developed by combining an LLM with reverse-engineering tools exposed through MCP. POIROT, a state-of-the-art system that automatically synthesises fuzzing harnesses for Android native libraries, produces crash artefacts such as stack traces; these are enriched with contextual evidence retrieved from Jadx (Java/DEX decompilation) and Ghidra (native decompilation). The LLM analyses each crash following a system prompt that enforces structured reasoning and a strict JSON schema. Drawing on a filtered Java-to-native call graph and a map of crash-relevant native methods, it produces a grounded vulnerability assessment that includes severity, CWE mapping, evidence items, and exploitability indicators. Across 137 crashes from 80 real-world applications, the system achieves an overall accuracy of 66% with low false-negative rates (3–5%). When Java-side context is available through a filtered Java Call Graph (JCG), accuracy increases to 77% and precision more than doubles, confirming the importance of cross-layer information in reducing over-approximation.
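The abstract does not specify how the filtered Java-to-native call graph and the map of crash-relevant native methods are built. The following is a minimal sketch under stated assumptions: the crash stack trace contains JNI-mangled frame symbols such as `Java_com_example_Native_decode`, and the Java call graph (for example, one derived from Jadx output) is available as a caller-to-callees mapping of `package.Class.method` strings. The function names `demangle_jni`, `crash_relevant_methods` and `filter_call_graph` are hypothetical. Decoding follows the JNI name-mangling rules (`_1` for `_`, `_0xxxx` for a UTF-16 code unit, `__` before an overload signature). The filter then keeps only the call edges that lie on some path to a crash-relevant native method.

```python
import re
from collections import deque

# Assumed frame layout: the JNI symbol appears somewhere in the frame text,
# e.g. "(Java_com_example_Native_decode+120)". This is not POIROT's documented format.
JNI_SYMBOL = re.compile(r"\bJava_[A-Za-z0-9_]+")

# JNI escape sequences: "_1" -> "_", "_2" -> ";", "_3" -> "[".
ESCAPES = {"1": "_", "2": ";", "3": "["}


def demangle_jni(symbol: str) -> str:
    """Turn a JNI native symbol into 'package.Class.method', dropping any overload signature."""
    body = symbol[len("Java_"):]
    segments: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(body):
        ch = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if ch != "_":
            current.append(ch)
            i += 1
        elif nxt in ESCAPES:
            current.append(ESCAPES[nxt])
            i += 2
        elif nxt == "0":
            # "_0xxxx" encodes a UTF-16 code unit, e.g. "_00024" -> "$" for nested classes.
            current.append(chr(int(body[i + 2:i + 6], 16)))
            i += 6
        elif not current and segments:
            break  # "__" introduces the mangled argument signature of an overload
        else:
            segments.append("".join(current))  # a plain "_" separates name components
            current = []
            i += 1
    else:
        segments.append("".join(current))  # loop ended without a signature suffix
    return ".".join(segments)


def crash_relevant_methods(stack_trace: str) -> set[str]:
    """Map every JNI frame in a crash stack trace to its declaring Java method."""
    return {demangle_jni(sym) for sym in JNI_SYMBOL.findall(stack_trace)}


def filter_call_graph(edges: dict[str, set[str]], targets: set[str]) -> dict[str, set[str]]:
    """Keep only the call edges that lie on some path to a crash-relevant native method."""
    reverse: dict[str, set[str]] = {}
    for caller, callees in edges.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)
    keep = set(targets)
    queue = deque(targets)
    while queue:  # backwards breadth-first search starting from the native methods
        for caller in reverse.get(queue.popleft(), ()):
            if caller not in keep:
                keep.add(caller)
                queue.append(caller)
    return {c: ds & keep for c, ds in edges.items() if c in keep and ds & keep}


trace = "#00 pc 0001a2b4 /data/app/lib/arm64/libfoo.so (Java_com_example_Native_decode+120)"
graph = {
    "com.example.Main.onCreate": {"com.example.Parser.parse", "com.example.Ui.render"},
    "com.example.Parser.parse": {"com.example.Native.decode"},
    "com.example.Ui.render": {"com.example.Ui.draw"},
}
filtered = filter_call_graph(graph, crash_relevant_methods(trace))
```

In this toy graph, the UI rendering branch is pruned because it never reaches the crashing native method. That pruning is the kind of reduction that keeps the context handed to the LLM small and focused.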
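A strict JSON schema only helps if every model reply is checked before it is accepted. The sketch below is a minimal standard-library validator. It assumes a hypothetical schema whose keys (`verdict`, `severity`, `cwe`, `evidence`, `exploitability`) mirror the assessment fields named in the abstract; the thesis's actual schema, field names, and allowed values may differ.

```python
import json
import re

# Hypothetical schema: field names and allowed values are illustrative,
# not the thesis's actual JSON schema.
VERDICTS = {"vulnerability", "benign"}
SEVERITIES = {"none", "low", "medium", "high", "critical"}
EVIDENCE_SOURCES = {"stacktrace", "jadx", "ghidra"}
EXPLOIT_FLAGS = ("attacker_controlled_input", "reachable_from_java")
ALLOWED_KEYS = {"verdict", "severity", "cwe", "evidence", "exploitability"}
CWE_ID = re.compile(r"CWE-\d+")


def _one_of(value: object, allowed: set[str]) -> bool:
    # Check the type first: unhashable values (lists, dicts) cannot be tested with "in".
    return isinstance(value, str) and value in allowed


def validate_assessment(raw: str) -> list[str]:
    """Return the schema violations in an LLM reply; an empty list means it is accepted."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level value must be an object"]

    errors = []
    if not _one_of(doc.get("verdict"), VERDICTS):
        errors.append("verdict must be one of: " + ", ".join(sorted(VERDICTS)))
    if not _one_of(doc.get("severity"), SEVERITIES):
        errors.append("severity must be one of: " + ", ".join(sorted(SEVERITIES)))
    cwe = doc.get("cwe")
    if cwe is not None and not (isinstance(cwe, str) and CWE_ID.fullmatch(cwe)):
        errors.append("cwe must be null or look like 'CWE-787'")

    evidence = doc.get("evidence")
    if not isinstance(evidence, list) or not evidence:
        errors.append("evidence must be a non-empty list")
    else:
        for k, item in enumerate(evidence):
            if not isinstance(item, dict) or not _one_of(item.get("source"), EVIDENCE_SOURCES):
                errors.append(f"evidence[{k}].source must be one of: " + ", ".join(sorted(EVIDENCE_SOURCES)))
            elif not isinstance(item.get("detail"), str) or not item["detail"].strip():
                errors.append(f"evidence[{k}].detail must be a non-empty string")

    exploit = doc.get("exploitability")
    if not isinstance(exploit, dict) or not all(isinstance(exploit.get(f), bool) for f in EXPLOIT_FLAGS):
        errors.append("exploitability must map " + ", ".join(EXPLOIT_FLAGS) + " to booleans")

    unexpected = set(doc) - ALLOWED_KEYS
    if unexpected:
        errors.append("unexpected keys: " + ", ".join(sorted(unexpected)))
    return errors
```

With this schema, a verdict that cites no `evidence` item from the stack trace or a decompiler is refused rather than scored, which is one way to keep the assessment grounded. A non-empty error list could also be sent back to the model as a correction prompt. The abstract does not say whether the workflow retries malformed replies.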
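The reported accuracy, precision and false-negative figures come from a binary confusion matrix in which "positive" means the crash is a real vulnerability. The abstract does not say whether its false-negative rate is normalised by the number of real vulnerabilities or by all crashes. The sketch below assumes the conventional FN / (TP + FN). The class name `Confusion` is hypothetical, and the counts in the test are made up rather than taken from the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts for binary crash triage; 'positive' means the crash is a real vulnerability."""

    tp: int  # real vulnerabilities flagged as vulnerabilities
    fp: int  # benign crashes flagged as vulnerabilities (over-approximation)
    tn: int  # benign crashes correctly dismissed
    fn: int  # real vulnerabilities wrongly dismissed as benign

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def accuracy(self) -> float:
        return (self.tp + self.tn) / self.total

    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def false_negative_rate(self) -> float:
        positives = self.tp + self.fn
        return self.fn / positives if positives else 0.0
```

Every benign crash wrongly flagged as vulnerable increases `fp`, so precision is the metric that most directly reflects over-approximation. This fits the abstract's attribution of the precision gain from Java-side context to reduced over-approximation.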
A detailed case study on TP-LINK’s tpCamera reproduces and correctly characterises the real vulnerability later assigned CVE-2023-30273, demonstrating that the system can recover expert-level reasoning patterns using structured evidence. The findings show that LLM-based crash triage, when grounded through MCP-mediated retrieval of Java and native context, provides a practical and scalable first-line vulnerability assessment mechanism for Android native libraries. While not a substitute for manual auditing, the workflow improves consistency, reduces analyst effort, and offers structured, evidence-driven starting points for further security investigation.

| File | Size | Format | Access |
|---|---|---|---|
| Busato_Nicola.pdf | 1.26 MB | Adobe PDF | Open access |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/101990