This thesis investigates the vulnerabilities of Large Language Models (LLMs) when integrated with Retrieval-Augmented Generation (RAG) systems. We particularly focus on the risks posed by malicious data poisoning. RAG systems are increasingly used to enhance LLM performance by incorporating external knowledge bases; however, there is a growing concern about their susceptibility to adversarial attacks. In this research, we specifically explore how poisoning vector databases, an essential component of RAG systems, can compromise LLM-generated responses, leading to inaccuracies. In this study, we employed a methodology that involves injecting misleading data into vector databases and assessing the performance of several baseline and fine-tuned LLMs, including Llama-2-7B-chat-hf, Mistral-7B-Instruct-v0.2, and Llama-13B-chat-hf to explore scalability on a larger version. We fine-tuned these baseline models first on a general dataset and then on a more specialized dataset focused on physics, as the task involves answering questions mostly related to current topics in recent physics papers. The results show significant insights. While the fine-tuned versions of Llama-2-7B-chat-hf and Mistral-7B-Instruct-v0.2 performed better when exposed to benign databases compared to their baseline counterparts, they did not demonstrate increased resilience against misleading information. For instance, the fine-tuned Llama-2-7B-chat-hf demonstrated the best overall performance; however, the fine-tuned Mistral 7B-Instruct-v0.2 model showed the worst performance when exposed to poisoned data, with a 9% drop in accuracy compared to its performance on benign data. Additionally, even larger models like Llama-2-13B-chat-hf struggled with the same issue, showing a similar decrease in accuracy as the smaller models. These findings suggest that fine-tuning, especially with datasets that do not thoroughly address the model’s knowledge gaps, does not uniformly enhance resilience or robustness against misinformation. Moreover, it should be noted that LLMs’ tendency to hallucinate and generate incorrect information further complicates their ability to provide accurate responses. This research highlights the need for more advanced strategies to detect and mitigate the impact of misinformation in RAG-based systems, as well as to account for the potential of models to hallucinate, which can lead to the generation of even more inaccurate responses. Overall, in this thesis, we contribute to the security of AI by identifying critical vulnerabilities in RAG-integrated LLMs and underlining the mandatory need for strong strategies to ensure reliability and safety for AI applications, particularly in sensitive environments where the precision and integrity of information are crucial.

This thesis investigates the vulnerabilities of Large Language Models (LLMs) when integrated with Retrieval-Augmented Generation (RAG) systems. We particularly focus on the risks posed by malicious data poisoning. RAG systems are increasingly used to enhance LLM performance by incorporating external knowledge bases; however, there is a growing concern about their susceptibility to adversarial attacks. In this research, we specifically explore how poisoning vector databases, an essential component of RAG systems, can compromise LLM-generated responses, leading to inaccuracies. In this study, we employed a methodology that involves injecting misleading data into vector databases and assessing the performance of several baseline and fine-tuned LLMs, including Llama-2-7B-chat-hf, Mistral-7B-Instruct-v0.2, and Llama-13B-chat-hf to explore scalability on a larger version. We fine-tuned these baseline models first on a general dataset and then on a more specialized dataset focused on physics, as the task involves answering questions mostly related to current topics in recent physics papers. The results show significant insights. While the fine-tuned versions of Llama-2-7B-chat-hf and Mistral-7B-Instruct-v0.2 performed better when exposed to benign databases compared to their baseline counterparts, they did not demonstrate increased resilience against misleading information. For instance, the fine-tuned Llama-2-7B-chat-hf demonstrated the best overall performance; however, the fine-tuned Mistral 7B-Instruct-v0.2 model showed the worst performance when exposed to poisoned data, with a 9% drop in accuracy compared to its performance on benign data. Additionally, even larger models like Llama-2-13B-chat-hf struggled with the same issue, showing a similar decrease in accuracy as the smaller models. These findings suggest that fine-tuning, especially with datasets that do not thoroughly address the model’s knowledge gaps, does not uniformly enhance resilience or robustness against misinformation. Moreover, it should be noted that LLMs’ tendency to hallucinate and generate incorrect information further complicates their ability to provide accurate responses. This research highlights the need for more advanced strategies to detect and mitigate the impact of misinformation in RAG-based systems, as well as to account for the potential of models to hallucinate, which can lead to the generation of even more inaccurate responses. Overall, in this thesis, we contribute to the security of AI by identifying critical vulnerabilities in RAG-integrated LLMs and underlining the mandatory need for strong strategies to ensure reliability and safety for AI applications, particularly in sensitive environments where the precision and integrity of information are crucial.

Security in Machine Learning: Exposing LLM Vulnerabilities through Poisoned Vector Databases in RAG-based System

DARYABAR, NIMA
2023/2024

Abstract

This thesis investigates the vulnerabilities of Large Language Models (LLMs) when integrated with Retrieval-Augmented Generation (RAG) systems. We particularly focus on the risks posed by malicious data poisoning. RAG systems are increasingly used to enhance LLM performance by incorporating external knowledge bases; however, there is a growing concern about their susceptibility to adversarial attacks. In this research, we specifically explore how poisoning vector databases, an essential component of RAG systems, can compromise LLM-generated responses, leading to inaccuracies. In this study, we employed a methodology that involves injecting misleading data into vector databases and assessing the performance of several baseline and fine-tuned LLMs, including Llama-2-7B-chat-hf, Mistral-7B-Instruct-v0.2, and Llama-13B-chat-hf to explore scalability on a larger version. We fine-tuned these baseline models first on a general dataset and then on a more specialized dataset focused on physics, as the task involves answering questions mostly related to current topics in recent physics papers. The results show significant insights. While the fine-tuned versions of Llama-2-7B-chat-hf and Mistral-7B-Instruct-v0.2 performed better when exposed to benign databases compared to their baseline counterparts, they did not demonstrate increased resilience against misleading information. For instance, the fine-tuned Llama-2-7B-chat-hf demonstrated the best overall performance; however, the fine-tuned Mistral 7B-Instruct-v0.2 model showed the worst performance when exposed to poisoned data, with a 9% drop in accuracy compared to its performance on benign data. Additionally, even larger models like Llama-2-13B-chat-hf struggled with the same issue, showing a similar decrease in accuracy as the smaller models. These findings suggest that fine-tuning, especially with datasets that do not thoroughly address the model’s knowledge gaps, does not uniformly enhance resilience or robustness against misinformation. Moreover, it should be noted that LLMs’ tendency to hallucinate and generate incorrect information further complicates their ability to provide accurate responses. This research highlights the need for more advanced strategies to detect and mitigate the impact of misinformation in RAG-based systems, as well as to account for the potential of models to hallucinate, which can lead to the generation of even more inaccurate responses. Overall, in this thesis, we contribute to the security of AI by identifying critical vulnerabilities in RAG-integrated LLMs and underlining the mandatory need for strong strategies to ensure reliability and safety for AI applications, particularly in sensitive environments where the precision and integrity of information are crucial.
2023
Security in Machine Learning: Exposing LLM Vulnerabilities through Poisoned Vector Databases in RAG-based System
This thesis investigates the vulnerabilities of Large Language Models (LLMs) when integrated with Retrieval-Augmented Generation (RAG) systems. We particularly focus on the risks posed by malicious data poisoning. RAG systems are increasingly used to enhance LLM performance by incorporating external knowledge bases; however, there is a growing concern about their susceptibility to adversarial attacks. In this research, we specifically explore how poisoning vector databases, an essential component of RAG systems, can compromise LLM-generated responses, leading to inaccuracies. In this study, we employed a methodology that involves injecting misleading data into vector databases and assessing the performance of several baseline and fine-tuned LLMs, including Llama-2-7B-chat-hf, Mistral-7B-Instruct-v0.2, and Llama-13B-chat-hf to explore scalability on a larger version. We fine-tuned these baseline models first on a general dataset and then on a more specialized dataset focused on physics, as the task involves answering questions mostly related to current topics in recent physics papers. The results show significant insights. While the fine-tuned versions of Llama-2-7B-chat-hf and Mistral-7B-Instruct-v0.2 performed better when exposed to benign databases compared to their baseline counterparts, they did not demonstrate increased resilience against misleading information. For instance, the fine-tuned Llama-2-7B-chat-hf demonstrated the best overall performance; however, the fine-tuned Mistral 7B-Instruct-v0.2 model showed the worst performance when exposed to poisoned data, with a 9% drop in accuracy compared to its performance on benign data. Additionally, even larger models like Llama-2-13B-chat-hf struggled with the same issue, showing a similar decrease in accuracy as the smaller models. These findings suggest that fine-tuning, especially with datasets that do not thoroughly address the model’s knowledge gaps, does not uniformly enhance resilience or robustness against misinformation. Moreover, it should be noted that LLMs’ tendency to hallucinate and generate incorrect information further complicates their ability to provide accurate responses. This research highlights the need for more advanced strategies to detect and mitigate the impact of misinformation in RAG-based systems, as well as to account for the potential of models to hallucinate, which can lead to the generation of even more inaccurate responses. Overall, in this thesis, we contribute to the security of AI by identifying critical vulnerabilities in RAG-integrated LLMs and underlining the mandatory need for strong strategies to ensure reliability and safety for AI applications, particularly in sensitive environments where the precision and integrity of information are crucial.
LLMs
RAG Systems
Security in ML
Adversarial Attacks
Vector Databases
File in questo prodotto:
File Dimensione Formato  
Daryabar_Nima.pdf

accesso aperto

Dimensione 2.61 MB
Formato Adobe PDF
2.61 MB Adobe PDF Visualizza/Apri

The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12608/71090