Building the Next-Gen Search Engine with Large Language Models & Retrieval Augmented Generation
BEZZINA, MALEK
2023/2024
Abstract
The purpose of this thesis is to develop a system that extracts information from a collection of thesis papers in response to text queries, with a strong emphasis on running in a local environment and preserving the confidentiality of the documents processed. The project employs a large language model, which is to be fine-tuned on a dataset of thesis papers stored locally. The goal is to enable the model to retrieve relevant information and return extracted answers that cite their sources (i.e., the specific thesis papers in which the information is found), without transmitting any data externally. The system is designed to process complex academic texts and to answer user queries accurately by locating and extracting the pertinent passages, all while running entirely on local infrastructure. Local deployment keeps all processing within a secure environment, maintaining the integrity and confidentiality of the thesis papers, which are of a sensitive nature.

File | Size | Format |
---|---|---|
Thesis Malek Bezzina.pdf (restricted access) | 988.4 kB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.
https://hdl.handle.net/20.500.12608/72841