Building the Next-Gen Search Engine with Large Language Models & Retrieval Augmented Generation

BEZZINA, MALEK
2023/2024

Abstract

The purpose of this thesis is to develop a system that extracts information from a collection of thesis papers in response to text queries, with a strong emphasis on operating locally and preserving the confidentiality of the documents processed. The project employs a large language model, to be fine-tuned on a dataset of thesis papers stored locally. The goal is to enable the model to retrieve relevant information and provide extracted answers, citing the sources (i.e., the specific thesis papers) in which the information is found, without transmitting any data externally. The system is designed to process complex academic texts and respond accurately to user queries by locating and extracting pertinent passages from these documents, while running entirely on local infrastructure. Local deployment keeps all processing within a secure environment, maintaining the integrity and confidentiality of the thesis papers, which are sensitive in nature.
Keywords: LLM, RAG, Data processing
Files in this item:
Thesis Malek Bezzina.pdf (restricted access; size 988.4 kB; format Adobe PDF)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/72841