Building the Next-Gen Search Engine with Large Language Models & Retrieval Augmented Generation

This thesis develops a system that extracts information from a collection of thesis papers in response to text queries, with a strong emphasis on running locally and preserving the confidentiality of the processed documents. The project employs a large language model that is to be fine-tuned on a locally stored dataset of thesis papers. The goal is for the model to retrieve relevant information and return extracted answers that cite their sources (i.e., the specific thesis papers in which the information is found), without transmitting any data externally. The system is designed to process complex academic texts and to answer user queries accurately by locating and extracting the pertinent passages, all while running entirely on local infrastructure. Local deployment keeps all processing within a secure environment and thereby maintains the integrity and confidentiality of the thesis papers, which are sensitive in nature.
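The retrieve-and-cite step described above can be illustrated with a minimal sketch. This is not the thesis's implementation: it swaps in plain TF-IDF with cosine similarity for whatever retriever and language model the thesis uses, relies only on the Python standard library so that nothing leaves the local machine, and all file names, passages, and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(passages):
    """Build a TF-IDF vector for every (source, text) passage."""
    token_lists = [tokenize(text) for _, text in passages]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    n = len(passages)
    # Smoothed inverse document frequency, always positive.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({t: c * idf[t] for t, c in counts.items()})
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, passages, idf, vectors, top_k=2):
    """Return up to top_k (score, source, text) tuples, best match first."""
    counts = Counter(tokenize(query))
    # Query terms that never occur in the collection are ignored.
    q_vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    scored = [
        (cosine(q_vec, vec), source, text)
        for vec, (source, text) in zip(vectors, passages)
    ]
    scored = [item for item in scored if item[0] > 0.0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical passages standing in for chunks of locally stored theses.
PASSAGES = [
    ("thesis_a.pdf", "Retrieval augmented generation combines a retriever with a language model."),
    ("thesis_b.pdf", "Convolutional networks classify images of handwritten digits."),
    ("thesis_c.pdf", "The retriever ranks text passages by similarity to the query."),
]

IDF, VECTORS = build_index(PASSAGES)

for score, source, text in retrieve("How does the retriever rank passages?", PASSAGES, IDF, VECTORS):
    # Each retrieved passage is reported together with the file it came from.
    print(f"[{source}] {text}")
```

In a full retrieval-augmented pipeline, the retrieved passages and their source labels would be passed to a locally hosted language model as context, so that the generated answer can cite the originating thesis.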

BEZZINA, MALEK

2023/2024

	Faculty/Department
	
				Dipartimento di Ingegneria dell'Informazione - DEI
			
	Degree programme
	
				ICT FOR INTERNET AND MULTIMEDIA - INGEGNERIA PER LE COMUNICAZIONI MULTIMEDIALI E INTERNET Laurea Magistrale (D.M. 270/2004)
			
	Academic year
	
				2023
			
	English title
	
				Building the Next-Gen Search Engine with Large Language Models & Retrieval Augmented Generation
			
	Keywords
	
				LLM
RAG
Data processing
			
	Supervisor
	
				SUSTO, GIAN ANTONIO
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Access	Size	Format
Thesis Malek Bezzina.pdf	Restricted access	988.4 kB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/72841