Reengineering the Galois System: Toward Reproducible and Generalizable LLM‑Based Structured Retrieval

CHEMELLO, FRANCESCO

2025/2026

Abstract

This thesis presents GaloisPy, a Python re-implementation of Galois, a Java application developed by researchers from the University of Basilicata in collaboration with the French research institute EURECOM for retrieving structured data from large language models (LLMs) using SQL syntax. GaloisPy introduces several improvements and optimizations aimed at better performance: chat history management, dynamic Pydantic models for structured outputs, and prompt rewriting from an SQL-like language to natural language, the last of which is crucial for the system's accuracy. We report experiments comparing GaloisPy with the original Galois system, using OpenAI's GPT-4.1 nano model. The experiments cover different scenarios, including one in which data is generated solely from the LLM's internal knowledge and another that employs Retrieval-Augmented Generation (RAG). On the metrics used, GaloisPy performs significantly better than the original system. It also has limitations: the Key-Scan strategy is less effective than Table-Scan, and complex queries, especially those involving JOIN clauses, remain difficult to handle. Finally, in the RAG scenario, using a reranking model, coupled with the small size of the LLM used, was observed to reduce overall performance in some cases.

Record

Faculty/Department: Dipartimento di Ingegneria dell'Informazione - DEI
Degree programme: COMPUTER ENGINEERING, Master's degree (Laurea Magistrale, D.M. 270/2004)
Academic year: 2025
English title: Reengineering the Galois System: Toward Reproducible and Generalizable LLM‑Based Structured Retrieval
Keywords: Structured retrieval; LLM retrieval; Python programming
Supervisor: SILVELLO, GIANMARIA
Appears in document types: Master's theses

Files in this item:

File	Access	Size	Format
Chemello_Francesco.pdf	open access	6.54 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/106833