Development, Optimization, and Comparative Evaluation of Open-Source Language Models for Proprietary SQL-Based Ticketing Systems

The increasing complexity of ticket management within enterprise support systems necessitates innovative solutions that enhance operational efficiency and enable intuitive data interaction. This thesis presents the design, development, and evaluation of a natural language-to-SQL system for querying proprietary ticketing databases using open-source Large Language Models (LLMs). The system empowers non-technical users to retrieve actionable insights from structured data by converting natural language queries into executable SQL commands. The solution integrates open-source models, including Llama3.2:1b, Llama2 (7b, 13b), code-llama (7b, 13b), mistral:7b and mistral-nemo:12b. It is enhanced by prompt engineering and schema-aware query generation. The underlying database reflects Xidera’s proprietary ticketing system, containing over 12,000 tickets, 3,400 shipment records, and 350 hardware returns. The system is engineered to work with Italian and English language to deliver accurate and contextually appropriate responses while operating under hardware constraints, addressed through efficient data handling and model selection. A key contribution of this work is a structured evaluation framework that assesses each model’s performance across five dimensions: SQL condition match, output value accuracy, column match accuracy, table match accuracy and row count validation. A benchmark set of hand-crafted questions in dual language English and Italian, based on the company database, was designed to evaluate a broad spectrum of query types, including filters, joins, aggregations, and time-based conditions. Through detailed analysis, common failure patterns such as incorrect JOIN logic, improper timestamp handling, and function misuse were identified and mitigated via prompt refinement and example-driven learning. Comparative evaluation across LLM models revealed that prompt design, schema integration, and system constraints significantly influence the quality of SQL generation. Overall, the results demonstrate that with proper optimization, open-source LLMs can serve as scalable, privacy-preserving alternatives to commercial solutions for enterprise-level ticket management systems.

Development, Optimization, and Comparative Evaluation of Open-Source Language Models for Proprietary SQL-Based Ticketing Systems

POTHUGANTI, SHIREESHA

2024/2025

Abstract

The increasing complexity of ticket management within enterprise support systems necessitates innovative solutions that enhance operational efficiency and enable intuitive data interaction. This thesis presents the design, development, and evaluation of a natural language-to-SQL system for querying proprietary ticketing databases using open-source Large Language Models (LLMs). The system empowers non-technical users to retrieve actionable insights from structured data by converting natural language queries into executable SQL commands. The solution integrates open-source models, including Llama3.2:1b, Llama2 (7b, 13b), code-llama (7b, 13b), mistral:7b and mistral-nemo:12b. It is enhanced by prompt engineering and schema-aware query generation. The underlying database reflects Xidera’s proprietary ticketing system, containing over 12,000 tickets, 3,400 shipment records, and 350 hardware returns. The system is engineered to work with Italian and English language to deliver accurate and contextually appropriate responses while operating under hardware constraints, addressed through efficient data handling and model selection. A key contribution of this work is a structured evaluation framework that assesses each model’s performance across five dimensions: SQL condition match, output value accuracy, column match accuracy, table match accuracy and row count validation. A benchmark set of hand-crafted questions in dual language English and Italian, based on the company database, was designed to evaluate a broad spectrum of query types, including filters, joins, aggregations, and time-based conditions. Through detailed analysis, common failure patterns such as incorrect JOIN logic, improper timestamp handling, and function misuse were identified and mitigated via prompt refinement and example-driven learning. Comparative evaluation across LLM models revealed that prompt design, schema integration, and system constraints significantly influence the quality of SQL generation. Overall, the results demonstrate that with proper optimization, open-source LLMs can serve as scalable, privacy-preserving alternatives to commercial solutions for enterprise-level ticket management systems.

Scheda

Scheda DC

	Facoltà/Dipartimento
	
				Dipartimento di Matematica "Tullio Levi-Civita" - DM
			
	Corso di studio
	
				DATA SCIENCE Laurea Magistrale (D.M. 270/2004)
			
	Anno Accademico
	
				2024
			
	Titolo inglese
	
				Development, Optimization, and Comparative Evaluation of Open-Source Language Models for Proprietary SQL-Based Ticketing Systems
			
	Abstract in italiano
	
				The increasing complexity of ticket management within enterprise support systems necessitates innovative solutions that enhance operational efficiency and enable intuitive data interaction. This thesis presents the design, development, and evaluation of a natural language-to-SQL system for querying proprietary ticketing databases using open-source Large Language Models (LLMs). The system empowers non-technical users to retrieve actionable insights from structured data by converting natural language queries into executable SQL commands.
The solution integrates open-source models, including Llama3.2:1b, Llama2 (7b, 13b), code-llama (7b, 13b), mistral:7b and mistral-nemo:12b. It is enhanced by prompt engineering and schema-aware query generation. The underlying database reflects Xidera’s proprietary ticketing system, containing over 12,000 tickets, 3,400 shipment records, and 350 hardware returns. The system is engineered to work with Italian and English language to deliver accurate and contextually appropriate responses while operating under hardware constraints, addressed through efficient data handling and model selection. 
A key contribution of this work is a structured evaluation framework that assesses each model’s performance across five dimensions: SQL condition match, output value accuracy, column match accuracy, table match accuracy and row count validation. A benchmark set of hand-crafted questions in dual language English and Italian, based on the company database, was designed to evaluate a broad spectrum of query types, including filters, joins, aggregations, and time-based conditions. 
Through detailed analysis, common failure patterns such as incorrect JOIN logic, improper timestamp handling, and function misuse were identified and mitigated via prompt refinement and example-driven learning. Comparative evaluation across LLM models revealed that prompt design, schema integration, and system constraints significantly influence the quality of SQL generation.
Overall, the results demonstrate that with proper optimization, open-source LLMs can serve as scalable, privacy-preserving alternatives to commercial solutions for enterprise-level ticket management systems.
			
	Parola chiave
	
				Ticket Management
SQL Generation
Open Source LLM
Prompt Engineering
			
	Relatore
	
				RIGONI, DAVIDE
			
	Appare nelle tipologie:
	
				Lauree magistrali

File in questo prodotto:

File	Dimensione	Formato
Pothuganti_Shireesha.pdf Accesso riservato Dimensione 2.43 MB Formato Adobe PDF	2.43 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full Text are published under a non-exclusive license. Metadata are under a CC0 License

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.12608/89835