Assessing LLMs as Network Administrators: Agent-Orchestrated Pipelines vs. Direct Querying
Davide Saladino
Academic year 2024/2025
Abstract
Modern networks are becoming increasingly complex, creating opportunities for Large Language Models (LLMs) to assist network administrators with routine tasks and troubleshooting. However, established methods for evaluating how well these models actually perform in real network environments are still lacking. Without standardized evaluation frameworks, it remains unclear how effectively different LLMs can handle network administration tasks and which interaction strategies yield the best results. This thesis addresses this gap by developing a comprehensive evaluation framework specifically designed to assess LLMs in network administration contexts. The framework features automated ground-truth generation, comparative analysis across diverse network environments, and systematic evaluation of both direct prompting and agent-based approaches using commercial and local LLMs. Through standardized network management scenarios, this work establishes performance baselines across different model types and interaction strategies, while identifying key challenges in applying LLMs to network administration tasks. The research contributes a reproducible evaluation methodology that provides foundational benchmarks for future AI-driven network management research. Our evaluation of eight commercial and local LLMs across standardized network scenarios reveals that GPT models achieve over 90% accuracy on network administration tasks, significantly outperforming local models such as Qwen and Mistral, which averaged below 50% accuracy. The results demonstrate that commercial models with agent-based approaches provide the most reliable performance for complex network troubleshooting, though at the cost of increased processing time.
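To make the contrast between the two interaction strategies concrete, the sketch below shows how a direct-prompting run and an agent-orchestrated run might be driven against the same standardized scenario and scored against generated ground truth. This is a minimal, hypothetical illustration, not the framework implemented in the thesis; all names (Scenario, direct_prompting, agent_based, query_llm, run_tool) are placeholders introduced here, and exact-match scoring is assumed only for simplicity.

```python
# Minimal sketch of the two evaluation modes contrasted in the abstract.
# Hypothetical names throughout; not the thesis's actual code or API.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    """One standardized network-management task with an auto-generated ground truth."""
    prompt: str          # e.g. "Which interface on R1 is dropping packets?"
    network_state: str   # textual snapshot of the emulated network
    ground_truth: str    # expected answer produced by the ground-truth generator


def direct_prompting(scenario: Scenario, query_llm: Callable[[str], str]) -> str:
    # Single-shot strategy: the whole network state is placed in one prompt.
    prompt = f"{scenario.network_state}\n\nQuestion: {scenario.prompt}\nAnswer:"
    return query_llm(prompt)


def agent_based(scenario: Scenario, query_llm: Callable[[str], str],
                run_tool: Callable[[str], str], max_steps: int = 5) -> str:
    # Agent-orchestrated strategy: the model iteratively requests tool output
    # (e.g. show/diagnostic commands) before committing to a final answer.
    transcript = f"Task: {scenario.prompt}\n"
    for _ in range(max_steps):
        reply = query_llm(transcript + "\nReply with TOOL:<command> or ANSWER:<text>")
        if reply.startswith("ANSWER:"):
            return reply.removeprefix("ANSWER:").strip()
        command = reply.removeprefix("TOOL:").strip()
        transcript += f"\n{reply}\nOutput: {run_tool(command)}"
    return query_llm(transcript + "\nGive your final answer now:")


def accuracy(answers: list[str], scenarios: list[Scenario]) -> float:
    # Exact-match scoring against ground truth; a real framework may use a
    # more tolerant comparison (normalization, semantic matching, etc.).
    hits = sum(a.strip().lower() == s.ground_truth.strip().lower()
               for a, s in zip(answers, scenarios))
    return hits / len(scenarios)
```

Under this framing, the same scenario set and the same accuracy metric are applied to both strategies, so any performance gap reflects the interaction strategy and the underlying model rather than differences in the test material.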
https://hdl.handle.net/20.500.12608/92221