Efficient On-Premise AI: Benchmarking Quantized SLMs on CPU-only Infrastructure

GASTALDON, SIMONE
2025/2026

Abstract

The integration of Generative AI into Intelligent Process Automation is currently hindered by two structural barriers: the economic sustainability of cloud-based solutions and the compliance risks of transferring sensitive data to third-party providers. This thesis investigates the technical and economic feasibility of a paradigm shift towards on-premise inference, using Small Language Models (SLMs) executed exclusively on standard CPU infrastructure, without dedicated GPU accelerators. Through a real-world case study at Data4Prime S.r.l., the work analyzes the performance of quantized models in two application scenarios: code generation and knowledge retrieval over proprietary technical documentation via a Retrieval-Augmented Generation (RAG) architecture. The research provides a critical assessment of the trade-offs involved in deploying local AI strategies on standard hardware.
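The thesis itself is access-restricted, but the retrieval step of the RAG scenario described above can be illustrated with a minimal, self-contained sketch. Note that the chunk texts, function names, and the bag-of-words cosine scoring below are illustrative assumptions standing in for a real embedding model, not the thesis's actual implementation:

```python
import math
import re
from collections import Counter

def vectorize(text):
    # Toy bag-of-words vector; a real RAG pipeline would use a
    # sentence-embedding model instead of raw word counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    # Rank documentation chunks by similarity to the query; the top-k
    # chunks would then be injected into the SLM's prompt as context.
    qv = vectorize(query)
    return sorted(chunks, key=lambda c: cosine(qv, vectorize(c)), reverse=True)[:k]

chunks = [
    "The export module writes CSV files to the shared drive.",
    "Quantized models reduce memory use at some cost in accuracy.",
]
print(retrieve("How does quantization affect memory?", chunks))
```

In a CPU-only deployment of the kind the abstract describes, the same retrieve-then-generate loop applies unchanged; only the scoring function and the generator (a quantized SLM) differ.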
Keywords: Small Language Model, On-Premise Inference, RAG
File in this record:
Master thesis SG.pdf — Adobe PDF, 1.88 MB (restricted access)
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/108226