Small language models: architectures, training strategies, and performance
BROCCO, GRETA
2024/2025
Abstract
With the evolution of language models and the optimization of artificial intelligence architectures, the need to balance performance and computational efficiency has emerged. This thesis focuses on the comparison of various Small Language Models (SLMs), analyzing their differences in terms of architecture, training strategies, and performance. In particular, the Gemma 2, Phi 3, and Llama 3 models are examined, highlighting the solutions adopted to optimize efficiency, such as compression techniques and knowledge distillation. Through a comparative analysis based on standard benchmarks, the strengths and limitations of each model are evaluated, providing an overview of their potential and impact in the field of natural language processing.
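The abstract names knowledge distillation among the efficiency techniques adopted by these models. As an illustration of the general idea only, below is a minimal sketch, assuming PyTorch, of the classic soft-target distillation loss (Hinton et al., 2015): a compact student is trained to match the temperature-softened output distribution of a larger teacher, blended with ordinary cross-entropy on the gold labels. The function name `distillation_loss` and the hyperparameters `temperature` and `alpha` are illustrative choices, not details taken from the thesis or from Gemma 2, Phi 3, or Llama 3.

```python
# Minimal knowledge-distillation loss sketch (PyTorch assumed).
# A small "student" is trained to match the temperature-softened output
# distribution of a larger "teacher", blended with the usual cross-entropy
# on the gold labels. Hyperparameters below are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend a soft-target KL term with hard-label cross-entropy."""
    # Soft targets: KL divergence between the softened distributions.
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: a batch of 4 examples over a 10-class output.
if __name__ == "__main__":
    student_logits = torch.randn(4, 10, requires_grad=True)
    teacher_logits = torch.randn(4, 10)
    labels = torch.randint(0, 10, (4,))
    loss = distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(f"distillation loss: {loss.item():.4f}")
```

Published SLMs differ in exactly what they distill (output logits, intermediate representations, or sampled teacher outputs); the logit-matching form above is only the most common starting point.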
| File | Size | Format |
|---|---|---|
| Brocco_Greta.pdf (open access) | 1.24 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/89697