Fine-Tuning of Large Language Models for Knowledge Injection and Question-Answering

ZERBINATI, ALBERTO
2022/2023

Abstract

Large Language Models (LLMs) have taken the world by storm, creating a competitive market and compelling even established tech giants to join this rapidly evolving landscape. In the commercial sphere, well-known models like OpenAI’s ChatGPT dominate, while open-source models tend to be either small and of limited capability or large and complex to operate at scale. Increasingly, there is demand for personalized LLMs, especially from companies seeking to customize models for internal use. However, training these multi-billion-parameter models is not only expensive but also faces challenges such as catastrophic forgetting and complex performance evaluation. This thesis focuses on fine-tuning small-scale open LLMs for personalized Question Answering (QA) applications. We address these training challenges by selecting smaller models and applying Parameter-Efficient Fine-Tuning (PEFT) techniques, concentrating on knowledge injection and instruction-tuning tasks. Our novel approach, inspired by Computer Vision (CV) research, involves training and merging multiple Low-Rank Adaptation (LoRA) adapters to inject different abilities into the pretrained models. Starting from Mistral-7B, our fine-tuned version improves on the original base model by 600% in Closed Book (CB) QA tasks and by 200% in Open Book (OB) QA, as measured by BLEU score. Moreover, in a CB setting, our model achieves a 3.2% higher score than a ChatGPT-based Retrieval-Augmented Generation (RAG) pipeline, as measured by a quality assessment in which ChatGPT acts as a judge of the responses. This work thus demonstrates the potential of smaller, efficiently tuned LLMs in personalized QA applications.
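The core idea behind merging multiple LoRA adapters can be illustrated with a toy NumPy sketch: each adapter contributes a low-rank update B @ A on top of a frozen pretrained weight, and merging reduces to a weighted sum of those updates. All matrix names, dimensions, and weights below are illustrative placeholders, not the thesis implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2  # toy hidden size and LoRA rank
W = rng.standard_normal((d, d))  # frozen pretrained weight matrix

# Two hypothetical adapters, e.g. one trained for knowledge injection
# and one for instruction-tuning. Each is a low-rank pair (B, A).
A1, B1 = rng.standard_normal((r, d)), rng.standard_normal((d, r))
A2, B2 = rng.standard_normal((r, d)), rng.standard_normal((d, r))

def merge_lora(W, adapters, weights):
    """Add a weighted sum of low-rank updates B @ A to the frozen weight."""
    W_merged = W.copy()
    for (B, A), w in zip(adapters, weights):
        W_merged += w * (B @ A)
    return W_merged

W_merged = merge_lora(W, [(B1, A1), (B2, A2)], weights=[1.0, 1.0])

# The combined update stays low-rank: its rank is at most r per adapter.
update_rank = np.linalg.matrix_rank(W_merged - W)
```

Because each update has rank at most r, merging k adapters changes the weight by a matrix of rank at most k * r, which is what keeps this approach parameter-efficient compared to full fine-tuning.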
Keywords: Large Language Model, Fine-Tuning, Knowledge Injection, Question-Answering
Files in this item:
Zerbinati_Alberto.pdf (Adobe PDF, 10.27 MB, restricted access)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12608/58026