Reinforcement Learning from Human Feedback to Fine-Tune Vision Models

SOMPURA, MAULIK RUPESHBHAI
Academic Year 2024/2025

Abstract

Recent advances in Reinforcement Learning from Human Feedback (RLHF) have demonstrated the effectiveness of human-guided fine-tuning in aligning large language and vision-language models with human intent. This inspires the idea that the same approach can also be used to train or fine-tune smaller vision models for more specialized tasks with broader applicability in industrial settings. This thesis investigates how sparse, qualitative human evaluations and preference signals can be integrated into the training and adaptation of vision models, thereby reducing dependence on costly pixel-level annotations. We propose a unified framework that embeds human interactions directly into the learning loop via reinforcement learning techniques and reward modelling. Through a series of experiments, we show that models trained with human feedback adapt more rapidly and robustly to novel visual scenarios, achieving significant improvements in label efficiency without sacrificing performance. The final goal is to use RLHF methodologies to adapt vision models to novel domains, enhancing their practical deployment potential.
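
The framework described in the abstract relies on a reward model learned from human preference signals. As a purely illustrative sketch, and not the thesis's actual implementation, the Python/PyTorch snippet below trains a small reward head on top of precomputed vision-backbone embeddings with the standard Bradley-Terry preference objective; names such as RewardHead and preference_loss, and the embedding dimension of 512, are assumptions made for this example.

import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardHead(nn.Module):
    """Illustrative reward model: maps a frozen backbone's image embedding to a scalar reward."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, embed_dim) -> one scalar reward per item: (batch,)
        return self.mlp(features).squeeze(-1)


def preference_loss(r_preferred: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximise log sigmoid(r_preferred - r_rejected),
    # i.e. push the reward of the human-preferred output above the rejected one.
    return -F.logsigmoid(r_preferred - r_rejected).mean()


if __name__ == "__main__":
    head = RewardHead()
    optimiser = torch.optim.Adam(head.parameters(), lr=1e-4)

    # Stand-ins for embeddings of the two candidate outputs an annotator compared.
    feats_preferred = torch.randn(8, 512)
    feats_rejected = torch.randn(8, 512)

    loss = preference_loss(head(feats_preferred), head(feats_rejected))
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    print(f"preference loss: {loss.item():.4f}")

In the full framework, the learned scalar reward would then supply the training signal for reinforcement-learning fine-tuning of the vision model itself.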

Keywords

Human Feedback; Vision models; Reward modelling; Label efficiency

Files in this item:

Sompura_MaulikRupeshbhai.pdf (open access, Adobe PDF, 2.78 MB)


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/93396