Towards efficient exploration in Reinforcement Learning via non-reversible stochastic processes
AGUADO CARRILLO DE ALBORNOZ, YAGO RUBEN
2024/2025
Abstract
One of the key questions in Reinforcement Learning is the trade-off between exploration and exploitation. Popular techniques to encourage exploration include injecting random noise and adding an entropy objective to the reward function. These approaches have proven useful, but remain suboptimal and/or costly. One key observation is that acting randomly is not the same as having random experiences, which has led to the Maximum Diffusion approach, with promising results. In this work, we propose using non-reversible stochastic processes, particularly piecewise deterministic Markov processes (PDMPs), as a framework to enforce exploration not through the reward function or noise, but through the structure of the process itself.
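As a rough illustration of the kind of non-reversible PDMP the abstract refers to, the sketch below simulates a one-dimensional zig-zag process targeting a standard Gaussian: motion is deterministic between random velocity flips, so the randomness lives in the structure of the process rather than in additive noise. The switching rate used here is the standard textbook choice for this process, not a detail taken from the thesis.

```python
import random

def zigzag_1d(n_steps=200_000, dt=1e-3, seed=0):
    """Simulate a 1D zig-zag process (a simple PDMP) whose stationary
    distribution is the standard Gaussian with potential U(x) = x**2 / 2.
    The state is (x, theta): position plus a velocity direction in {-1, +1}.
    Between events the motion is a straight line, so the process is
    non-reversible by construction."""
    rng = random.Random(seed)
    x, theta = 0.0, 1.0
    xs = []
    for _ in range(n_steps):
        x += theta * dt                      # deterministic drift segment
        rate = max(0.0, theta * x)           # switching intensity: (theta * U'(x))_+
        if rng.random() < rate * dt:         # first-order event approximation
            theta = -theta                   # flip the velocity at event times
        xs.append(x)
    return xs

xs = zigzag_1d()
burn = xs[len(xs) // 4:]                     # discard a burn-in portion
mean = sum(burn) / len(burn)
var = sum((v - mean) ** 2 for v in burn) / len(burn)
```

With enough steps the empirical mean and variance approach 0 and 1, the moments of the Gaussian target, even though no Gaussian noise is ever drawn.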
| File | Description | Size | Format |
|---|---|---|---|
| Aguado Carrillo de Alborno Yago Ruben_tesi.pdf (open access) | Towards efficient exploration in Reinforcement Learning via non-reversible stochastic processes | 1.57 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/104269