Towards efficient exploration in Reinforcement Learning via non-reversible stochastic processes

Aguado Carrillo de Albornoz, Yago Ruben
Academic year 2024/2025

Abstract

One of the key questions in Reinforcement Learning is the trade-off between exploration and exploitation. Popular techniques to encourage exploration include injecting random noise and adding an entropy objective to the reward function. These approaches have proven useful, but they are suboptimal and/or costly. One key observation is that acting randomly is not the same as having random experiences, which has led to the Maximum Diffusion approach, with promising results. In this work, we propose using non-reversible stochastic processes, particularly piecewise deterministic Markov processes (PDMPs), as a framework to enforce exploration not through the reward function or injected noise, but through the structure of the process itself.
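To make the structural idea concrete, below is a minimal sketch of one simple PDMP, a two-dimensional velocity-jump process. It is an illustrative toy under stated assumptions, not the method developed in the thesis; the function and parameter names (`simulate_velocity_jump_pdmp`, `jump_rate`, `speed`) are hypothetical. Between events of a Poisson clock the state follows a deterministic flow; at each event the velocity direction is resampled. The persistent, ballistic motion between jumps is what makes the process non-reversible, in contrast with reversible random-walk noise.

```python
import numpy as np

def simulate_velocity_jump_pdmp(x0, speed=1.0, jump_rate=0.5,
                                t_max=100.0, dt=0.01, rng=None):
    """Simulate a simple 2D velocity-jump PDMP (illustrative toy).

    Between jump events the state follows the deterministic flow
    dx/dt = v (straight-line motion). Jump times arrive as a Poisson
    process with constant rate `jump_rate`; at each event a fresh
    direction is drawn uniformly on the circle. Reversing a sample
    trajectory in time is statistically distinguishable from the
    forward dynamics, i.e. the process is non-reversible.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    theta = rng.uniform(0.0, 2.0 * np.pi)
    v = speed * np.array([np.cos(theta), np.sin(theta)])
    next_jump = rng.exponential(1.0 / jump_rate)  # first event time

    t, path = 0.0, [x.copy()]
    while t < t_max:
        t += dt
        x += v * dt                      # deterministic flow between events
        if t >= next_jump:               # Poisson-clock jump event
            theta = rng.uniform(0.0, 2.0 * np.pi)
            v = speed * np.array([np.cos(theta), np.sin(theta)])
            next_jump = t + rng.exponential(1.0 / jump_rate)
        path.append(x.copy())
    return np.array(path)

# The ballistic-then-turn trajectories cover space more efficiently
# than a comparable random walk, whose displacement grows only as sqrt(t).
trajectory = simulate_velocity_jump_pdmp(x0=[0.0, 0.0], t_max=50.0)
print(trajectory.shape)
```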
Keywords: Reinforcement Learning, Stochastic Processes, Exploration
File: Aguado Carrillo de Alborno Yago Ruben_tesi.pdf (open access, 1.57 MB, Adobe PDF)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/104269