Towards efficient exploration in Reinforcement Learning via non-reversible stochastic processes
AGUADO CARRILLO DE ALBORNOZ, YAGO RUBEN
2024/2025
Abstract
One of the key questions in Reinforcement Learning is the trade-off between exploration and exploitation. Popular techniques to encourage exploration include injecting random noise and adding an entropy objective to the reward function. These approaches have proven useful, but remain suboptimal and/or costly. One key observation is that acting randomly is not the same as having random experiences, which has led to the Maximum Diffusion approach, with promising results. In this work, we propose using non-reversible stochastic processes, particularly piecewise deterministic Markov processes (PDMPs), as a framework to enforce exploration not through the reward function or noise, but through the structure of the process itself.
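As a rough illustration of the kind of non-reversible PDMP the abstract refers to, the sketch below simulates a one-dimensional zig-zag process targeting a standard Gaussian: motion is deterministic between random velocity flips, so the randomness lives in the structure of the process rather than in additive noise. The switching rate used here is the standard textbook choice for this process, not a detail taken from the thesis.

```python
import random

def zigzag_1d(n_steps=200_000, dt=1e-3, seed=0):
    """Simulate a 1D zig-zag process (a simple PDMP) whose stationary
    distribution is the standard Gaussian with potential U(x) = x**2 / 2.
    The state is (x, theta): position plus a velocity direction in {-1, +1}.
    Between events the motion is a straight line, so the process is
    non-reversible by construction."""
    rng = random.Random(seed)
    x, theta = 0.0, 1.0
    xs = []
    for _ in range(n_steps):
        x += theta * dt                      # deterministic drift segment
        rate = max(0.0, theta * x)           # switching intensity: (theta * U'(x))_+
        if rng.random() < rate * dt:         # first-order event approximation
            theta = -theta                   # flip the velocity at event times
        xs.append(x)
    return xs

xs = zigzag_1d()
burn = xs[len(xs) // 4:]                     # discard a burn-in portion
mean = sum(burn) / len(burn)
var = sum((v - mean) ** 2 for v in burn) / len(burn)
```

With enough steps the empirical mean and variance approach 0 and 1, the moments of the Gaussian target, even though no Gaussian noise is ever drawn.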
| File | Description | Size | Format |
|---|---|---|---|
| Aguado Carrillo de Alborno Yago Ruben_tesi.pdf (open access) | Towards efficient exploration in Reinforcement Learning via non-reversible stochastic processes | 1.57 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/104269