When Tic-Tac-Toe Wasn't Hard Enough: A Reinforcement Learning Journey in Ultimate Tic Tac Toe
D'ALBERTON, ENRICO
2024/2025
Abstract
This thesis presents an investigation into the application of Reinforcement Learning (RL) techniques to the game of Ultimate Tic Tac Toe (UTTT), a complex extension of the classic Tic Tac Toe. The game's unique combination of local and global strategic dynamics makes it an ideal testbed for evaluating the capabilities of modern RL algorithms in navigating large, structured action and state spaces. After providing a theoretical foundation in reinforcement learning, game theory, and algorithmic strategy, the study focuses on the design, training, and evaluation of several RL agents, including Deep Q-Networks (DQN), Double DQN (DDQN), Advantage Actor-Critic (A2C), and Proximal Policy Optimization (PPO). To address the complexity and strategic richness of UTTT, the thesis introduces a dedicated environment and training pipeline for self-play reinforcement learning. Each algorithm is trained in isolation and evaluated against random agents to establish a performance baseline. The most promising models are then further refined using advanced techniques such as residual neural networks, data augmentation, and hyperparameter tuning. Performance is measured using a combination of win/draw/loss statistics, Elo ratings, and round-robin tournaments between agents. A key contribution of this work is the integration of best-response training to evaluate the exploitability of learned policies. By training a separate best-response agent within a DQN-based framework, the study reveals significant vulnerabilities in self-play agents and highlights the need for population-based training approaches to approximate Nash equilibria. The final DDQN-based model, enhanced with residual architectures and rotational data augmentation, achieves an 85% win rate against all other trained agents, corresponding to a 72% relative improvement over the original model. This thesis not only demonstrates the effectiveness of RL in mastering a non-trivial board game from scratch but also highlights the limitations of standard self-play in producing robust strategies. By combining insights from reinforcement learning, neural network design, and game theory, this research contributes a methodical and reproducible framework for developing and evaluating competitive agents in discrete, turn-based games with high strategic complexity.
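The rotational data augmentation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the state is encoded as a 9x9 NumPy array covering the full UTTT grid of cells and that actions are flat indices in the range 0-80 over that same grid, so a 90-degree rotation of the whole board is a symmetry of the game; the function name and representation are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def augment_rotations(state, action):
    """Return the four rotational variants of a (state, action) pair.

    Hypothetical representation: `state` is a 9x9 NumPy array over the full
    UTTT grid of cells, and `action` is a flat index in [0, 80] over the same
    grid, so rotating the whole board preserves the game's rules.
    """
    samples = []
    r, c = divmod(action, 9)
    for k in range(4):  # rotations by 0, 90, 180, 270 degrees (counter-clockwise)
        rotated_state = np.rot90(state, k)
        rr, cc = r, c
        for _ in range(k):
            # under np.rot90, the cell at (row, col) moves to (8 - col, row)
            rr, cc = 8 - cc, rr
        samples.append((rotated_state, rr * 9 + cc))
    return samples
```

In a replay-buffer setting, each stored transition would be expanded this way before training, effectively quadrupling the data seen by the network at no extra self-play cost.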
| File | Size | Format |
|---|---|---|
| D'Alberton_Enrico.pdf (open access) | 1.9 MB | Adobe PDF |
https://hdl.handle.net/20.500.12608/86899