Deep Reinforcement Learning methods for StarCraft II Learning Environment

Dainese, Nicola
2020/2021

Abstract

Reinforcement Learning (RL) is a Machine Learning framework in which an agent learns to solve a task through trial-and-error interaction with its environment. The recent adoption of artificial neural networks in this field has pushed forward the boundaries of the tasks that Reinforcement Learning algorithms are able to solve, but has also introduced great challenges in terms of algorithmic stability and sample efficiency. Game environments are often used as proxies for real environments to test new algorithms, since they provide tasks that are typically challenging for humans and let RL agents gather experience much faster and more cheaply than they could in the real world. In this thesis, state-of-the-art Deep Reinforcement Learning methods are presented and applied to solve four mini-games of the StarCraft II Learning Environment. StarCraft II is a real-time strategy game with large action and state spaces, which requires learning complex long-term strategies in order to be solved; the StarCraft mini-games are auxiliary tasks of increasing difficulty that test an agent's ability to learn different dynamics of the game. A first algorithm, the Advantage Actor-Critic (A2C), is studied in depth in the CartPole environment and in a simplified setting of the StarCraft environment, and is then trained on four of the seven StarCraft mini-games. A second algorithm, the Importance Weighted Actor-Learner Architecture (IMPALA), is introduced and trained on the same mini-games, proving approximately 16 times faster than A2C and achieving far better scores on the two hardest mini-games, a lower score on one, and an equal score on the easiest one. Both agents were trained for 5 runs per mini-game, using 20 CPU cores, a GPU, and up to 72 hours of computation time. The best scores from the 5 IMPALA runs are compared with the results reported by the DeepMind team in the paper StarCraft II: A New Challenge for Reinforcement Learning, which gives the best scores out of 100 runs, each using approximately two orders of magnitude more training steps than ours. Our IMPALA agent surpasses the performance of the DeepMind agent in two of the four mini-games considered and obtains slightly lower scores on the other two.
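The Advantage Actor-Critic method mentioned above combines a policy ("actor") with a learned state-value baseline ("critic"), and its update is driven by the advantage A_t = R_t - V(s_t). The sketch below illustrates, for assumed PyTorch tensors from a single rollout, how this advantage enters the actor and critic losses; it is a minimal illustration only, not the implementation used in the thesis, and the function name and coefficients are hypothetical placeholders.

import torch

def a2c_losses(log_probs, values, returns, entropies,
               value_coef=0.5, entropy_coef=0.01):
    """Combine the A2C losses for one rollout of length T.

    log_probs : log pi(a_t | s_t) for the actions taken   (tensor of shape [T])
    values    : critic estimates V(s_t)                    (tensor of shape [T])
    returns   : bootstrapped n-step returns R_t            (tensor of shape [T])
    entropies : policy entropies H(pi(. | s_t))            (tensor of shape [T])
    """
    advantages = returns - values                               # A_t = R_t - V(s_t)
    # Actor: increase the log-probability of actions with positive advantage.
    policy_loss = -(log_probs * advantages.detach()).mean()
    # Critic: regress V(s_t) towards the bootstrapped return R_t.
    value_loss = advantages.pow(2).mean()
    # Entropy bonus: discourage premature convergence to a deterministic policy.
    entropy_bonus = entropies.mean()
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus

The returned scalar would be minimized with an ordinary optimizer step; IMPALA follows the same actor-critic structure but corrects the off-policy rollouts produced by its distributed actors with importance weighting (V-trace), which is what enables its higher throughput.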
Date: 2020-10
Pages: 80
Keywords: Reinforcement Learning, Deep Reinforcement Learning
File: Nicola_Dainese_Master_Thesis.pdf (open access, Adobe PDF, 5.04 MB)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/22924