Reinforcement Learning – Topic Overview

This post summarizes my journey through this topic.

What is “Reinforcement Learning”?

A developer of a learning Artificial Intelligence (AI) needs to choose among three broad learning paradigms: supervised, unsupervised, and reinforcement learning. “Reinforcement Learning” (RL) focuses on learning by trial and error. The idea is that the desired outcome is defined from the beginning, usually as a reward signal, but the AI has to learn a method to actually produce it.

The first steps: Research and Analysis

The book “Reinforcement Learning: An Introduction” by Richard Sutton and Andrew Barto, as well as a few MIT journal snippets, led to the first conclusions about how an RL AI works at a fundamental level. Since these sources are a bit outdated, the focus shifted from pure theory to applied theory, and the topic now revolved around MarI/O.

The MarI/O project, developed by SethBling, is about an RL AI learning to play Super Mario World. At its core, the AI follows the “NeuroEvolution of Augmenting Topologies” (NEAT) algorithm, which is about evolution: each generation of neural networks builds on the successes of the previous ones. In short, my analysis included:

Research on some core concepts of AI (Artificial Neural Networks, Neuro-Evolution)
Research on the NEAT concept
Exercises with the Lua scripting language
Analysis of all scripts of the MarI/O project
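
To make the evolutionary idea behind NEAT a bit more concrete, here is a minimal Python sketch of a generational loop. It is not NEAT itself and not MarI/O code: it only mutates fixed-length lists of numbers, whereas NEAT also evolves the network topology and groups genomes into species. The names evolve and toy_fitness, as well as all parameter values, are assumptions made for illustration.

```python
import random


def evolve(fitness, genome_length=4, population_size=20, generations=30,
           mutation_scale=0.3, seed=0):
    """Toy generational evolution: keep the fitter half, refill with mutated copies."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best_per_generation = []
    for _ in range(generations):
        # Rank genomes by fitness, best first.
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        # The fitter half survives unchanged into the next generation.
        survivors = ranked[:population_size // 2]
        # Each child is a mutated copy of a randomly chosen survivor.
        children = [[gene + rng.gauss(0.0, mutation_scale)
                     for gene in rng.choice(survivors)]
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    best = max(population, key=fitness)
    return best, best_per_generation


def toy_fitness(genome):
    # Hypothetical fitness: highest (zero) when every gene equals 0.5.
    return -sum((gene - 0.5) ** 2 for gene in genome)


best_genome, history = evolve(toy_fitness)
print(round(toy_fitness(best_genome), 4))
```

Because the survivors are carried over unchanged, the best fitness in this sketch can never get worse from one generation to the next, which mirrors the idea of learning from the successes of past generations.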

TensorFlow experiments

After the first practical steps, I took an in-depth look at the most modern approaches to RL. After studying recent MIT journals and experimenting with the TensorFlow framework, I concluded that “Asynchronous Advantage Actor-Critic” (A3C) methods are the most efficient way of building an RL AI.
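
The “advantage” in Asynchronous Advantage Actor-Critic can be illustrated with a few lines of plain Python. The sketch below is not taken from my TensorFlow experiments. It only shows one common way to estimate advantages: discounted returns minus the critic's value estimates. The function names, the reward list, and the value estimates are made-up assumptions.

```python
def discounted_returns(rewards, gamma=0.99, bootstrap_value=0.0):
    """Return G_t = r_t + gamma * G_{t+1} for every step, computed backwards."""
    returns = []
    running = bootstrap_value  # value estimate of the state after the last step
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, value_estimates, gamma=0.99, bootstrap_value=0.0):
    """Advantage per step: how much better the return was than the critic expected."""
    returns = discounted_returns(rewards, gamma, bootstrap_value)
    return [g - v for g, v in zip(returns, value_estimates)]


# Hypothetical three-step episode with made-up critic estimates.
rewards = [1.0, 1.0, 1.0]
values = [2.0, 1.5, 0.5]
print(advantages(rewards, values, gamma=0.5))  # [-0.25, 0.0, 0.5]
```

A positive advantage means the chosen action turned out better than the critic predicted, so the actor is nudged towards it. A negative advantage has the opposite effect.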

In short, practical activities included:

Research on modern approaches (Deep Q-Learning, epsilon-greedy policies, A2C & A3C methods)
Comparison of software development environments
“The mountain car game” experiment
“The cart pole game” experiment
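
As a rough illustration of Q-learning with an epsilon-greedy policy, here is a small self-contained Python sketch. It does not reproduce the mountain car or cart pole experiments and does not use TensorFlow: it uses a lookup table instead of a deep network, and the “corridor” environment, the function names, and all hyperparameters are assumptions chosen to keep the example short.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon explore randomly, otherwise take a best-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties randomly so that an untrained agent does not get stuck.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def train_corridor(length=5, episodes=300, alpha=0.5, gamma=0.9,
                   epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical corridor.

    The agent starts in cell 0 and receives reward 1 on reaching the last cell.
    Actions: 0 = step left (stays put in cell 0), 1 = step right.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(length)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == length - 1
            reward = 1.0 if done else 0.0
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train_corridor()
greedy_policy = [max((0, 1), key=lambda a: row[a]) for row in q_table[:-1]]
print(greedy_policy)
```

After training, the greedy action in every non-terminal cell should be 1 (step right). Deep Q-Learning replaces the table with a neural network, but the update target follows the same pattern.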

Metroid Learning

I decided to develop an AI that learns to play Super Metroid. The main goal of this project was to see how an RL AI would deal with a high level of complexity. Unlike Super Mario World, Super Metroid has no goal at the right-hand end of each level, so simply running to the right is not enough, and the player has to perform some rather difficult manoeuvres.

Based on the MarI/O project, Metroid Learning uses Lua scripts that apply the NEAT algorithm in an iterative process, which is driven by the emulator's main loop. With each cycle, the AI optimizes its controller inputs while trying to reach a goal that is defined in a separate algorithm. After thousands of cycles, the AI eventually draws closer to its desired goal.
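
The idea of keeping the goal in a separate algorithm can be sketched as a stand-alone scoring function. The Python example below is only an assumption about what such a function could look like. It is not the Lua code of Metroid Learning, and the progress readings, the timeout, and the frame penalty are hypothetical. It scores one play-through by the best progress reached, subtracts a small penalty per frame, and ends the run once no new progress has been made for a while.

```python
def evaluate_run(progress_per_frame, timeout_frames=20, frame_penalty=0.25):
    """Score one play-through from a sequence of per-frame progress readings.

    progress_per_frame: hypothetical progress values, one per frame (for
    example a distance read from emulator memory). The run is cut off once
    no new best progress has been seen for timeout_frames frames in a row.
    """
    best_progress = 0.0
    frames_since_improvement = 0
    frames_played = 0
    for progress in progress_per_frame:
        frames_played += 1
        if progress > best_progress:
            best_progress = progress
            frames_since_improvement = 0
        else:
            frames_since_improvement += 1
            if frames_since_improvement >= timeout_frames:
                break  # the AI is stuck, so stop this run early
    # Reward progress, but penalize runs that take many frames.
    return best_progress - frame_penalty * frames_played


steady_run = [float(i) for i in range(1, 11)]  # progress improves every frame
stuck_run = [1.0, 2.0, 3.0] + [3.0] * 50       # progress stalls at 3

print(evaluate_run(steady_run, timeout_frames=5))  # 7.5
print(evaluate_run(stuck_run, timeout_frames=5))   # 1.0
```

Since the evolutionary loop only ever sees the returned number, the goal can be changed without touching the learning algorithm itself.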

Simply put, this project’s activities included:

Research on pointer tables and ROM hacking
Reverse Engineering of SNES ROM files
Extracting data with different ROM hacking tools
Restructuring existing code
Re-writing and optimizing all Lua scripts
Configuration experiments

Current station: Unity

Seeing an RL AI play emulated retro games makes one wonder whether such an AI could also be applied to modern Unity games, so I decided to take a closer look at this field.

Plans:

How to: Extracting data from / Hacking Unity games
A small series of experimental mini-projects
