Reinforcement Learning – MarI/O

MarI/O, the inspiration for this project, was installed and tested.

The MarI/O AI consists of scripts, which contain the AI's logic, and states, which save its progress. The scripts are written in Lua and run inside the BizHawk emulator, which emulates multiple platforms and supports Lua scripting. BizHawk is well known in the speedrunning scene, where it is used to create tool-assisted speedruns (TAS): input sequences that play a game with frame-perfect precision.

The training process is simple: the AI tries to get as far as possible by sending inputs to the emulator. Progress is measured with a so-called fitness value, which increases when the AI keeps moving without hitting anything and when it collects coins or otherwise scores points. The higher the value, the better the AI performed.
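To make the idea of a fitness value more concrete, here is a minimal sketch in Python. It is not taken from MarI/O's Lua scripts. The `Frame` class, the `fitness` function and the `score_weight` parameter are all hypothetical, and the sketch simply assumes that fitness is the sum of new ground covered plus score gained during one attempt.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical per-frame snapshot read from the emulator."""
    x_position: int  # horizontal position of the player in the level
    score: int       # in-game score (coins, defeated enemies, ...)


def fitness(frames, score_weight=1.0):
    """Sum forward progress and score gains over one attempt.

    Assumption: only new rightmost positions and score increases count,
    so standing still or walking back adds nothing.
    """
    if not frames:
        return 0.0
    total = 0.0
    rightmost = frames[0].x_position
    last_score = frames[0].score
    for frame in frames[1:]:
        if frame.x_position > rightmost:
            total += frame.x_position - rightmost
            rightmost = frame.x_position
        if frame.score > last_score:
            total += score_weight * (frame.score - last_score)
            last_score = frame.score
    return total


# Two made-up attempts: one gets stuck, one moves on and scores.
attempt_a = [Frame(0, 0), Frame(10, 0), Frame(10, 0)]
attempt_b = [Frame(0, 0), Frame(10, 0), Frame(25, 20)]
print(fitness(attempt_a), fitness(attempt_b))
```

In this sketch the attempt that gets stuck ends up with the lower value, which is the property the training process relies on when comparing attempts.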

Observations during training lead to the conclusion that the AI does in fact learn from its mistakes. However, progress is very slow at the beginning: it takes several hundred attempts until the AI reacts to an obstacle. Still, the fact that there is a reaction at all is a sign of success.

MarI/O training phase

 

Combining this week's observations with last week's research, we can see that the basic concept from Sutton applies here. MarI/O follows a policy: it is limited to a fixed set of input buttons and has to adapt to the nature of the game. The fitness value plays the role of the reward function, since it is the value that should be as high as possible. Every action the AI takes has a value and influences the reward. The goal of the AI is not just to beat a level, but to do so with a high score.
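The mapping between these terms can be illustrated with a toy agent-environment loop. This is only a sketch under heavy assumptions: the one-dimensional "level", the reduced button set and both example policies are invented for illustration and have nothing to do with MarI/O's actual code. The total returned by `run_episode` stands in for what the text calls the fitness value.

```python
BUTTONS = ["right", "jump", "left"]  # hypothetical, reduced button set


def step(position, action, obstacle_at=3):
    """Toy 1-D level: return (new_position, reward, done)."""
    if action == "left":
        return max(position - 1, 0), 0.0, False
    if action == "right":
        if position + 1 == obstacle_at:
            # Ran into the obstacle: no reward, the attempt ends.
            return position, 0.0, True
        return position + 1, 1.0, False
    # "jump" moves two tiles ahead and therefore clears the obstacle.
    return position + 2, 2.0, False


def run_episode(policy, max_steps=10):
    """Let a policy (position -> button) play and sum up its rewards."""
    position, total_reward = 0, 0.0
    for _ in range(max_steps):
        action = policy(position)
        position, reward, done = step(position, action)
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(position):
    return "right"


def jump_before_obstacle(position):
    return "jump" if position == 2 else "right"


print(run_episode(always_right), run_episode(jump_before_obstacle))
```

The policy that reacts to the obstacle collects more reward than the one that keeps running into it, which mirrors the behaviour observed during training.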

Under the hood, the program uses NEAT (NeuroEvolution of Augmenting Topologies), which I will take a closer look at in the upcoming week.

 
