Metroid Learning – Conclusions

In this blog post, I briefly summarize the project, present the final test results, and finish with both objective conclusions and subjective lessons learned.

Project Summary

'Metroid Learning' is about an AI learning to play Super Metroid on an SNES emulator. The learning procedure is based on trial and error: the AI optimizes its inputs with each attempt until it manages to beat a level.
The main goal of this project is to analyze how a Reinforcement Learning AI deals with high levels of complexity. Super Metroid was chosen because it is played with ten buttons and requires many different button combinations to beat.
Achieving this goal involved scripting in Lua, data mining with dedicated ROM hacking tools, and research in the field of reverse engineering.
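To give an idea of how such a RAM map is used in practice, here is a minimal sketch of reading Samus's position from emulator memory in Lua. It assumes a BizHawk-style scripting API (memory.read_u16_le, gui.text, emu.frameadvance), and the addresses are illustrative values from the community Super Metroid RAM map; the actual script in the repository may look different.

```lua
-- Minimal sketch: poll Samus's position every frame via the RAM map.
-- BizHawk-style API assumed; the addresses are illustrative values from
-- the community Super Metroid RAM map, not necessarily those in the repo.
local SAMUS_X = 0x0AF6   -- Samus X position (16-bit)
local SAMUS_Y = 0x0AFA   -- Samus Y position (16-bit)

local function readSamusPosition()
  return memory.read_u16_le(SAMUS_X), memory.read_u16_le(SAMUS_Y)
end

while true do
  local x, y = readSamusPosition()
  gui.text(0, 0, string.format("Samus: x=%d  y=%d", x, y))
  emu.frameadvance()
end
```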

The final state of the project can be found here: https://github.com/haringma15/metroid-learning


Test Results

Ten training sessions with different settings and algorithms were conducted over a span of three weeks.

Three main factors were analyzed in these sessions: the mutation rate, which determines the probability of random changes within the neural network; the number of neurons the network consists of; and the scoring algorithm the AI learns to play by. A session was considered a success if the AI managed to get through the first room of the game.
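To make the first factor concrete: a mutation rate of, say, 0.25 means that each connection weight of the network has a 25% chance of being randomly changed between generations. A generic neuroevolution-style sketch (not the project's actual implementation) could look like this:

```lua
-- Sketch of a mutation step: each connection weight is nudged at random
-- with probability `mutationRate`. Generic neuroevolution-style code,
-- not the exact code used in this project.
local function mutate(genome, mutationRate, stepSize)
  for _, connection in ipairs(genome.connections) do
    if math.random() < mutationRate then
      connection.weight = connection.weight + (math.random() * 2 - 1) * stepSize
    end
  end
end

-- e.g. mutate(genome, 0.25, 0.1)  -- "average" rate: ~25% of weights change
```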

In the first six sessions, the AI was rewarded simply for moving. However, if it did nothing but jump in the nearest corner for too long, it received a heavy score penalty.
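The idea behind this scoring can be sketched as follows: the score grows with the distance covered, and a heavy penalty is applied when the AI stays inside a small area for too long. This illustrates the principle rather than the exact reward code in the repository; (x, y) stands for Samus's position read from the RAM map each frame.

```lua
-- Sketch of the movement-based score used in sessions 1-6: reward covered
-- distance, but punish the AI when it stays in a small area (e.g. jumping
-- in a corner) for too long. Illustrative only; the repository's scoring
-- may differ. (x, y) is Samus's position read from the RAM map each frame.
local anchorX, anchorY           -- last position counted as "new ground"
local framesNearAnchor = 0
local distance = 0

local function updateScore(x, y)
  if anchorX == nil then anchorX, anchorY = x, y end

  local dx, dy = math.abs(x - anchorX), math.abs(y - anchorY)
  if dx + dy > 32 then             -- left the local area: real movement
    distance = distance + dx + dy
    anchorX, anchorY = x, y
    framesNearAnchor = 0
  else
    framesNearAnchor = framesNearAnchor + 1
  end

  local score = distance
  if framesNearAnchor > 600 then   -- roughly ten seconds on the spot
    score = score - 1000           -- heavy penalty
  end
  return score
end
```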
The first two sessions ran simultaneously on two devices. Both programs used the same settings: average mutation rates and 100,000 neurons per network. Although one AI achieved higher scores quite early, the other eventually reached the same results. Around generation 90, both programs ran into the same problem: the computational load was too much for the CPU to handle, and both devices froze.
Sessions 3–5 shared a network of 25,000 neurons but differed in mutation rates. High mutation rates stopped improving after about 60 generations, while low mutation rates kept improving continuously and only started trying to cheat the scoring algorithm after 100 generations (results that barely differed from those with average mutation rates).
In session 6, the network consisted of only 10,000 neurons. This led to improvement stalling after a mere 10 generations; the AI was not able to learn any complex movement patterns. The number of neurons therefore has to match the complexity of the given environment.
In sessions 7–9, I changed the scoring algorithm: the AI was only rewarded for moving downward. I tested this algorithm with average mutation rates and three differently sized networks. The AI with a network of 40,000 neurons made it closest to the goal of the level. A network of 60,000 neurons came very close too, but the computational load could not be handled.
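The change in these sessions boils down to a different fitness function: only the deepest point reached counts. A hedged sketch, assuming that Samus's Y coordinate grows downward as in the RAM map:

```lua
-- Sketch of the sessions 7-9 scoring: only downward progress is rewarded.
-- Assumes the Y coordinate grows downward on screen; illustrative only.
local deepestY

local function downwardScore(y)
  if deepestY == nil or y > deepestY then
    deepestY = y                   -- new deepest point reached
  end
  return deepestY
end
```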
Finally, I reduced the complexity of the inputs: instead of ten buttons, only three were used. With a network of 30,000 neurons, this session yielded the best score so far, but still did not make it to the next room.
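Reducing the input complexity mainly means letting the network drive only a subset of the controller. A sketch, again assuming a BizHawk-style joypad.set API; the choice of Left, Right and B here is purely illustrative, not necessarily the three buttons actually kept in that session.

```lua
-- Sketch: restrict the AI's outputs to three buttons instead of ten.
-- BizHawk-style API assumed; the button selection is illustrative.
local BUTTONS = { "Left", "Right", "B" }

-- `outputs` is the network's output vector, one value per allowed button.
local function pressButtons(outputs)
  local pad = {}
  for i, name in ipairs(BUTTONS) do
    pad[name] = outputs[i] > 0.5   -- press the button when the output fires
  end
  joypad.set(pad, 1)               -- controller of player 1
end
```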

Thus, no AI has been truly successful so far. Some performed well, though, and the results are insightful nonetheless.


Conclusions

  • Depending on the emulated game, ROM hacking tools might be necessary to find information about relevant graphical data. In this project, a RAM map providing an overview of the game's internal structure was sufficient.
  • A Reinforcement Learning AI learns to act according to a given score algorithm. It will always try to exploit its environment to the maximum and will therefore try to 'cheat' within the boundaries of that algorithm.
  • High mutation rates lead to unpredictable actions. AIs with these settings perform poorly at this task.
  • Low mutation rates lead to long-term success. AIs with these settings perform well at this task.
  • The number of neurons in a network should be appropriate for the complexity of the given environment. If many inputs are required to succeed, a higher number of neurons should be used.
  • As the AI learns, the level of computation rises. If a network consists of too many neurons, the hardware might not be able to handle the task.


Lessons Learned

  • During my ROM hacking attempts, I realized that it makes more sense to search for tools dedicated to the game rather than tools generally used for editing graphics. Specific solution > general solution.
  • If I had started with the code earlier, I might have noticed the sprite problem sooner. I solved it by rewriting the sprite recognition function, which made the sprite list script, and the effort put into it, obsolete. Development and research should be combined from the start.
  • If I had learned about RAM maps sooner, I would have saved a lot of time. New problems should be approached with a fresh mindset; trying out new solutions earlier might pay off.
  • During development, it is very helpful to keep code structured; during debugging, errors can then be pinpointed easily. However, the sequence of the code must be kept intact, since restructuring too aggressively can introduce errors.
  • GitHub is useful for any type of project. Although this project was not a collaboration, I could still use GitHub as a central workspace, which helps a lot when working on multiple devices.
  • Since an AI always tries to find loopholes in the system, it is advisable to implement anti-cheat checks in the code (see the sketch below). It is important to be wary of everything that could go wrong.
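One simple form such a check could take is to cross-check the score against actual progress in the game, for example whether Samus has reached a new room, and to cut a run short when the score keeps rising without any real progress. A sketch, assuming BizHawk-style memory access and an illustrative room-pointer address from the community RAM map:

```lua
-- Sketch of an anti-cheat check: stop trusting a run when the score keeps
-- rising although the AI never reaches a new room. The address is an
-- illustrative value from the community Super Metroid RAM map, not
-- necessarily what the project used.
local ROOM_POINTER = 0x079B
local startRoom
local framesWithoutProgress = 0

local function runStillValid()
  local room = memory.read_u16_le(ROOM_POINTER)
  if startRoom == nil then startRoom = room end

  if room ~= startRoom then
    framesWithoutProgress = 0      -- real progress: a new room was entered
  else
    framesWithoutProgress = framesWithoutProgress + 1
  end

  -- after ~30 seconds without a room change, abort and discard the run
  return framesWithoutProgress < 1800
end
```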