Reinforcement Learning – Lua script analysis, Part 2

‘mario-neat.lua’ is the core of MarI/O: it contains both the evolutionary logic and the main loop that drives the game.

All other scripts act as helper scripts. Their sole purpose is to provide functions to this script (and to keep the project structured).

This post breaks down the contents of the mario-neat script and takes a closer look at its functions and how they work together.

It closes with a short conclusion on how the behavior of the scripts could be changed.

The functions of the mario-neat script include…

  • “newInnovation”: increments the innovation counter and returns the new number, which is then assigned to a newly created gene (link) to identify it. The running number of innovations is stored in the pool object, which acts as a central database.
  • “newPool”: initializes the pool, a kind of database that keeps track of the species that develop over time, the current generation, the current number of innovations, the species and genome currently being trained, the current frame in the game and the maximum fitness value (the best score achieved so far).
  • “newSpecies”: initializes a species, a group of similar genomes that together try to top the current maximum fitness. It starts with a top fitness value that is updated over time, a staleness counter that indicates how long the species has gone without improving (the lower, the better), a set of genomes and an average fitness value that is used for statistical purposes.
  • “newGenome”: initializes a genome, which describes a neural network. It has a set of genes that determine the genome’s behavior, fitness variables that indicate how useful the genome is, a network of neurons (together with a neuron count), a global rank and a set of mutation rates that are taken from the config script.
  • “copyGenome”: initializes a new genome using the newGenome function and then takes a given genome and copies its current values into the newly initialized one.
  • “basicGenome”: initializes a new genome and mutates it using a mutate function.
  • “newGene”: initializes a gene, which represents a connection between two neurons. It has an input and an output neuron, a weight, a bool that indicates whether the gene is enabled and an innovation number.
  • “copyGene”: initializes a new gene using the newGene function and copies the properties of a given gene object into the newly created one.
  • “newNeuron”: initializes a new neuron that makes up a network, which is used by genomes. A neuron has a list of incoming signals and a simple variable named value.
  • “generateNetwork”: builds the neural network for a given genome. The network has a list of neurons whose number depends on the possible inputs (input layer), the possible outputs (output layer) and the genes of the genome (hidden layer). Each neuron is initialized and linked to the others according to those genes.
  • “evaluateNetwork”: computes the outputs of the network from the values of all neurons. For each neuron, every incoming signal contributes the product of its weight and the value of the neuron it comes from; these contributions are added up, and the new value of the neuron is the sigmoid of that sum. If the value of an output neuron is greater than 0, the corresponding output is active.
  • “crossover”: takes two genomes and combines their properties into one.
  • “randomNeuron”: returns a random neuron from a neural network.
  • “containsLink”: checks whether a given list of genes contains a certain link.
  • “pointMutate”: randomly changes the weights of a given genome’s genes. Whether a weight is only nudged slightly or replaced by a new random value depends on the ‘PerturbChance’ value, which is set in the config file.
  • “linkMutate”: picks two random neurons of a given genome by calling the randomNeuron function, swaps them if the second one is an input neuron, creates a new link between them and gives it a random weight. If a given bool is set to true, the function creates a bias link instead.
  • “nodeMutate”: takes a random gene from a given genome, disables it, copies it twice and inserts the two copies into the genome, so that the original connection is split in two by a new neuron. The genome ends up with two additional genes that replace the disabled one.
  • “enableDisableMutate”: enables or disables a randomly chosen gene in a given genome.
  • “mutate”: randomly adjusts a given genome’s mutation rates and then, depending on those rates, mutates connection weights by calling the pointMutate function, mutates links and biases by calling the linkMutate function, mutates nodes by calling the nodeMutate function and might enable or disable genes by calling the enableDisableMutate function.
  • “disjoint”: determines how many genes of two given genomes are ‘disjoint’. A gene is ‘disjoint’ when its innovation number does not occur in the other genome.
  • “weights”: calculates the average weight difference between the genes that two given genomes share.
  • “sameSpecies”: uses the disjoint and weights functions to calculate a disjoint delta and a weight delta, which indicate how similar two given genomes are. If the sum of both delta values is lower than a configured threshold, the genomes are treated as members of the same species and the function returns ‘true’.
  • “rankGlobally”: ranks all genomes by their fitness values and updates their global rank property.
  • “calculateAverageFitness”: updates the average fitness value of a given species based on the global ranks of its genomes.
  • “totalAverageFitness”: returns the sum of all average fitness values of all species in the current pool.
  • “cullSpecies”: removes the weakest genomes of each species based on their fitness. Depending on a flag, it removes all genomes but the best.
  • “breedChild”: takes two genomes from a given species and applies a crossover function to them. Also, the mutate function is applied to the resulting child.
  • “removeStaleSpecies”: checks for each species whether its best genome has improved on the species’ previous top fitness. If it has not, the function increases the staleness value. A species survives only if its staleness is still below the configured limit or if it holds the best fitness in the pool; all other species are removed.
  • “removeWeakSpecies”: removes all species that have a fitness value that is too low.
  • “addToSpecies”: takes a child and adds it to the species it belongs to, using the sameSpecies function. If no matching species can be found, the function creates a new one.
  • “newGeneration”: removes all weak and stale species using the cullSpecies, rankGlobally, removeStaleSpecies and removeWeakSpecies functions. After the removal, the breedChild function is used to generate new (and hopefully even better) children. The cullSpecies function is called again, removing all genomes but the fittest of each species. The breedChild function is then called again with the remaining species, and all children are added to their corresponding species using the addToSpecies function. With that, a new generation has been born: the generation counter is incremented and the pool is saved to a file for that generation.
  • “initializePool”: creates a number of genomes using the basicGenome function, and adds them to a species using the addToSpecies function (which will create a new species). Finally, the initializeRun function is called.
  • “initializeRun”: loads the latest savestate if available, gives a powerup when the game starts, configures game data (such as timeouts, starting amount of coins, hitcounter), initializes species and genomes, generates a neural network using the generateNetwork function and calls the evaluateCurrent function.
  • “evaluateCurrent”: loads inputs and input deltas from the game script, uses those values to evaluate the network using the evaluateNetwork function and stores the result in a controller variable. This variable is a map that indicates for each button whether it is pressed or not. Inputs that would cancel each other out (for example left and right at the same time) are both set to false. The controller variable is then used to set the joypad, which passes the inputs to the game.
  • “nextGenome”: increments the currentGenome variable in the pool. If the variable surpasses the number of total genomes, the variable is reset and a new generation begins.
  • “fitnessAlreadyMeasured”: checks whether the current genome already has a fitness other than 0, which means that it has been evaluated before.
  • “displayGenome”: updates the on-screen display of the network’s point of view. The data shown in the resulting box is based on the current values of the neurons in the neural network.
  • “writeFile”: saves the most important data to a given filename.
  • “savePool”: reads a filename and calls the writeFile function with it.
  • “mysplit”: returns a certain part of a given string. Used to efficiently initialize arrays when loading a file; the filename contains the names of arrays.
  • “loadFile”: opens a saved file from a given filename and saves its contents into a new variable. The contents are used to initialize a run that is based on the saved data.
  • “flipState”: pauses and un-pauses the current run.
  • “loadPool”: reads a filename and calls the loadFile function with it.
  • “playTop”: searches for the genome with the highest fitness value, loads its properties and initializes the run.
  • “onExit”: quits the application.

 

The script itself runs through the following sequence:

  • Initialize inputs and outputs based on the config file.
  • Check if there is already a pool. If not, create one using the initializePool function.
  • Initialize the interface and the network picture.
  • Create labels for the UI and initialize the sprite list.
  • Iterate until the user quits the application:
    • Load species and genomes.
    • Display genomes.
    • Call the evaluateCurrent function every 5 frames to check the current input.
    • Set the joypad.
    • Get the current positions in the game using the getPositions function (game script).
      • If the player is moving, update the movement indicator.
    • If the player is vulnerable, check if the player had a collision.
      • If the player took damage, update the hit-counter.
    • Check if the player got a power-up.
      • If so, update the power-up counter and save the previous power-up.
    • Update current remaining lives.
    • Reduce the timeout.
    • If the run timed out or the level was cleared…
      • Calculate fitness gains.
      • Calculate fitness losses.
      • Calculate fitness.
      • Drastically increase the fitness if the level was beaten.
      • Save fitness to genome.
      • If the fitness is the best so far, set it as the new maximum fitness and save.
      • Initialize the next run.
    • Update the UI and tell the emulator to go to the next frame.
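The inner part of this loop can be sketched roughly as follows, again in Python rather than Lua. The sketch replaces the emulator with a plain list of frames and only models progress, the timeout and the final fitness calculation. All field names, the weights in the fitness formula, the size of the clear bonus and the rule that progress resets the timeout are placeholders chosen for this example; the real values and rules live in the script and its config file.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    # Hypothetical counters collected during one run.
    distance: int = 0
    coins: int = 0
    powerups: int = 0
    hits: int = 0
    level_cleared: bool = False


def compute_fitness(stats: RunStats, clear_bonus: int = 1000) -> int:
    gains = stats.distance + 50 * stats.coins + 100 * stats.powerups
    losses = 100 * stats.hits
    fitness = gains - losses
    if stats.level_cleared:
        fitness += clear_bonus  # large bonus for beating the level
    return fitness


def run_genome(frames, timeout_start=20, evaluate_every=5):
    """frames: list of (x_position, cleared) tuples standing in for the emulator.
    Returns the final fitness and how often the network would have been evaluated."""
    stats = RunStats()
    timeout = timeout_start
    rightmost = 0
    evaluations = 0
    for frame_no, (x, cleared) in enumerate(frames):
        if frame_no % evaluate_every == 0:
            evaluations += 1      # the script would call evaluateCurrent here
        if x > rightmost:         # assumption: progress resets the timeout
            rightmost = x
            timeout = timeout_start
        timeout -= 1
        stats.distance = rightmost
        stats.level_cleared = cleared
        if timeout <= 0 or cleared:
            break
    return compute_fitness(stats), evaluations


# A run that moves right for 10 frames and then stands still until the timeout.
stalled = [(i, False) for i in range(1, 11)] + [(10, False)] * 50
# A run that clears the level on its second frame.
cleared = [(5, False), (9, True)]

max_fitness = 0
for frames in (stalled, cleared):
    fitness, _ = run_genome(frames)
    if fitness > max_fitness:     # keep the best fitness seen so far
        max_fitness = fitness
print(max_fitness)
```

The stalled run is cut off by the timeout and only scores its distance, while the cleared run receives the bonus and becomes the new maximum fitness.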

 

As we can see, most of the functionality is tied closely to the game Super Mario World, which makes it difficult to copy and paste it into other applications. However, it might be possible to reuse the parts that deal with the neural networks and the genomes in scripts for other applications.
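As an illustration of a part that does not depend on the game at all, the species comparison described under disjoint, weights and sameSpecies could be sketched like this in Python. The sketch assumes the usual NEAT reading, in which a gene counts as disjoint when its innovation number is missing from the other genome, and it normalizes the disjoint count by the size of the larger genome. The coefficient and threshold values are placeholders, not the values from the config file.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    innovation: int
    weight: float


def disjoint(genes1, genes2):
    """Share of genes whose innovation number occurs in only one genome."""
    ids1 = {g.innovation for g in genes1}
    ids2 = {g.innovation for g in genes2}
    longest = max(len(genes1), len(genes2))
    if longest == 0:
        return 0.0
    return len(ids1 ^ ids2) / longest


def weights(genes1, genes2):
    """Average absolute weight difference over the genes both genomes share."""
    by_id = {g.innovation: g for g in genes2}
    diffs = [abs(g.weight - by_id[g.innovation].weight)
             for g in genes1 if g.innovation in by_id]
    return sum(diffs) / len(diffs) if diffs else 0.0


def same_species(genes1, genes2, delta_disjoint=2.0, delta_weights=0.4, threshold=1.0):
    # Placeholder coefficients; the script reads its own values from the config.
    dd = delta_disjoint * disjoint(genes1, genes2)
    dw = delta_weights * weights(genes1, genes2)
    return dd + dw < threshold


a = [Gene(1, 0.5), Gene(2, -1.0), Gene(3, 0.2)]
b = [Gene(1, 0.7), Gene(2, -1.0), Gene(4, 0.9)]  # one gene differs in structure
c = [Gene(1, 0.6), Gene(2, -0.9), Gene(3, 0.2)]  # same structure, similar weights

print(same_species(a, b), same_species(a, c))
```

Nothing in this comparison refers to sprites, buttons or the emulator, so a port to another application would mainly have to replace the input and output handling around it.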

Unsurprisingly, the best place to modify the script is the config file. I will take a closer look at possible modifications next week.
