Reinforcement Learning – Lost Chapters – Hyperparameters

This blog post is about hyperparameters and some methods for tuning them.

What are Hyperparameters?

Hyperparameters are settings that are fixed before training begins and control important aspects of the training process. Even minor changes to these hyperparameters can influence the overall performance of the AI. Common hyperparameters include the learning rate, the number of layers in the neural network, or probabilities used within the algorithm. (cf. [Goodfellow 2016])

 

Choosing the right Hyperparameter values

In a Proximal Policy Optimization (PPO) algorithm, the hyperparameters include a clipping rate, a value function coefficient, an entropy bonus coefficient, a discount factor, a generalized advantage estimation parameter, the number of parallel workers, and the number of time steps the workers should run before updating the policy. (cf. [Schulman 2017])
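To make this concrete, such a PPO configuration could be collected in a plain Python dictionary, as in the sketch below. The parameter names and values are illustrative assumptions in the spirit of [Schulman 2017], not values prescribed by the paper.

```python
# Illustrative PPO hyperparameter set; names and values are assumptions
# for demonstration, not defaults taken from Schulman et al. 2017.
ppo_hyperparameters = {
    "clip_range": 0.2,       # clipping rate for the surrogate objective
    "vf_coef": 0.5,          # value function coefficient in the loss
    "ent_coef": 0.01,        # entropy bonus coefficient
    "gamma": 0.99,           # discount factor
    "gae_lambda": 0.95,      # generalized advantage estimation parameter
    "num_workers": 8,        # parallel workers collecting experience
    "rollout_steps": 2048,   # time steps per worker before a policy update
    "learning_rate": 3e-4,   # step size of the optimizer
}
```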

In a NeuroEvolution of Augmenting Topologies (NEAT) algorithm, the hyperparameters include the population size, the delta values used to split the population into species, the allowed number of stale species, the probabilities for mutations, perturbations and crossovers within the neural network, the number of time steps the algorithm should run before updating its network, and a timeout value that ends the current iteration.
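A comparable sketch for NEAT might look as follows; again, the names and values are assumptions chosen for illustration and do not follow any particular NEAT implementation.

```python
# Illustrative NEAT hyperparameter set; names and values are assumptions.
neat_hyperparameters = {
    "population_size": 150,       # genomes per generation
    "delta_threshold": 3.0,       # compatibility distance used for speciation
    "stale_species_limit": 15,    # generations without improvement before removal
    "mutation_rate": 0.8,         # probability of mutating connection weights
    "perturbation_rate": 0.9,     # probability of perturbing instead of replacing a weight
    "crossover_rate": 0.75,       # probability of producing offspring via crossover
    "add_node_rate": 0.03,        # probability of inserting a new node
    "add_connection_rate": 0.05,  # probability of adding a new connection
    "update_steps": 600,          # time steps per run before updating the population
    "timeout": 20,                # steps without progress before the iteration ends
}
```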

Hyperparameters affect both the behaviour of the algorithm and how it converges. Setting them manually requires deep knowledge of the field on the one hand and a lot of time for fine-tuning on the other. Alternatively, the hyperparameter search can be automated: algorithms such as Grid Search and Random Search can be used to determine an appropriate combination of hyperparameters.

 

Grid Search

Grid search is a structured way of testing hyperparameter combinations and is often used when there are three or fewer hyperparameters. A small set of values is selected for each hyperparameter; spaced evenly along one axis per hyperparameter, these values form a grid, as shown in Figure 1. To find the best combination, a model is trained and evaluated for every point on the grid. (cf. [Goodfellow 2016], [Stalfort 2019])
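A minimal sketch of such a grid search could look like this, assuming a hypothetical train_and_evaluate function that stands in for a full training and evaluation run:

```python
from itertools import product

def train_and_evaluate(learning_rate, gamma, clip_range):
    # Hypothetical placeholder for a full training run; returns a dummy score.
    return -abs(learning_rate - 3e-4) - abs(gamma - 0.99) - abs(clip_range - 0.2)

# A small set of candidate values per hyperparameter spans the grid.
grid = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "gamma": [0.95, 0.99],
    "clip_range": [0.1, 0.2, 0.3],
}

best_score, best_params = float("-inf"), None
for values in product(*grid.values()):      # every combination is one grid point
    params = dict(zip(grid.keys(), values))
    score = train_and_evaluate(**params)    # one training run per combination
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```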

 

Random Search

In the random search approach, the hyperparameters are sampled randomly within predefined ranges. The research of James Bergstra and Yoshua Bengio shows that some hyperparameters are far more sensitive to change than others, so more of the evaluation budget should be spent on them. If the sensitivity and impact of each hyperparameter were known, it would be easy to set up a well-suited grid for a grid search. However, the effect of the individual hyperparameters is often poorly understood, which makes it difficult to design a good grid and makes random sampling the more robust choice. (cf. [Bergstra 2012])

Figure 1 – Grid Search vs Random Search (cf. [Stalfort 2019])
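The same loop can be sketched for random search; the ranges, the trial budget and the train_and_evaluate placeholder are again assumptions for illustration:

```python
import random

def train_and_evaluate(learning_rate, gamma, clip_range):
    # Same hypothetical placeholder as in the grid search sketch above.
    return -abs(learning_rate - 3e-4) - abs(gamma - 0.99) - abs(clip_range - 0.2)

# Predefined ranges to sample from instead of fixed grid values.
ranges = {
    "learning_rate": (1e-5, 1e-2),
    "gamma": (0.9, 0.999),
    "clip_range": (0.1, 0.3),
}

best_score, best_params = float("-inf"), None
for _ in range(20):  # the number of trials is a budget choice, not a fixed rule
    params = {name: random.uniform(low, high) for name, (low, high) in ranges.items()}
    score = train_and_evaluate(**params)
    if score > best_score:
        best_score, best_params = score, params

print(best_params, best_score)
```

In practice, hyperparameters such as the learning rate are often sampled on a logarithmic scale rather than uniformly, so that small and large orders of magnitude are covered equally well.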

 

Sources

[Stalfort 2019]
Stalfort, Jack: Hyperparameter tuning using Grid Search and Random Search: A Conceptual Guide. 2019. https://medium.com/@jackstalfort/hyperparameter-tuning-using-grid-search-and-random-search-f8750a464b35 (20/01/2020)

[Schulman 2017]
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. “Proximal Policy Optimization Algorithms”. In: CoRR abs/1707.06347 (2017). http://arxiv.org/abs/1707.06347 (20/01/2020)

[Goodfellow 2016]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.

[Bergstra 2012]
James Bergstra and Yoshua Bengio. “Random search for hyper-parameter optimization”. In: Journal of Machine Learning Research 13.Feb (2012), pp. 281–305.
