Reinforcement Learning – Lost Chapters – RL in AI

Researching the topic led me to the conclusion that I skipped one of the most important steps: assigning Reinforcement Learning (RL) bots their place in the field of Artificial Intelligence (AI). There is a lot of ground to cover, so it is time to catch up.

In this blog post, I properly introduce RL’s place in the field of AI.

AI Tierlist
AI is a branch of computer science concerned with enabling machines to act and think like humans do. However, not all AI is equally intelligent. Therefore, three AI tiers may be outlined: AGI, ASI and ANI.

Artificial General Intelligence (AGI)
AGI describes a machine that could perform the same intellectual tasks as a human, while meeting the same standards as humans. This line of AI is classified as “Strong AI” or “Human-Level AI”. (cf. [Allan 2018])

Intelligence has many definitions, which leaves the actual capabilities of an AGI unclear. Thus, six core capabilities were formulated that define human-level AI:

  1. The ability to reason, solve problems, use strategy and make decisions under uncertainty.
  2. The ability to represent knowledge.
  3. The ability to plan.
  4. The ability to learn.
  5. The ability to communicate in a natural language.
  6. The ability to integrate all of the above towards a common goal.


One popular approach to develop an AGI focuses on the human brain. Using this approach, AGI would be able to replicate limited sub-functions of the human brain, but not higher cognition such as logic and narratives. However, this approach requires a deep understanding of the human brain, which has not been achieved yet.

Another approach is the use of an abstract reasoning model, or Winograd model. This model is based on algorithms that go beyond learning context-dependent tasks and instead capture the relationships between abstractions. It requires a controlled environment that presents different tasks to the AGI. The idea is to train the AGI on varied tasks until it can solve anything within that environment. The OpenAI Gym is an example of such an approach and attempts to create a proper learning environment. (cf. [Bathia 2018])

The Coffee Test
Steve Wozniak, co-founder of Apple, proposed a test that aims to confirm an AGI: the “Coffee Test”. In this test, a machine must enter an average American home and figure out how to make coffee. To solve the problem, the machine must find the coffee machine, find the coffee, add water, find a mug, and brew the coffee by pushing the proper buttons. (cf. [Allan 2018], [Shick 2010])

Artificial Superintelligence (ASI)
At this point, an ASI does not exist.

ASI describes a super-intelligent computer that possesses an intelligence surpassing that of a human. ASI represents a technological step beyond AGI. It is designed to solve problems humans cannot solve.

In concept, being classified as super-intelligent requires an AI to be able to reprogram and improve itself indefinitely, so that it can adapt to any given problem. (cf. [Allan 2018])

The idea of an ASI usually provokes an emotional response. While many experts (including Stephen Hawking and Elon Musk) view the prospect of an ASI negatively, there is still room for optimism: an ASI could work to our benefit, and so far human civilization has advanced with every technological breakthrough. (cf. [Loeffler 2019])

Artificial Narrow Intelligence (ANI)
Most of the AI-led innovations today fall under the bracket of ANI. Face recognition technology, chatbots and Smart Home assistants are all perfect examples of ANI. Typically, these types of AI are designed to accomplish one task. (cf. [Bathia 2018])

ANI enables computers to outperform humans in one very narrowly defined task. One culturally relevant example of an ANI is IBM’s Watson supercomputer, which won the American TV show “Jeopardy!”. Watson is an expert “question answering” machine that uses AI technology to mimic the cognitive capabilities of humans.

ANI is still the most common form of AI technology. Any software that uses machine learning or data mining to make decisions can be considered ANI. This line of AI is known as “Weak AI”. (cf. [Allan 2018])


ANI Models
There are several approaches and models for implementing ANI in applications. This chapter briefly introduces the most important keywords.

Expert Systems
Expert systems handle intellectual problems that have a solution. Solving such a problem requires knowledge, or “know-how”: having knowledge about a problem means understanding it.

People who understand a problem in depth or have a lot of experience in a certain area are called experts. They are able to manage problems that others would struggle with. Most experts can express their experience in the form of rules or criteria that have to be fulfilled. (cf. [Negnevitsky 2008], P. 25)

The idea behind expert systems is that the experts’ knowledge is extracted and stored in a computerized system. This system is then made available to all kinds of users, who apply it in their own areas. Thus, every user of an expert system can achieve an expert level. (cf. [Romem 2010], P. 8)
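As a sketch of this idea, extracted expert knowledge can be stored as simple if-then rules and applied by forward chaining: a rule fires whenever all of its conditions are known facts, and derived facts can in turn trigger further rules. The troubleshooting rules and facts below are invented purely for illustration.

```python
# Each rule pairs a set of required facts with the fact it derives.
RULES = [
    ({"engine cranks", "no fuel in tank"}, "refill tank"),
    ({"engine does not crank", "battery dead"}, "charge battery"),
    ({"lights dim"}, "battery dead"),
]

def forward_chain(facts, rules):
    """Repeatedly fire rules whose conditions hold until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# "battery dead" is inferred first, which then triggers "charge battery"
derived = forward_chain({"engine does not crank", "lights dim"}, RULES)
```

A user without automotive expertise can thus obtain the same recommendation an expert would give, which is exactly the promise of an expert system.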

Expert systems can be used to solve a wide variety of problems. The following seven major classes describe the main fields that make use of expert systems.

Diagnosis and troubleshooting of all kinds of devices
In this class, expert systems are used to find plausible reasons for given faults and to suggest actions to correct these faults. This class is primarily concerned with medical diagnosis and diagnosis of engineered systems.

Planning and scheduling
In this class, complex and interacting goals are analyzed to determine a set of actions that fulfills them, including the temporal order in which the actions should be carried out. To provide such a set, the expert system considers all given resources and constraints. Examples in this class include airline scheduling and manufacturing process planning.

Configuration of manufactured objects
In this class, expert systems are used to find a solution using a set of elements related to a set of constraints. The process of finding such a solution is called configuration. Configuration applications are used for problems involving complex engineering design and manufacturing.

Financial decision making
This class is concerned with financial services. Expert systems are used for the creation of advisory programs, the determination of insurance prices and, most commonly, foreign exchange trading.

Knowledge publishing
In this class, the expert system provides relevant knowledge to a user’s problem. Expert systems that correct grammar mistakes or give advice on tax strategies and policies are a major part of this class.

Process monitoring and control
This class is concerned with analyzing real-time data to detect anomalies, predict trends and control the quality of produced goods. Expert systems of this class can be found in the steel making and oil refining industries.

Design and manufacturing
In this area, expert systems assist in the design of all kinds of devices and processes. (cf. [])

Evolutionary computation
In this approach, scientists try to model the process of evolution in order to create intelligent behavior. Basically, evolutionary computation simulates evolution on a computer to obtain a series of optimization algorithms based on simple rules. The idea is to improve the quality of a solution iteratively: the solution is refined with each iteration until an optimal one is found. (cf. [Negnevitsky 2008], P. 219)

The common workflow of evolutionary computation is as follows: first, a population of individuals is created and their fitness for survival is evaluated. Then, genetic operators are applied to the population, which leads to a re-evaluation of the population’s fitness. This process is repeated until the fitness is either optimal or at least feasible. (cf. [Negnevitsky 2008], P. 254)

The most essential key elements of evolutionary computation are

  • genetic algorithms,
  • evolution strategies and
  • genetic programming.
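The create, evaluate, apply-operators, re-evaluate loop described above can be sketched with a toy genetic algorithm. The task (maximizing the number of 1-bits in a string), the population size, the truncation selection and the operator settings are all illustrative assumptions:

```python
import random

def fitness(individual):
    return sum(individual)  # number of 1-bits in the string

def evolve(pop_size=20, length=16, generations=60, mutation_rate=0.05):
    random.seed(1)  # fixed seed so the run is reproducible
    # create the initial population of random bit strings
    population = [[random.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # evaluate fitness and select the fitter half as parents
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # genetic operators: one-point crossover plus bit-flip mutation
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children  # re-evaluated at the top of the next loop
    return max(population, key=fitness)

best = evolve()
```

After a few dozen generations the best individual is close to the all-ones optimum, illustrating how solution quality improves with each iteration.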


Evolutionary algorithms can be used to find sufficient solutions to problems that humans struggle to solve.

One of these problems is timetabling. For example, schools and universities must arrange room and staff allocations according to their curriculum, subject to constraints that must be satisfied: each staff member can only be in one place at a time and can only teach certain classes, rooms can only be used for lessons when they are available, and lectures must not clash, so that students can attend all of them.

This is a combinatorial problem and classified as NP-hard. Searching for the optimal timetable is therefore not feasible because of the amount of computation involved; instead, heuristics must be used. Genetic algorithms are able to find satisfactory solutions to such scheduling problems.

NASA’s Evolvable Systems Group has used evolutionary algorithms to evolve antennas which are used on satellites. When testing these antennas, they proved to be extremely well adapted to their purpose. (cf. [])

Artificial Neural Networks
Artificial neural networks (ANN) and genetic algorithms are common approaches to machine learning. They involve adaptive mechanisms a machine can use to either learn from experience, by example or by analogy. The goal of such learning mechanisms is to improve the performance of an intelligent system over a period of time.

An ANN is designed to work like a brain. It consists of interconnected processors called neurons, which are analogous to the biological neurons in a brain. Every link between two neurons has a numerical weight, which represents the importance of that neuron input. In order to learn, an ANN adjusts these weights: more important information is retained longer, while irrelevant information is quickly forgotten. (cf. [Negnevitsky 2008], P. 214)
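The weighted-link idea can be illustrated with a single artificial neuron: each input is multiplied by its weight, the products are summed, and the result is passed through an activation function (a sigmoid here; the input and weight values are arbitrary examples):

```python
import math

def neuron(inputs, weights, bias=0.0):
    # weighted sum of the inputs, squashed by a sigmoid activation
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

# the heavily weighted (important) first input dominates the output
out = neuron([1.0, 1.0], [4.0, 0.1])
```

Learning consists of nudging the weights until outputs like this match what the task requires.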

In order for an ANN to properly solve a problem, it must first be trained: it must learn to act in its environment. The most common learning methods are described in the chapter on neural network learning methods below.

Since an ANN is always configured for a specific application, it can be applied to areas concerned with regression or pattern recognition/data classification. Regression problems include image compression, stock market predictions and optimization processes, e.g. the traveling salesman problem. Classification problems include character recognition and medical diagnosis tasks. (cf. [https://cs.])

As the interaction between the embedded world and the internet world increases, so does the number of strongly connected, interacting systems. The growing overall complexity of such systems leads to a paradigm shift: the view of Complex System Engineering must be extended towards System of Systems (SoS) Engineering. “SoS describes the large scale integration of many independent self-contained systems to satisfy global needs or multi-system requests.” (cf. [])

An example of the importance of AI within an SoS is the self-driving car. Even at the most basic level, where the car only needs to drive forward, there must be a system that accelerates the car, a system that controls the engine and a system that checks the environment for incoming obstacles. An ANN could be implemented in this system to classify objects on the road, i.e. incoming obstacles.

Hybrid systems
A hybrid intelligent system combines at least two of the technologies mentioned in this chapter. This combination forms the core of soft computing, which aims to remain capable of learning and making decisions even when the environment is uncertain.

Each technology has its own advantages and disadvantages. For example, an expert system can handle imprecise data, yet it cannot optimize its rules by itself. A good hybrid system therefore combines selected advantages of different AI technologies; to optimize the expert system’s rules, a neural network could be used. (cf. [Negnevitsky 2008], P. 259f)


Neural Network Learning Methods
A neural network adjusts itself – or learns – by neuronal dynamics. The two core elements of neuronal dynamics are the dynamics of the activation state and the dynamics of the synaptic weights.

The activation state represents short term memory. The corresponding neuron only knows the information needed for processing. Long term memory is represented by the synaptic weights’ encoded patterns. These patterns are adjusted after every finished processing cycle. (cf. [Yegnanarayana 2005], P. 31)

In order to use a neural network for a particular application, its synaptic weights must first be adjusted. Thus, once a network has been structured, it has to be trained. Every training run starts by setting the initial weights randomly. Depending on the method used, training can be carried out in a supervised, unsupervised or reinforcement manner.

Supervised Learning
In this approach, both the inputs and the desired outputs are provided. The network processes its inputs and compares the actual outputs with the desired outputs; if they do not match, the weights are adjusted. Then the next processing iteration begins. This cycle is repeated until the ANN reaches a statistically desired point, or accuracy.

Although the network has been trained, it may never have actually learned. The training data, or training set, may not contain the inputs that lead to the desired outputs, or there may simply not be enough training data to learn from.

If the network was trained successfully yet cannot solve the problem it was designed for, every element making up the network needs to be checked: the desired inputs and outputs, the number of layers, the number of neurons per layer, the connections between layers, every function used in the network and the initial weights.

After successful and correct training, it is possible to “freeze” the weights of the network. To enable faster processing, the network can then be implemented in hardware. If the weights are not “frozen”, the network continues learning while in use.
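The compare-and-adjust cycle described above can be sketched with a single perceptron that learns the logical AND function; the learning rate and the number of epochs are arbitrary example values:

```python
# training set: inputs and their desired outputs for logical AND
DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def train(epochs=20, lr=0.1):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), desired in DATA:
            actual = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = desired - actual        # compare actual vs. desired output
            w[0] += lr * error * x1         # adjust the weights only when
            w[1] += lr * error * x2         # the outputs do not match
            b += lr * error
    return w, b

w, b = train()
predict = lambda x1, x2: 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
```

After a handful of passes over the training set the weight adjustments stop, because the actual outputs match the desired ones on every example.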

Unsupervised Learning
In unsupervised learning, also known as adaptive training, only inputs are provided. Thus, the network does not know which outputs are desired and has to learn from the inputs to find a valid solution. (cf. [])
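A sketch of learning from inputs alone: a tiny one-dimensional k-means clustering that groups data into two clusters without ever being told the desired outputs. The data, the two-cluster assumption and the extreme-value initialization are illustrative choices:

```python
def kmeans_1d(data, iterations=10):
    # two cluster centers start at the extremes, then the algorithm
    # alternates assignment and center-update steps
    c0, c1 = min(data), max(data)
    for _ in range(iterations):
        group0 = [x for x in data if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in data if abs(x - c0) > abs(x - c1)]
        c0 = sum(group0) / len(group0)
        c1 = sum(group1) / len(group1)
    return c0, c1

# no desired outputs are given; the two-group structure emerges from the inputs
centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.9])
```

The centers settle near 1.0 and 9.1, a valid grouping the network-style learner discovered on its own.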

Reinforcement Learning
RL describes an iterative process in which the ANN learns by trial and error. Depending on the provided inputs, the network decides on actions, which are then evaluated and taken into account in the next training cycle.
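The trial-and-error idea can be sketched without a neural network at all, using a simple multi-armed bandit: the learner tries actions, observes noisy rewards, and gradually prefers the action that pays off most. The reward means, the epsilon-greedy strategy and all parameter values here are illustrative assumptions:

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Estimate each action's value from observed rewards (trial and error)."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    counts = [0] * n_actions
    values = [0.0] * n_actions   # running estimate of each action's reward
    for _ in range(steps):
        # explore occasionally, otherwise exploit the best current estimate
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)
        else:
            action = max(range(n_actions), key=values.__getitem__)
        reward = true_means[action] + rng.gauss(0, 1)  # noisy evaluation
        counts[action] += 1
        # the evaluation feeds back into the next decision cycle
        values[action] += (reward - values[action]) / counts[action]
    return values

values = run_bandit([0.0, 0.5, 1.0])
```

After enough trials the highest estimate belongs to the most rewarding action, even though the learner was never told which action was correct.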




Primary Literature

[Negnevitsky 2008]
Negnevitsky, Michael: Artificial Intelligence. A Guide to Intelligent Systems. 2nd ed., Harlow: Addison-Wesley / Pearson Education 2008

[Romem 2010]
Romem, Yoram: Darwin, Culture and Expert Systems. Ed: Vizureanu, Petrică (ed.): Expert Systems, Intech 2010

[Yegnanarayana 2005]
Yegnanarayana, B.: Artificial neural networks. New Delhi: Prentice Hall of India 2005


Online Sources

[Loeffler 2019]
Loeffler, John: Should we fear Superintelligence? 2019. (10/02/2019)

[Allan 2018]
Allan, Sean: The Three Tiers of AI: Automating tomorrow with AGI, ASI & ANI. 2018. (10/02/2020)

[Bathia 2018]
Bathia, Richa: AGI VS ANI & Understanding the path towards Machine Intelligence. 2018. (10/02/2020)

[Shick 2010]
Shick, Michael: Wozniak: Could a Computer Make a Cup of Coffee? 2010. (10/02/2020)

Workshop on Systems of Systems Engineering and Control: Report from the Workshop on Systems of Systems Engineering and Control. 2013. (10/02/2020)

When are Evolutionary Algorithms useful? (10/02/2020)

The Applications of Expert Systems: (10/02/2020)

Artificial Neural Networks Technology. Training an Artificial Neural Network: (10/02/2020)

Applications of neural networks: (10/02/2020)
