Bench Talk for Design Engineers | The Official Blog of Mouser Electronics

Reinforcement Learning Is Advancing AI Applications

By Michael Matuschek

Just a few years ago, technological applications that can perceive the surroundings, recognize important details—and ignore the rest—and then use those details to accomplish a task seemed like the stuff of science fiction.

However, several technologies have now become an integral part of our daily lives: intelligent voice assistants that understand and respond to the many nuances of human language, medical applications that use imaging to predict cancer more accurately than human doctors, and self-driving cars that navigate dynamic environments. They are just some of the technologies making headlines.

Reinforcement learning, one of the three branches of machine learning, is advancing many of these innovations. It enables computers to recognize important features of their environment to make optimal decisions—a skill that did not exist until recently. A more detailed look at reinforcement learning (RL), artificial neural networks (ANNs), and deep learning (DL) reveals new potentials—as well as remaining challenges—for artificial intelligence applications aiming to achieve AI on a human level.

Approaches to Machine Learning

Machine learning (ML) is a subset of AI that enables computers to learn from examples and experiences. Of the three branches of ML, supervised and unsupervised learning are perhaps the best known and are used for well-defined, relatively predictable problems.

Supervised Learning

Supervised learning (SL) approaches are used for solving problems for which annotated input data are available. The algorithms try to learn patterns and associations from these known examples to, in turn, process unknown examples. A classic example of this is image recognition, in which manually annotated images are used to train models to classify freshly captured images correctly.
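As a minimal sketch of learning from annotated examples, the snippet below classifies an unseen sample by copying the label of its nearest annotated neighbor. The two numeric features, the labels, and the function name `classify` are assumptions made up for illustration; real image recognition works on far richer inputs and models.

```python
from math import dist

# Hypothetical annotated training data: (feature vector, label).
# The two features are invented stand-ins for measurable image properties.
TRAINING_DATA = [
    ((0.9, 0.2), "cat"),
    ((0.8, 0.3), "cat"),
    ((0.2, 0.9), "dog"),
    ((0.3, 0.8), "dog"),
]

def classify(sample, training_data=TRAINING_DATA):
    """Return the label of the annotated example closest to `sample`."""
    _, label = min(training_data, key=lambda pair: dist(pair[0], sample))
    return label

print(classify((0.85, 0.25)))  # unseen sample close to the "cat" examples
print(classify((0.25, 0.85)))  # unseen sample close to the "dog" examples
```

The quality of the result depends entirely on the annotated examples: a sample unlike anything in `TRAINING_DATA` still receives one of the known labels.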

Unsupervised Learning

Unsupervised learning (UL) approaches are used to infer hidden structures or relationships in non-annotated data records. These approaches can be applied without much preparation but are generally more descriptive and exploratory. They are typically used to prepare for the use of supervised approaches. A common example is identifying different customer groups in transactional data, which can later facilitate various targeted marketing campaigns.
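The customer-group example can be sketched with a tiny one-dimensional k-means clustering. The spending figures, the starting centers, and the function name `k_means_1d` are assumptions for illustration; no labels are given to the algorithm, and it only groups similar values.

```python
def k_means_1d(values, centers, iterations=10):
    """Group numbers around `centers` by repeated assign-and-average steps."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical monthly spend per customer (no annotations attached).
spend = [12, 15, 14, 210, 190, 205]
centers, groups = k_means_1d(spend, centers=[min(spend), max(spend)])
print(groups)  # two groups: low spenders and high spenders
```

What the two groups mean (for example, occasional versus regular buyers) is left to a human to interpret, which is why such results are mostly descriptive.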

Reinforcement Learning

Reinforcement learning (RL), ML’s third branch, enables some of the most complex and human-like applications making headlines today. RL is a type of machine learning in which rewards and penalties are used to evaluate individual actions, and future actions are planned on the basis of those evaluations and the input variables. Rather than being told explicitly how to solve a problem, an RL system learns by maximizing rewards and minimizing penalties. Not limited to specific problems or environments, RL focuses on machines that make optimal decisions based on complex inputs from dynamic environments.

The basic idea of RL is to model learning in a similar way to how a human—or any sufficiently intelligent being—learns: by attempting to achieve a specific goal—connected to a reward—with the skills and tools provided, but with no clear instructions on exactly how to solve the problem. A simple example is a robot that can open and close a hand to place a ball in a box. The robot has to learn that it can grab the ball, move its arm into the correct position, and then allow it to drop. This usually involves many iterations and re-starts of the experiments. The robot only receives feedback on whether its behavior was successful or not and tries to adjust its movements until the goal is achieved.
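The trial-and-error loop described above can be sketched with tabular Q-learning, one common RL algorithm (the article does not name a specific one). The environment is an assumption for illustration: a corridor of five positions in which the agent starts at position 0 and receives a reward of 1 only on reaching position 4. The learning rate, the discount factor, and the episode count are likewise arbitrary choices.

```python
import random

N_STATES = 5          # positions 0..4; position 4 is the goal
ACTIONS = (-1, +1)    # index 0: step left, index 1: step right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (assumed values)

def train(episodes=500, seed=0):
    """Learn action values purely from reward feedback, with no instructions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0                                  # restart the experiment
        while state != N_STATES - 1:
            a = rng.randrange(len(ACTIONS))        # explore by acting at random
            nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
            reward = 1.0 if nxt == N_STATES - 1 else 0.0
            # Q-learning update: nudge the estimate toward
            # reward + discounted best value of the next state.
            q[state][a] += ALPHA * (reward + GAMMA * max(q[nxt]) - q[state][a])
            state = nxt
    return q

def greedy_policy(q):
    """Best known action for every non-goal position."""
    return ["left" if row[0] > row[1] else "right" for row in q[:-1]]

print(greedy_policy(train()))
```

The agent is never told that moving right is correct. It only sees the reward at the goal, and the value estimates propagate that information back to earlier positions over many restarts.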

This is in marked contrast to SL, where a good result requires many examples—such as a large, diverse collection of annotated cat images—to describe the problem in all its dimensions. This is the only way for algorithms to learn exactly which features—such as shapes or colors—are relevant to the correct decision. For the robot example, the equivalent would be to accurately and carefully describe each step of the process—such as where to move the hand, how much pressure to apply, etc. For this example with few variables, it might be possible to achieve this level of detail, but relearning would be required if the variables changed. Given a larger ball, the robot would be at a loss.

In real-world applications, the balancing of inputs, outputs, and training data becomes surprisingly complex. For example, autonomous vehicles process a large amount of sensor data in near real time. Overlooking nuances in the environment can have significant consequences, and a great deal is at stake. That is why reinforcement learning is the tool of choice in environments where creating training examples or instructions is prohibitively expensive or impossible.

Sub-genres of Reinforcement Learning

Like other branches of ML, RL has sub-genres that work together to drive innovation. In particular, feature learning (FL) enables systems to recognize differentiated details of input data. Artificial neural networks (ANNs) and deep learning (DL) provide the required framework for advanced parsing, processing, and learning, and enable the subfield of deep reinforcement learning (DRL).

Feature Learning

Feature learning—also known as representation learning—is an ML technique that enables machines to recognize characteristic and independent components of input data that often cannot be represented in algorithms. For example, in a self-driving car, surroundings are perceived by several cameras, radar, and other sensors. This means that a lot of information is available to decide on the next action, but only a fraction of it is relevant. For example, the sky’s color is usually irrelevant, while the color of a traffic light is highly relevant. The speed of a bird flying past is much less important than the speed of a pedestrian approaching the curb.

Why is the ability to represent input features at this level so important? The data sets used for training play a key role in the accuracy of the models. The more training data, the better; in particular, the more diverse examples with clear, identifying features the data set includes, the better. In other words, the distinctive and independent features of the input data help computers bridge the gap between what they have already learned and what they still need to learn to deliver accurate, consistent results regardless of the context. Recognizing distinctive features also helps identify characteristics and outliers that can be ignored, which in turn can significantly reduce the data volume over time.

Artificial Neural Networks and Deep Learning

These highly variable applications require a robust and scalable framework. One approach that has received considerable attention, particularly in supervised learning, is deep learning. When deep learning is combined with the principles of reinforcement learning, the result is called deep reinforcement learning.

The basic idea of artificial neural networks (ANNs) dates back to the mid-twentieth century (the perceptron was introduced in the late 1950s) and is loosely based on the human brain’s network-like neural structure. ANNs comprise a huge network of artificial neurons called perceptrons that receive input signals, evaluate various input features, and then relay the signal through the network until an output signal is reached.

The network is defined by the number of neurons, the number and strength of their connections, and each neuron’s activation threshold, which is the strength that an input signal must reach to be passed on. ANNs have a scalable structure with input and output layers and hidden layers in between that translate the input into something the output layer can use. The term deep learning refers to networks that have many successive layers of neurons and are therefore deep.
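A minimal sketch of these building blocks: a threshold neuron, and a two-layer network wired from three such neurons. The weights and thresholds are set by hand here purely for illustration (a trained network would learn them), and the names `neuron` and `xor_network` are hypothetical.

```python
def neuron(inputs, weights, threshold):
    """Fire (1) only if the weighted input signal reaches the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

def xor_network(a, b):
    """Input layer -> hidden layer of two neurons -> one output neuron."""
    h_or = neuron((a, b), weights=(1, 1), threshold=0.5)    # fires if a OR b
    h_and = neuron((a, b), weights=(1, 1), threshold=1.5)   # fires if a AND b
    # Output fires when OR is on and AND is off, which is exclusive OR.
    return neuron((h_or, h_and), weights=(1, -1), threshold=0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_network(a, b))
```

No single threshold neuron can compute exclusive OR on its own. The hidden layer translates the raw inputs into intermediate signals that the output neuron can use.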

ANNs are particularly suitable for generating optimal answers from complex input data and dynamic environments because of how they learn: through backpropagation. For any given training signal (for example, a vector describing the coordinates and color values of an image), the network checks whether the generated output is correct and then adjusts the weights in the network slightly to move toward the desired result. After enough training iterations, the network becomes stable and can recognize previously unknown situations.
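The weight adjustment can be sketched for a very small network with two inputs, two hidden neurons, and one output. Everything specific here is an assumption for illustration: sigmoid activations, a squared-error loss, no bias terms, the starting weights, and the learning rate. The sketch trains on a single example only to show the error shrinking.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Pass the input through the hidden layer to the single output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output

def loss(x, target, w_hidden, w_out):
    return 0.5 * (forward(x, w_hidden, w_out)[1] - target) ** 2

def backprop_step(x, target, w_hidden, w_out, lr=0.5):
    """One backpropagation step: compare, propagate the error back, adjust."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output neuron (derivative of the loss w.r.t. its input).
    delta_out = (output - target) * output * (1 - output)
    # Error signal passed back to each hidden neuron via the output weights.
    delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(w_out, hidden)]
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w - lr * d * xi for w, xi in zip(ws, x)]
                    for ws, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out

x, target = (1.0, 0.5), 1.0
w_hidden, w_out = [[0.1, -0.2], [0.4, 0.3]], [0.2, -0.1]
loss_before = loss(x, target, w_hidden, w_out)
for _ in range(50):
    w_hidden, w_out = backprop_step(x, target, w_hidden, w_out)
loss_after = loss(x, target, w_hidden, w_out)
print(loss_before, loss_after)  # the error shrinks as the weights are adjusted
```

Real training repeats this step over many different examples, which is where the large iteration counts discussed later come from.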

Limitations of ANNs, DL, and RL

ANNs and DL offer enormous potential because of their ability to represent characteristics and arrive at optimal responses in dynamic environments. However, their abilities point to more challenges and reveal some remaining gaps in mimicking certain aspects of human intelligence.

Millions of Nodes, Connections, and Training Iterations Are Required

Modeling relevant problems requires ANNs to have significant numbers of nodes and connections to handle the millions of different variables that need to be analyzed and stored. Modern computers have only recently made this possible. Similarly, the number of training loops required can reach billions and grows exponentially with the number of environment variables. It is no coincidence that the first major breakthroughs in reinforcement learning were made in games such as Go, where the AI called AlphaGo now manages to beat the best human players: The rules of the game (such as the possible actions and outcomes) and the objectives are clearly defined, and it is easy to quickly execute many simulated games by letting the AI play against itself. The next evolutionary step was playing video games such as Super Mario™ or StarCraft, in which the relationship between actions and outcomes is more complex. Still, the environment remains limited, and many iterations can be simulated rapidly.

However, with a real-world problem such as autonomous driving, the situation is different. The primary task of safely reaching the destination is still relatively easy to formulate. However, the environment is significantly more diverse, and simulations need to be much more sophisticated to make them useful for learning about the actual problem. Ultimately, the simulations still have to be replaced with actual driving to take into account other factors that cannot be modeled, and close monitoring will continue to be required until human performance is achieved. For example, autonomous vehicle manufacturer Waymo stated in a 2020 press release that its cars need 1,400 years of driving experience to compete with human drivers. This is surprising because a person can drive a car safely after just a few weeks of practice. Why isn't this possible for RL, or is it?

Abilities Related to Abstraction and Inference

People can learn to play a game or drive a car quickly because the human brain can learn through abstraction and inference. Through this type of learning, a driver can, for example, imagine what a traffic light would look like from another point of view or in another context because of the innate spatial awareness of humans. A human can also spot cars on the road that are different in color from those previously seen and draw conclusions from observations and experience.

Such capabilities have only recently been explored in ANNs. Although different levels of the network can capture different aspects of the input, such as shapes and colors, the network can only process characteristics explicitly contained in the training data. If the AI is trained only during the daytime, the model is unlikely to handle the different conditions at night. Even with DL, such differences must be taken into account in the training data, and the degree of acceptable deviation from the training data is very small.

Various techniques for learning by abstraction and inference are currently being explored, but they reveal even more challenges and limitations. A popular example of ANN failure was a computer vision system that detected Siberian Husky dogs with extremely high reliability—much more reliably than other dog breeds. Closer inspection revealed that the network had focused on the snow present in almost all of the Husky images and was ignoring the dog itself. In other words, the model failed to see that the color of the ground—a trivial detail to humans—is not an intrinsic property of the dog.
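The Husky failure can be reproduced in miniature. In the sketch below, a deliberately simple learner picks the single feature that best predicts the label on the training set. The data are invented for illustration: each image is reduced to two made-up binary features, and snow happens to appear in every Husky training image.

```python
# Hypothetical training set: (has_husky_markings, snow_in_background) -> label.
TRAIN = [
    ((1, 1), "husky"), ((1, 1), "husky"), ((0, 1), "husky"), ((1, 1), "husky"),
    ((0, 0), "other"), ((1, 0), "other"), ((0, 0), "other"), ((0, 0), "other"),
]

def train_stump(data):
    """Pick the one feature whose presence best predicts 'husky' in training."""
    def accuracy(i):
        hits = sum((x[i] == 1) == (label == "husky") for x, label in data)
        return hits / len(data)
    return max(range(len(data[0][0])), key=accuracy)

def predict(feature_index, x):
    return "husky" if x[feature_index] == 1 else "other"

chosen = train_stump(TRAIN)
print(chosen)                   # 1: the learner relies on the snow feature
print(predict(chosen, (1, 0)))  # a Husky on grass is classified as "other"
```

On the training data the snow feature is a perfect predictor, so nothing in the learner's objective tells it that snow is not a property of the dog.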

This example seems banal and artificial, but the real-world consequences can be dire. Let's look again at the example of self-driving cars, where accidents are rare but can be traced back to ambiguous situations. The 2018 accidental death of a pedestrian pushing a bicycle down a four-lane highway was an example of a situation that would have been easy for a human driver to handle but resulted in a fatal collision because an ANN processed it incorrectly. The situation had not been observed during the many hours of training, and no adequate failover (“If you don't know what to do, stop!”) was implemented. As a result, the system appeared to react irrationally because it lacked that fundamental cornerstone of human intelligence.
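A failover of the "if you don't know what to do, stop" kind could look roughly like the sketch below. It is not how any production driving system works; the situation labels, the `ACTION_FOR` mapping, and the 0.9 confidence threshold are all assumptions for illustration.

```python
# Hypothetical mapping from a recognized situation to a driving action.
ACTION_FOR = {
    "clear road": "continue",
    "pedestrian": "brake",
    "red light": "halt at line",
}

def decide(class_probabilities, min_confidence=0.9):
    """Choose an action, falling back to a safe stop when the model is unsure."""
    label, confidence = max(class_probabilities.items(), key=lambda kv: kv[1])
    if confidence < min_confidence or label not in ACTION_FOR:
        return "safe stop"   # failover for unknown or ambiguous situations
    return ACTION_FOR[label]

print(decide({"clear road": 0.97, "pedestrian": 0.02, "red light": 0.01}))
print(decide({"clear road": 0.45, "pedestrian": 0.40, "red light": 0.15}))
```

The second call shows the ambiguous case: no class is convincing, so the sketch refuses to act on its best guess. Such a rule only helps if the model's confidence values are themselves trustworthy.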

Worse still, such blind spots in AI can be exploited by those seeking to cause harm. For example, image classification can be completely misdirected if manipulated images are inserted during training. Although minor changes in images are imperceptible to humans, the same changes could be perceived and interpreted differently by ANNs. In one example, stop signs with nondescript stickers were incorrectly recognized as other signs. This could have led to accidents if the trained model had been used in an actual car. A human driver, on the other hand, would still recognize the stop sign without any problems.
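Why tiny changes can flip a decision is easiest to see with a linear classifier, used here as a simplified stand-in for an ANN. The weights, the bias, the 100-pixel "image", and the perturbation size are assumptions for illustration. Each pixel is shifted by at most 0.02 in the direction that lowers the classifier's score.

```python
N_PIXELS = 100
# Hypothetical weights of a trained linear classifier.
WEIGHTS = [0.5 if i % 2 == 0 else -0.5 for i in range(N_PIXELS)]
BIAS = 0.4

def score(image):
    return sum(w * p for w, p in zip(WEIGHTS, image)) + BIAS

def classify(image):
    return "stop sign" if score(image) > 0 else "other sign"

def perturb(image, epsilon=0.02):
    """Shift every pixel by epsilon against the sign of its weight."""
    return [p - epsilon * (1 if w > 0 else -1) for w, p in zip(WEIGHTS, image)]

original = [0.5] * N_PIXELS          # pixel values on a 0..1 scale
manipulated = perturb(original)

print(classify(original))     # stop sign
print(classify(manipulated))  # other sign
```

No single pixel changes by more than 2 percent of its range, yet the small shifts add up across all 100 pixels and move the score from 0.4 to about -0.6.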

Overcoming Obstacles and Limitations

These and other obstacles and limitations give rise to the question of how to move forward and enable ANNs to further close the gap in making optimal decisions. The simple answer is more training. If the variability and quality of the training data are good enough, the error rate can be reduced to such an extent that the model’s accuracy is acceptable. It has been shown that autonomous cars are already less frequently involved in accidents than human drivers, but the potential for “freak accidents” prevents wider acceptance.

Another systematic approach would be to explicitly encode the required background knowledge and make it available in the ML process. For example, a knowledge base created by Cycorp has been around for many years and contains millions of concepts and relationships, including the meaning of the stop sign mentioned previously. The aim is to manually encode human knowledge in a machine-readable form so that AI can draw on it in addition to training data, reach conclusions, and assess unknown situations, at least in part, in a way that is similar to human intuition.


Technologies that can perceive the surroundings, recognize important details, and make optimal decisions are no longer science fiction. Reinforcement learning, one of machine learning’s three branches, provides tools and frameworks that can handle high-dimensional variables and dynamic environments. However, these solutions also lead to new challenges, particularly the need for extensive neural networks, comprehensive training, and the imitation of human learning abilities through abstraction and inference to adapt to new situations. Although AI is capable of remarkable achievements and is becoming increasingly indispensable in many real-world applications, it is still a long way from achieving human-level learning abilities. Experiencing the intermediate steps is perhaps even more interesting than science fiction itself.

Michael Matuschek is a Senior Data Scientist from Düsseldorf, Germany. He holds a Master’s Degree in Computer Science and a PhD in Computational Linguistics. He has worked on diverse Natural Language Processing projects across different industries as well as academia. Covered topics include Sentiment Analysis for reviews, client email classification, and ontology enrichment.
