Reinforcement Learning: the untold power and mystery


Written By: Sunil Kumar Arjun Prasad

For many, the very thought of “intelligent” robots triggers flashbacks of every sci-fi film they’ve ever seen. Just imagine… fully autonomous machines that somehow end up with minds of their own. In 2021 we are seeing the early stages of this innovation on a smaller scale, with companies like Tesla championing the start of self-driving cars, and there is reason to believe that real-world applications of Reinforcement Learning (“RL”) will be a fully integrated reality within five years.

What is RL, anyway?

RL is a specialized type of Artificial Intelligence (“AI”) that could, in theory, one day replace the need for human involvement in some tasks. The technology may not replace a doctor or astronaut anytime soon, but it could help in fields like data analytics.

Data analysts draw on a combination of on-the-job and experience-based knowledge. They’ve mastered the art of looking critically at the past and applying what they’ve learned to resolve similar problems. This includes recommending changes to their data infrastructure or improving the accuracy of their customers’ dashboards, to mention just a couple of examples.

Is it, in fact, possible that data analysts could one day delegate these responsibilities to RL-powered robots? Let’s discuss how it works…

Say you’re trying to teach a dog how to sit. You raise your hand and ask for a sit. Each time he successfully sits, or accidentally performs an action close to it, you offer him a treat. Soon the dog learns that sitting earns a reward. From there, you teach the pup more tricks, and over time he learns that there are different paths he can take that may result in a reward. RL functions under a similar methodology.

In RL terms, the dog is the “agent”: the learner that takes actions, whether it is an animal, a person, or a piece of software. Every agent operates within an “environment,” which frames the problem statement or business objective that needs to be solved. In the case of the dog, the agent tries possible ways to sit (to solve the problem) in order to earn the “reward,” the confirmation that it has successfully completed the task. That loop of acting and being rewarded is RL in a nutshell.
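The agent, environment and reward described above form a simple loop: the agent picks an action, the environment responds with a reward, and the cycle repeats. The sketch below shows that loop for the dog example under clearly hypothetical assumptions: `SitEnvironment`, its list of actions and the 1.0 "treat" reward are invented for illustration, and the agent is untrained, so it simply acts at random.

```python
import random


class SitEnvironment:
    """Hypothetical toy environment: only the action "sit" earns a treat."""

    actions = ["sit", "lie_down", "roll_over", "bark"]

    def step(self, action: str) -> float:
        # The reward signal: 1.0 (a treat) for sitting, 0.0 for anything else.
        return 1.0 if action == "sit" else 0.0


def run_episode(env: SitEnvironment, n_steps: int, seed: int = 0) -> list[tuple[str, float]]:
    """Run the basic agent-environment loop with an untrained, random agent."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_steps):
        action = rng.choice(env.actions)  # the agent acts
        reward = env.step(action)         # the environment answers with a reward
        history.append((action, reward))
    return history


if __name__ == "__main__":
    for action, reward in run_episode(SitEnvironment(), n_steps=5):
        print(f"{action:>9} -> reward {reward}")
```

Nothing is learned yet; the point is only the shape of the interaction that every RL method builds on.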

In a more technical sense, RL is a setup in which an agent is free to act within an environment in order to meet an objective. Whatever actions achieve that objective are rewarded: the “reward” is a function that returns a positive signal every time the agent gets it right. Over time, the agent learns, with increasing precision, how to earn that reward. The correct behavior doesn’t need to be explicitly programmed, because the agent seeks out the reward on its own.
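The reward-seeking behavior described above can be sketched as a simple “multi-armed bandit” learner, a deliberately simplified form of RL in which there is no changing state. This is a minimal sketch, assuming a hypothetical `reward` function that returns a treat (1.0) only for "sit"; the action names, the 10% exploration rate (`epsilon`) and the number of trials are illustrative choices, not part of the original article.

```python
import random

# "sit" is listed last so the agent does not stumble onto it by default.
ACTIONS = ["bark", "lie_down", "roll_over", "sit"]


def reward(action: str) -> float:
    # Hypothetical reward function: a treat (1.0) only for sitting.
    return 1.0 if action == "sit" else 0.0


def learn(n_trials: int = 500, epsilon: float = 0.1, seed: int = 0) -> dict[str, float]:
    """Epsilon-greedy agent that estimates the average reward of each action."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # estimated reward per action
    count = {a: 0 for a in ACTIONS}    # how often each action was tried
    for _ in range(n_trials):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)          # explore: try something at random
        else:
            action = max(ACTIONS, key=value.get)  # exploit: best action found so far
        r = reward(action)
        count[action] += 1
        # Incremental mean: nudge the estimate toward the reward just observed.
        value[action] += (r - value[action]) / count[action]
    return value


if __name__ == "__main__":
    estimates = learn()
    print(estimates)
    print("preferred action:", max(estimates, key=estimates.get))
```

Once an exploratory step happens to try "sit" and collects the treat, that action’s estimate jumps above the others and the agent keeps choosing it, which is the “learns with more precision how to get that reward” idea in its smallest form.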

Every RL experiment begins with a problem that has a finite (or infinite) number of possible solutions. Say the agent’s objective is to get from point A to point B. It is the agent’s responsibility to explore the different routes, and over time its accumulated learnings help it achieve that objective.

Each path offers different pros and cons. Imagine someone driving home after work. If I get off at exit 2, I will get home quicker, but if I get off at exit 1, I can pick up dinner on the way, which will save me time and stress in the long run. While this is an oversimplification of the various applications of RL, it highlights the trade-offs these agents weigh when evaluating which choices lead to the best long-term outcome.
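RL agents usually weigh this kind of “now versus later” trade-off with a discount factor γ (gamma) between 0 and 1: a reward t steps in the future counts as γ^t times its face value. The sketch below compares the two exits under that rule. The per-step reward numbers are purely hypothetical stand-ins for “time saved” and “stress avoided.”

```python
def discounted_return(rewards: list[float], gamma: float) -> float:
    """Sum of rewards, each weighted by gamma**t (t = 0 for the first step)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical rewards for each leg of the evening (higher is better).
EXIT_2 = [5.0, 0.0, -3.0]  # home sooner, but no plan for dinner later
EXIT_1 = [2.0, 1.0, 3.0]   # short detour now, dinner sorted later

if __name__ == "__main__":
    for gamma in (0.1, 0.9):
        g1 = discounted_return(EXIT_1, gamma)
        g2 = discounted_return(EXIT_2, gamma)
        best = "exit 1" if g1 > g2 else "exit 2"
        print(f"gamma={gamma}: exit 1 -> {g1:.2f}, exit 2 -> {g2:.2f}, prefer {best}")
```

With γ = 0.1 the agent barely values the future and prefers exit 2 (4.97 versus 2.13); with γ = 0.9 the later payoff dominates and exit 1 wins (5.33 versus 2.57).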

How do we even do it? How does it work?

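Under the hood, RL is mathematical: the agent keeps numerical estimates of how much reward its actions are likely to earn and refines them with experience.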

A Q-learning model maps closely onto the employee heading home after the workday. For each situation (or “state”) the employee is in, a Q-function assigns a value, the “quality” of an action, to each possible move, such as taking exit 1 or exit 2. These values capture the expected long-term reward, or cost, of the steps taken toward the goal, and over time they let the agent judge which action performs best in each situation.
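The drive home can be written as a tiny tabular Q-learning problem. In this minimal sketch the states (“highway”, “restaurant”, “home”), the actions and the reward numbers are all hypothetical. The core of the method is the standard Q-learning update, Q(s, a) ← Q(s, a) + α·[r + γ·max Q(s′, ·) − Q(s, a)], which nudges each estimate toward the reward just received plus the discounted value of the best next action. The learning rate α, discount γ and exploration rate ε below are illustrative settings.

```python
import random

# Hypothetical drive-home problem: (state, action) -> (next_state, reward).
TRANSITIONS = {
    ("highway", "exit_1"): ("restaurant", -1.0),   # small detour to pick up dinner
    ("highway", "exit_2"): ("home", 4.0),          # home quickly, but no dinner
    ("restaurant", "drive_home"): ("home", 10.0),  # home with dinner sorted
}
TERMINAL = "home"


def actions_in(state: str) -> list[str]:
    """Actions available in a given state."""
    return [a for (s, a) in TRANSITIONS if s == state]


def q_learning(episodes: int = 2000, alpha: float = 0.1, gamma: float = 0.9,
               epsilon: float = 0.2, seed: int = 0) -> dict[tuple[str, str], float]:
    rng = random.Random(seed)
    q = {sa: 0.0 for sa in TRANSITIONS}  # Q-table: one entry per (state, action)
    for _ in range(episodes):
        state = "highway"
        while state != TERMINAL:
            acts = actions_in(state)
            if rng.random() < epsilon:
                action = rng.choice(acts)  # explore
            else:
                action = max(acts, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = TRANSITIONS[(state, action)]
            # Value of the best next action (zero once the trip is over).
            best_next = 0.0 if next_state == TERMINAL else max(
                q[(next_state, a)] for a in actions_in(next_state))
            # Q-learning update: move the estimate toward reward + discounted future.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


if __name__ == "__main__":
    for (state, action), value in q_learning().items():
        print(f"Q({state}, {action}) = {value:.2f}")
```

With γ = 0.9 the estimates should approach Q(highway, exit_1) = −1 + 0.9 × 10 = 8 and Q(highway, exit_2) = 4, so the agent learns that the dinner detour pays off even though it costs a little up front.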

Easy as pie, right? Wrong. A Q-function stored as a simple lookup table (tabular Q-learning) works well in situations with a small number of possible states and actions. It becomes impractical for larger, more complex problems with far more possibilities.

It’s the difference between picking which of five colors is your favorite and playing the video game Grand Theft Auto. In Grand Theft Auto, a character can do a vast number of things at every point in the game. A tabular Q-function wouldn’t be practical in a scenario like this, with millions of possible situations, because storing and learning a value for every combination would take far too long and be far too inefficient.
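A quick calculation shows why the lookup-table approach breaks down. A tabular Q-function needs one entry per (state, action) pair. The sketch below compares the five-color choice with a hypothetical, heavily simplified game in which every distinct 84×84 grayscale screen (256 shades per pixel) counts as a state and the player has 18 actions; those screen dimensions and action counts are illustrative assumptions, far smaller than a real game like Grand Theft Auto.

```python
import math


def q_table_entries(n_states: int, n_actions: int) -> int:
    """A tabular Q-function stores one value per (state, action) pair."""
    return n_states * n_actions


def log10_q_table_entries(width: int, height: int, levels: int, n_actions: int) -> float:
    """log10 of the table size when every distinct screen counts as a state.

    Each of the width*height pixels can take `levels` values, so there are
    levels**(width*height) states. Working in log10 avoids building that number.
    """
    n_pixels = width * height
    return n_pixels * math.log10(levels) + math.log10(n_actions)


if __name__ == "__main__":
    # Favorite color: a single state with five possible choices.
    print(q_table_entries(1, 5))
    # Hypothetical 84x84 grayscale game screen (256 shades) with 18 actions.
    print(f"about 10^{log10_q_table_entries(84, 84, 256, 18):.0f} entries")
```

The first case needs 5 entries; the second needs a number with roughly 17,000 digits, far more than could ever be stored or visited.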

A Deep Q-Network (“DQN”) would be a better fit. While it has its own pros and cons, it uses a neural network to estimate the value of each action instead of storing every value in a table, which lets an agent tackle problems at this scale, with far larger state spaces. A Double Deep Q-Network (“Double DQN”) builds on the same idea and reduces DQN’s tendency to overestimate action values by using one network to choose the next action and another to evaluate it. And the list goes on, with new variants appearing regularly in RL research.
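The difference between the Deep Q-Network and its Double variant comes down to how the learning target is computed. In this minimal sketch the Q-value lists are hypothetical numbers standing in for the outputs of two neural networks (an “online” network being trained and a slower-moving “target” copy); the networks themselves, the replay buffer and the training loop are omitted.

```python
def dqn_target(reward: float, next_q_target: list[float],
               gamma: float = 0.99, done: bool = False) -> float:
    """DQN target: the target network both picks and scores the best next action."""
    if done:  # no future reward after the episode ends
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward: float, next_q_online: list[float], next_q_target: list[float],
                      gamma: float = 0.99, done: bool = False) -> float:
    """Double DQN target: the online network picks the action, the target network scores it."""
    if done:
        return reward
    best_action = next_q_online.index(max(next_q_online))
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    q_online = [1.0, 2.5, 0.5]  # hypothetical online-network estimates for 3 actions
    q_target = [1.2, 2.0, 3.0]  # hypothetical target-network estimates for the same actions
    print(dqn_target(1.0, q_target, gamma=0.9))                   # 1 + 0.9 * 3.0
    print(double_dqn_target(1.0, q_online, q_target, gamma=0.9))  # 1 + 0.9 * 2.0
```

Because plain DQN always takes the maximum of noisy estimates, it tends to inflate its targets (3.7 here versus 2.8); decoupling action selection from evaluation is how Double DQN counters that bias.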

Choosing which RL algorithm to use will remain a challenge until researchers develop a single, general-purpose RL algorithm that can be applied to all kinds of environments (the settings in which agents pursue their objectives).

RL Application in a Technical and Business Environment

At first, it may seem difficult to imagine specific scenarios where RL could be valuable outside of a corporate environment. Little did you know, machine learning, the broader field that RL belongs to, has been present all along.

Heard of face recognition? Take Apple Photos, Google Photos, or any one of their competitors. Have you noticed that your smart device automatically identifies who a person is and then groups photos under that person’s name? It does this by analyzing all of your images to identify similarities from person to person. With each additional photo, the algorithm becomes more accurate.
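The “similarities from person to person” idea can be illustrated with a generic technique: represent each detected face as a numeric vector (an embedding) and group faces whose vectors point in nearly the same direction. This is a minimal sketch, not how Apple or Google implement their products, and it is not itself RL. The embeddings below are hypothetical three-number vectors standing in for the output of a trained face-recognition network, and the 0.9 similarity threshold is an arbitrary choice.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: 1.0 means the vectors point in exactly the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def group_faces(embeddings: dict[str, list[float]], threshold: float = 0.9) -> list[list[str]]:
    """Put each face in the first group whose representative face is similar enough."""
    groups: list[list[str]] = []           # photo ids per group
    representatives: list[list[float]] = []  # first embedding seen in each group
    for photo_id, emb in embeddings.items():
        for i, rep in enumerate(representatives):
            if cosine(emb, rep) >= threshold:
                groups[i].append(photo_id)
                break
        else:  # no similar group found: this face starts a new "person"
            representatives.append(emb)
            groups.append([photo_id])
    return groups


# Hypothetical embeddings: img1/img3 show one person, img2/img4 another.
EMBEDDINGS = {
    "img1": [0.9, 0.1, 0.0],
    "img2": [0.1, 0.9, 0.1],
    "img3": [0.85, 0.15, 0.05],
    "img4": [0.05, 0.95, 0.0],
}

if __name__ == "__main__":
    print(group_faces(EMBEDDINGS))
```

Real systems use much higher-dimensional embeddings and more robust clustering; this sketch only shows the similarity test at the core of the grouping.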

This technology is still a work in progress. Deep learning algorithms still need large amounts of training data to work out these kinks. A person who meets someone once can recognize their face a month later, but face recognition systems generally cannot match that from a single example. Humans are capable of one-shot or few-shot recognition, whereas deep networks typically need many images of a person to reach a comparable level of accuracy. RL research has reported promising progress in the area of one-shot/few-shot learning.

Other examples of RL in the workplace include disciplines like logistics and stock trading. Fintech companies, in particular, have begun to use RL to automate stock trading over the course of a day. This kind of RL-driven algorithmic trading is increasingly used for tasks such as portfolio management.
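To see how trading fits the agent, environment and reward vocabulary used earlier, here is a deliberately simplified, hypothetical one-step environment: the state is the agent’s cash, share count and the current price; the action is "buy", "sell" or "hold" (one share at a time, no fees); and the reward is the change in total portfolio value. It is an illustrative sketch only, not a description of any production trading system.

```python
def trading_step(cash: float, shares: int, price: float, next_price: float,
                 action: str) -> tuple[float, int, float]:
    """Apply one action at `price`, then score it once the price moves to `next_price`.

    Returns the new cash, the new share count and the reward
    (the change in total portfolio value over the step).
    """
    value_before = cash + shares * price
    if action == "buy" and cash >= price:
        cash, shares = cash - price, shares + 1
    elif action == "sell" and shares > 0:
        cash, shares = cash + price, shares - 1
    # "hold" (or a trade that cannot be made) leaves the position unchanged.
    value_after = cash + shares * next_price
    return cash, shares, value_after - value_before


if __name__ == "__main__":
    cash, shares = 1000.0, 0
    prices = [100.0, 102.0, 101.0]  # hypothetical prices over the day
    for action, (p, p_next) in zip(["buy", "sell"], zip(prices, prices[1:])):
        cash, shares, reward = trading_step(cash, shares, p, p_next, action)
        print(action, reward)
```

An RL agent placed in an environment like this would learn a policy for choosing these actions so that the sum of rewards over the trading day is as large as possible.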

We at Systech have been focusing on developing autonomous Data and Business Analysts with the help of RL. The key is experimenting with a rather tamed (tightly constrained) agent in an environment defined for a specific industry.

The Future of RL on a Global Scale

In business, it’s often said that you should work smarter, not harder. While RL is not yet fully integrated into the business world, current experiments with the technology mark the beginning of meaningful advances in RL. It’s only a matter of time before the lines between “human” and “machine” become blurred.

The Systech Solutions, Inc. Blog Series is designed to showcase ongoing innovations in the data and analytics space. If you have any suggestions for an upcoming article, or would like to volunteer to be interviewed, please contact Olivia Klayman at oliviak@systechusa.com.
