Reinforcement Learning

Consider an agent that receives an input from a complex environment it knows nothing about, informing it of the environment's state. Based only on that information, the agent has to decide which action to take from a set of available actions. That action changes the state of the environment, which results in a new input, and so on; at each step the agent is also given a reward for its action. The agent's goal is then to find, based on previous experience, the strategy that yields the highest expected reward over time.
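A minimal Python sketch of this observe, act, receive-reward loop, assuming a hypothetical one-dimensional `LineWorld` environment that is not part of the original text. The agent never sees the environment's rules, only the current state and the reward after each action. The names `LineWorld`, `run_episode`, `always_right` and `random_policy` are illustrative.

```python
import random
from typing import Callable, List, Tuple


class LineWorld:
    """Hypothetical toy environment: positions 0..size-1 on a line.

    The agent does not know these rules; it only sees the state it is
    given and the reward that follows each action.
    """

    ACTIONS = (-1, +1)  # move left or move right

    def __init__(self, size: int = 5, start: int = 2) -> None:
        self.size = size
        self.state = start

    def step(self, action: int) -> Tuple[int, float]:
        # Apply the action, clamping the position to the ends of the line.
        self.state = min(max(self.state + action, 0), self.size - 1)
        # Reward 1.0 only while standing at the right-hand end.
        reward = 1.0 if self.state == self.size - 1 else 0.0
        return self.state, reward


def run_episode(env: LineWorld, policy: Callable[[int], int],
                steps: int) -> List[Tuple[int, int, float]]:
    """Run the loop for a fixed number of steps.

    Returns (new_state, action, reward) for every step.
    """
    history = []
    state = env.state
    for _ in range(steps):
        action = policy(state)            # decide using only the observed state
        state, reward = env.step(action)  # environment returns new input and reward
        history.append((state, action, reward))
    return history


def always_right(state: int) -> int:
    return +1


def random_policy(state: int) -> int:
    return random.choice(LineWorld.ACTIONS)


if __name__ == "__main__":
    hist = run_episode(LineWorld(), always_right, steps=4)
    print(sum(r for _, _, r in hist))  # total reward collected by this strategy
```

Comparing policies by the reward they collect is the simplest way to judge one strategy against another. Starting from position 2, `always_right` collects 3.0 over four steps, which no policy can beat in this toy setup, while `random_policy` usually collects less.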

Discovering which actions pay off, instead of settling early on a choice that only looks best, is the problem of exploration. It is best illustrated by the most studied reinforcement learning problem, the k-armed bandit. In it, an agent has to decide which sequence of levers to pull in a gambling room, with no information about the probability of winning on each machine other than the reward it receives after each pull. The problem revolves around deciding which lever is optimal and what criteria define a lever as such.
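One common way to balance exploring new levers against exploiting the best-looking one is ε-greedy action selection, sketched below in Python. It is a standard technique chosen here for illustration, not necessarily the method this article has in mind; the winning probabilities, `epsilon`, `seed` and the function name are assumptions. With probability `epsilon` the agent pulls a random lever; otherwise it pulls the lever with the highest average reward observed so far.

```python
import random
from typing import List, Tuple


def epsilon_greedy_bandit(win_probs: List[float], pulls: int,
                          epsilon: float = 0.1,
                          seed: int = 0) -> Tuple[List[float], List[int]]:
    """Play a hypothetical k-armed bandit; lever i pays 1.0 with hidden probability win_probs[i]."""
    rng = random.Random(seed)
    k = len(win_probs)
    estimates = [0.0] * k  # running average reward observed for each lever
    counts = [0] * k       # number of times each lever has been pulled
    for _ in range(pulls):
        if rng.random() < epsilon:
            # Explore: pull any lever at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pull the best-looking lever, breaking ties at random.
            best = max(estimates)
            arm = rng.choice([a for a in range(k) if estimates[a] == best])
        # The reward is the only feedback the agent receives.
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-average update: Q <- Q + (R - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts
```

For example, `epsilon_greedy_bandit([0.2, 0.5, 0.8], pulls=5000)` should end up pulling the third lever far more often than the others, while the occasional random pulls keep the estimates for the other levers from going stale.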

Within the field of Machine Learning, reinforcement learning refers to the study of how an agent should choose its actions within an environment in order to maximize some kind of reward. Strongly inspired by work in behavioral psychology, it is essentially a trial-and-error approach to finding the best strategy.

Alongside an exploration strategy, it is still necessary to choose the criteria that make one action optimal compared to another. The study of this property has led to several methods, from brute force to methods that take into account temporal differences in the received reward. Despite these methods and the good results reinforcement learning has obtained on small problems, it suffers from a lack of scalability and has difficulty solving larger, close-to-human scenarios.
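Temporal-difference methods update the estimated value of a state from the difference between successive predictions, rather than waiting for the final outcome. Below is a minimal TD(0) sketch in Python, assuming a hypothetical five-state chain in which the agent always steps right and is rewarded only on reaching the end. The chain, `alpha` (step size) and `gamma` (discount factor) are illustrative choices, not taken from the original text.

```python
from typing import List


def td0_chain(n_states: int = 5, episodes: int = 200,
              alpha: float = 0.5, gamma: float = 0.9) -> List[float]:
    """TD(0) value estimation on a hypothetical chain 0 -> 1 -> ... -> n_states-1.

    The agent always moves one step right; the last state is terminal and
    entering it yields reward 1.0, every other transition yields 0.0.
    """
    values = [0.0] * n_states  # V(s) estimates; the terminal state stays 0.0
    terminal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != terminal:
            next_state = state + 1
            reward = 1.0 if next_state == terminal else 0.0
            # TD error: observed reward plus discounted next prediction,
            # minus the current prediction for this state.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values
```

With `gamma = 0.9` the estimates for states 0 to 3 approach 0.729, 0.81, 0.9 and 1.0, the discounted value of a reward that is 4, 3, 2 and 1 steps away.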