Reinforcement learning

Within the field of machine learning, reinforcement learning (RL) refers to the study of how to train agents to complete tasks by updating them with feedback (reward) signals.

Consider an agent that receives an input informing it of the environment's state. Based only on that information, the agent must decide which action to take from a set of possible actions. The action changes the state of the environment, which results in a new input, and so on; each time, the agent is also presented with a reward (or reinforcement signal) relating to its actions in the environment. In "policy gradient" approaches, the reinforcement signal is often used to update the agent (the "policy") directly, although sometimes an agent will instead do limited online (model-based) heuristic search to optimize the reward signal plus a heuristic evaluation.
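The agent-environment loop and a policy-gradient update can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the text does not name a specific algorithm, so REINFORCE is used here only as one simple policy-gradient method, and the two-state environment `env_step`, the tabular softmax policy, the episode horizon and the learning rate are all hypothetical choices made for the example. Returns are undiscounted.

```python
import math
import random

N_STATES, N_ACTIONS = 2, 2


def env_step(state, action):
    """Hypothetical toy environment (not from the text): reward 1 when the
    action matches the state, and the action also decides the next state."""
    reward = 1.0 if action == state else 0.0
    next_state = (state + action) % N_STATES
    return next_state, reward


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(theta, rng, horizon=5):
    """One pass of the agent-environment loop: observe, act, receive reward."""
    state = rng.randrange(N_STATES)
    trajectory = []
    for _ in range(horizon):
        probs = softmax(theta[state])  # the policy pi(. | state)
        action = rng.choices(range(N_ACTIONS), weights=probs)[0]
        next_state, reward = env_step(state, action)
        trajectory.append((state, action, reward, probs))
        state = next_state
    return trajectory


def reinforce(episodes=2000, lr=0.1, seed=0):
    """REINFORCE: raise the log-probability of each action in proportion to
    the (undiscounted) return that followed it."""
    rng = random.Random(seed)
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # tabular logits
    for _ in range(episodes):
        trajectory = run_episode(theta, rng)
        ret = 0.0
        # Walk backwards so `ret` is the return from step t to the episode end.
        for state, action, reward, probs in reversed(trajectory):
            ret += reward
            for b in range(N_ACTIONS):
                # Gradient of log softmax w.r.t. theta[state][b]:
                # 1[b == action] - pi(b | state), using the policy that acted.
                grad = (1.0 if b == action else 0.0) - probs[b]
                theta[state][b] += lr * ret * grad
    return theta


theta = reinforce()
for s in range(N_STATES):
    print("state", s, "action probabilities:", [round(p, 3) for p in softmax(theta[s])])
```

The action probabilities are stored while the episode is played, so the update uses the same policy that generated the actions, which is what makes this an on-policy gradient estimate.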

RL is distinguished from energy-based architectures such as Active Inference and Joint Embedding Predictive Architectures (JEPA).

The agent's goal is to find the strategy (policy) that yields the highest expected reward over time, based on its previous experience.
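This goal can be written as an optimization problem. A minimal formalization, assuming an episode of T steps and a discount factor γ ∈ [0, 1] (a common convention, not something the text specifies), with r_{t+1} the reward received after the agent takes action a_t under policy π:

```latex
G_t = \sum_{k=t}^{T-1} \gamma^{k-t} \, r_{k+1},
\qquad
J(\pi) = \mathbb{E}_{\pi}\left[ G_0 \right],
\qquad
\pi^{*} = \arg\max_{\pi} J(\pi)
```

With γ = 1 the objective is simply the expected total reward of an episode; choosing γ < 1 weights near-term rewards more heavily than distant ones.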

Finding such a strategy raises the problem of exploration, which is best illustrated by one of the most studied reinforcement learning problems, the k-armed bandit. In it, an agent has to decide which sequence of levers to pull in a gambling room, with no information about each machine's probability of winning beyond the reward it receives each time. The problem revolves around deciding which lever is optimal and what criteria define a lever as such.
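One simple criterion for balancing exploration against exploitation in a k-armed bandit is epsilon-greedy selection, sketched below. This is only an illustration under assumptions: epsilon-greedy is one of many possible strategies and is not prescribed by the text, and the payout probabilities, `epsilon` and number of steps are hypothetical values chosen for the example. The agent never sees `win_probs` directly; it only observes the rewards it receives.

```python
import random


def epsilon_greedy_bandit(win_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a k-armed Bernoulli bandit with epsilon-greedy action selection.

    win_probs are the hidden chances that each lever pays out 1 (hypothetical).
    """
    rng = random.Random(seed)
    k = len(win_probs)
    counts = [0] * k       # times each lever was pulled
    estimates = [0.0] * k  # sample-average reward per lever
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pull a random lever
        else:
            arm = max(range(k), key=lambda i: estimates[i])  # exploit: best estimate so far
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental sample average: Q <- Q + (r - Q) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda i: estimates[i])
print("estimated best lever:", best, "pulls per lever:", counts)
```

Raising `epsilon` spends more pulls on the weaker levers; lowering it risks settling on a suboptimal lever whose estimate happened to look good early.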

			v1.20.0Dec 30th 2024 GMT	(+24/-70)
			v1.19.0May 29th 2023 GMT	(+16/-4)
			v1.18.0Mar 9th 2023 GMT	(+16)
			v1.17.0Mar 9th 2023 GMT	(+333/-390)
			v1.16.0Dec 26th 2022 GMT	(+9)
			v1.15.0Dec 21st 2022 GMT	(+168)
			v1.14.0Nov 26th 2021 GMT
			v1.13.0Sep 24th 2020 GMT	(+56) added see also links
			v1.12.0Sep 24th 2020 GMT	(+95/-56)
			v1.11.0Sep 17th 2012 GMT	(+87/-158)

AI ALIGNMENT FORUM

See Also