AI ALIGNMENT FORUM

Wikitags

Reward Functions

Edited by Dakara, last updated 30th Dec 2024

A reward function is a mathematical function in reinforcement learning that defines which actions or outcomes are desirable for an AI system by assigning numerical values (rewards) to states or state-action pairs. It encodes the goals and preferences we want the AI to optimize for; specifying a reward function that avoids unintended consequences is a significant open challenge in AI development.
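
To make the definition concrete, a reward function is often written as a map from states (or state-action pairs) to real numbers, R : S × A → ℝ. Below is a minimal sketch for a hypothetical gridworld; the state layout, the names GOAL and LAVA, and all numeric values are illustrative assumptions for this example, not taken from any particular post or library.

```python
# Minimal sketch of a reward function for a toy gridworld.
# (GOAL, LAVA, and all numeric values are made up for illustration.)

GOAL = (4, 4)            # terminal cell the designer wants reached
LAVA = {(2, 2), (3, 1)}  # hazardous cells the designer wants avoided

def reward(state, action, next_state):
    """Map one (state, action, next_state) transition to a scalar reward."""
    if next_state == GOAL:
        return 1.0    # positive reward for reaching the goal
    if next_state in LAVA:
        return -1.0   # penalty for stepping into a hazard
    return -0.01      # small per-step cost to encourage short paths

# Example: moving right from (3, 4) into the goal cell earns +1.0.
print(reward((3, 4), "right", (4, 4)))  # 1.0
```

Even in a toy example like this, the relative magnitudes of the goal reward, the hazard penalty, and the step cost shape what the trained agent actually does; miscalibrating them is a small-scale instance of the specification problems the posts below discuss.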

Posts tagged Reward Functions
94 · Reward is not the optimization target · Alex Turner · 3y · 88 comments
28 · Draft papers for REALab and Decoupled Approval on tampering · Jonathan Uesato, Ramana Kumar · 5y · 2 comments
43 · Reward hacking behavior can generalize across tasks · Kei, Isaac Dunn, Henry Sleight, Miles Turpin, Evan Hubinger, Carson Denison, Ethan Perez · 1y · 1 comment
53 · Scaling Laws for Reward Model Overoptimization · leogao, John Schulman, Jacob Hilton · 3y · 5 comments
40 · [Question] Seriously, what goes wrong with "reward the agent when it makes you smile"? · Alex Turner, johnswentworth · 3y · 13 comments
39 · Interpreting Preference Models w/ Sparse Autoencoders · Logan Riggs Smith, Jannik Brinkmann · 1y · 10 comments
20 · Four usages of "loss" in AI · Alex Turner · 3y · 16 comments
29 · A quick list of reward hacking interventions · Alex Mallen · 1mo · 2 comments
15 · Language Agents Reduce the Risk of Existential Catastrophe · Cameron Domenico Kirk-Giannini, Simon Goldstein · 2y · 1 comment
11 · $100/$50 rewards for good references · Stuart Armstrong · 4y · 2 comments
55 · Utility ≠ Reward · Vladimir Mikulik · 6y · 16 comments
13 · Shutdown-Seeking AI · Simon Goldstein · 2y · 5 comments
27 · A Short Dialogue on the Meaning of Reward Functions · Leon Lang, Quintin Pope, peligrietzer · 3y · 0 comments
10 · Thoughts on reward engineering · Paul Christiano · 6y · 19 comments
15 · The Theoretical Reward Learning Research Agenda: Introduction and Motivation · Joar Skalse · 5mo · 4 comments
(Showing 15 of 26 tagged posts.)