Reward Functions
• Applied to Intrinsic Drives and Extrinsic Misuse: Two Intertwined Risks of AI by jacobjacob 1mo ago
• Applied to VLM-RM: Specifying Rewards with Natural Language by ChengCheng 1mo ago
• Applied to Some alignment ideas by SelonNerias 4mo ago
• Applied to self-improvement-executors are not goal-maximizers by bhauth 6mo ago
• Applied to Shutdown-Seeking AI by Simon Goldstein 6mo ago
• Applied to Language Agents Reduce the Risk of Existential Catastrophe by Cameron Domenico Kirk-Giannini 6mo ago
• Applied to A Short Dialogue on the Meaning of Reward Functions by Leon Lang 1y ago
• Applied to Learning societal values from law as part of an AGI alignment strategy by John Nay 1y ago
• Applied to Scaling Laws for Reward Model Overoptimization by David Gross 1y ago
• Applied to Four usages of "loss" in AI by Alex Turner 1y ago
• Applied to Reward IS the Optimization Target by RobertM 1y ago
• Applied to Leveraging Legal Informatics to Align AI by John Nay 1y ago
• Applied to An investigation into when agents may be incentivized to manipulate our beliefs. by RobertM 1y ago
• Applied to Seriously, what goes wrong with "reward the agent when it makes you smile"? by Alex Turner 1y ago
• Applied to Reward is not the optimization target by Alex Turner 1y ago
• Applied to Reward model hacking as a challenge for reward learning by Erik Jenner 2y ago
• Applied to Demanding and Designing Aligned Cognitive Architectures by Koen Holtman 2y ago
• Applied to $100/$50 rewards for good references by Ruben Bloom 2y ago