Reward Functions
Posts tagged Reward Functions
Most Relevant
3
90
Reward is not the optimization target
Alex Turner
2y
88
3
28
Draft papers for REALab and Decoupled Approval on tampering
Jonathan Uesato, Ramana Kumar
3y
2
0
53
Scaling Laws for Reward Model Overoptimization
leogao, John Schulman, Jacob Hilton
2y
5
2
40
Seriously, what goes wrong with "reward the agent when it makes you smile"?
Alex Turner, johnswentworth
2y
13
2
20
Four usages of "loss" in AI
Alex Turner
2y
16
0
10
Language Agents Reduce the Risk of Existential Catastrophe
Cameron Domenico Kirk-Giannini, Simon Goldstein
1y
1
1
11
$100/$50 rewards for good references
Stuart Armstrong
2y
2
1
51
Utility ≠ Reward
Vladimir Mikulik
5y
16
0
13
Shutdown-Seeking AI
Simon Goldstein
1y
5
1
27
A Short Dialogue on the Meaning of Reward Functions
Leon Lang, Quintin Pope, peligrietzer
1y
0
1
10
Thoughts on reward engineering
Paul Christiano
5y
19
1
6
The reward engineering problem
Paul Christiano
5y
1
0
0
Reward model hacking as a challenge for reward learning
Erik Jenner
2y
0
1
11
VLM-RM: Specifying Rewards with Natural Language
ChengCheng, David Lindner, Ethan Perez
6mo
0
1
8
Reward functions and updating assumptions can hide a multitude of sins
Stuart Armstrong
4y
2