Reward Functions
Posts tagged Reward Functions
Most Relevant
3
89
Reward is not the optimization target
Alex Turner
1y
82
3
28
Draft papers for REALab and Decoupled Approval on tampering
Jonathan Uesato, Ramana Kumar
3y
2
0
53
Scaling Laws for Reward Model Overoptimization
leogao, John Schulman, Jacob Hilton
1y
5
2
39
Seriously, what goes wrong with "reward the agent when it makes you smile"?
Alex Turner, johnswentworth
1y
13
2
20
Four usages of "loss" in AI
Alex Turner
1y
14
0
10
Language Agents Reduce the Risk of Existential Catastrophe
Cameron Domenico Kirk-Giannini, Simon Goldstein
4mo
1
1
11
$100/$50 rewards for good references
Stuart Armstrong
2y
2
0
13
Shutdown-Seeking AI
Simon Goldstein
4mo
5
1
27
A Short Dialogue on the Meaning of Reward Functions
Leon Lang, Quintin Pope, peligrietzer
10mo
0
1
10
Thoughts on reward engineering
Paul Christiano
5y
19
1
6
The reward engineering problem
Paul Christiano
5y
1
0
0
Reward model hacking as a challenge for reward learning
Erik Jenner
1y
0
1
8
Reward functions and updating assumptions can hide a multitude of sins
Stuart Armstrong
3y
2
1
6
Probabilities, weights, sums: pretty much the same for reward functions
Stuart Armstrong
3y
0