AI ALIGNMENT FORUM

Top Questions

42 · why assume AGIs will optimize for fixed goals? · nostalgebraist, Rob Bensinger · 3y · 3 comments
27 · What convincing warning shot could help prevent extinction from AI? · Charbel-Raphael Segerie, Diego Dorn, Peter Barnett · 1y · 2 comments
51 · Have LLMs Generated Novel Insights? · Abram Demski, Cole Wyeth, Kaj Sotala · 3mo · 19 comments
40 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · Alex Turner, johnswentworth · 3y · 13 comments
68 · Why is o1 so deceptive? · Abram Demski, Sahil · 8mo · 14 comments

Recent Activity

42 · why assume AGIs will optimize for fixed goals? · nostalgebraist, Rob Bensinger · 3y · 3 comments
27 · What convincing warning shot could help prevent extinction from AI? · Charbel-Raphael Segerie, Diego Dorn, Peter Barnett · 1y · 2 comments
7 · Egan's Theorem? · johnswentworth · 5y · 7 comments
51 · Have LLMs Generated Novel Insights? · Abram Demski, Cole Wyeth, Kaj Sotala · 3mo · 19 comments
40 · Seriously, what goes wrong with "reward the agent when it makes you smile"? · Alex Turner, johnswentworth · 3y · 13 comments
14 · Is weak-to-strong generalization an alignment technique? · cloud · 3mo · 1 comment
9 · What is the most impressive game LLMs can play well? · Cole Wyeth · 4mo · 8 comments
4 · How counterfactual are logical counterfactuals? · Donald Hobson · 5mo · 9 comments
16 · Are You More Real If You're Really Forgetful? · Thane Ruthenis, Charlie Steiner · 6mo · 4 comments
6 · Why not tool AI? · smithee, Ben Pace · 6y · 2 comments
68 · Why is o1 so deceptive? · Abram Demski, Sahil · 8mo · 14 comments
7 · Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception? · David Scott Krueger · 8mo · 5 comments