AI ALIGNMENT FORUM
Academic Papers
Contributors: Kaj Sotala
Posts that link to or summarize formal papers published elsewhere.
Posts tagged Academic Papers (sorted by most relevant):
Some AI research areas and their relevance to existential safety (Andrew Critch, 4y)
How to Control an LLM's Behavior (why my P(DOOM) went down) (Roger Dearnaley, 10mo)
2021 AI Alignment Literature Review and Charity Comparison (Larks, 3y)
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network (Erik Jenner, 4mo)
A technical note on bilinear layers for interpretability (Lee Sharkey, 1y)
Formal Solution to the Inner Alignment Problem (michaelcohen, 4y)
Question: Why is pseudo-alignment "worse" than other ways ML can fail to generalize? (nostalgebraist, Evan Hubinger, 4y)
NeurIPS ML Safety Workshop 2022 (Dan H, 2y)
How truthful is GPT-3? A benchmark for language models (Owain Evans, 3y)
Secret Collusion: Will We Know When to Unplug AI? (schroederdewitt, srm, Mikhail Baranchuk, Lewis Hammond, chansmi, Angira Sharma, 3d)
New paper: Corrigibility with Utility Preservation (Koen Holtman, 5y)
Learning preferences by looking at the world (Rohin Shah, 6y)
Human-AI Collaboration (Rohin Shah, 5y)
Learning biases and rewards simultaneously (Rohin Shah, 5y)
Let's Discuss Functional Decision Theory (Chris_Leong, 6y)