Academic Papers — AI Alignment Forum
Academic Papers
Edited by Kaj_Sotala, last updated 9th Jul 2020
Posts either linking to, or summarizing, formal papers published elsewhere.
Posts tagged Academic Papers (Most Relevant)
1
65
Some AI research areas and their relevance to existential safety
Andrew_Critch
5y
34
2
19
How to Control an LLM's Behavior (why my P(DOOM) went down)
RogerDearnaley
2y
0
2
51
2021 AI Alignment Literature Review and Charity Comparison
Larks
4y
15
1
58
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner
1y
0
1
29
A technical note on bilinear layers for interpretability
Lee Sharkey
3y
0
1
19
Formal Solution to the Inner Alignment Problem
michaelcohen
5y
123
1
21
Why is pseudo-alignment "worse" than other ways ML can fail to generalize?
nostalgebraist, evhub
5y
7
0
73
Shallow review of technical AI safety, 2024
technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers
1y
1
1
28
NeurIPS ML Safety Workshop 2022
Dan H
3y
1
0
19
Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi, Angira Sharma
1y
1
1
25
How truthful is GPT-3? A benchmark for language models
Owain_Evans
4y
18
0
14
New paper: Corrigibility with Utility Preservation
Koen.Holtman
6y
9
0
18
Learning preferences by looking at the world
Rohin Shah
7y
4
0
26
Human-AI Collaboration
Rohin Shah
6y
4
0
19
Learning biases and rewards simultaneously
Rohin Shah
6y
3