Academic Papers — AI Alignment Forum
Edited by Kaj_Sotala, last updated 9th Jul 2020
Posts either linking to, or summarizing, formal papers published elsewhere.
Posts tagged Academic Papers, most relevant first
1
65
Some AI research areas and their relevance to existential safety
Andrew_Critch
5y
34
3
37
Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training
RogerDearnaley
1mo
0
3
19
How to Control an LLM's Behavior (why my P(DOOM) went down)
RogerDearnaley
2y
0
3
7
The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?
RogerDearnaley
9mo
0
2
51
2021 AI Alignment Literature Review and Charity Comparison
Larks
4y
15
1
58
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner
2y
0
1
29
A technical note on bilinear layers for interpretability
Lee Sharkey
3y
0
1
19
Formal Solution to the Inner Alignment Problem
michaelcohen
5y
123
1
21
Why is pseudo-alignment "worse" than other ways ML can fail to generalize?
nostalgebraist, evhub
6y
7
0
75
Shallow review of technical AI safety, 2024
technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers
1y
1
1
53
Shallow review of technical AI safety, 2025
technicalities, Tomáš Gavenčiak, Stephen McAleese, peligrietzer, Stag, jordine, ozziegooen, Violet Hour, ramennaut
2mo
0
1
28
NeurIPS ML Safety Workshop 2022
Dan H
4y
1
0
19
Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi, Angira Sharma
1y
1
1
25
How truthful is GPT-3? A benchmark for language models
Owain_Evans
4y
18
0
14
New paper: Corrigibility with Utility Preservation
Koen.Holtman
7y
9