AI ALIGNMENT FORUM
543
Academic Papers
Edited by Kaj_Sotala, last updated 9th Jul 2020
Posts either linking to, or summarizing, formal papers published elsewhere.
Posts tagged Academic Papers, sorted by Most Relevant
65
Some AI research areas and their relevance to existential safety
Andrew_Critch
5y
34
19
How to Control an LLM's Behavior (why my P(DOOM) went down)
RogerDearnaley
2y
0
51
2021 AI Alignment Literature Review and Charity Comparison
Larks
4y
15
58
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner
1y
0
29
A technical note on bilinear layers for interpretability
Lee Sharkey
2y
0
19
Formal Solution to the Inner Alignment Problem
michaelcohen
5y
123
21
Why is pseudo-alignment "worse" than other ways ML can fail to generalize?
nostalgebraist, evhub
5y
7
73
Shallow review of technical AI safety, 2024
technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers
10mo
1
28
NeurIPS ML Safety Workshop 2022
Dan H
3y
1
19
Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi, sofmonk
1y
1
25
How truthful is GPT-3? A benchmark for language models
Owain_Evans
4y
18
14
New paper: Corrigibility with Utility Preservation
Koen.Holtman
6y
9
18
Learning preferences by looking at the world
Rohin Shah
7y
4
26
Human-AI Collaboration
Rohin Shah
6y
4
19
Learning biases and rewards simultaneously
Rohin Shah
6y
3