AI ALIGNMENT FORUM
Academic Papers
Written by Kaj Sotala, last updated 9th Jul 2020
Posts either linking to, or summarizing, formal papers published elsewhere.
Posts tagged Academic Papers (most relevant first)
1
64
Some AI research areas and their relevance to existential safety
Andrew Critch
4y
34
2
19
How to Control an LLM's Behavior (why my P(DOOM) went down)
Roger Dearnaley
1y
0
2
51
2021 AI Alignment Literature Review and Charity Comparison
Larks
3y
15
1
59
Evidence of Learned Look-Ahead in a Chess-Playing Neural Network
Erik Jenner
10mo
0
1
29
A technical note on bilinear layers for interpretability
Lee Sharkey
2y
0
1
19
Formal Solution to the Inner Alignment Problem
michaelcohen
4y
123
1
21
Why is pseudo-alignment "worse" than other ways ML can fail to generalize?
nostalgebraist, Evan Hubinger
5y
7
0
69
Shallow review of technical AI safety, 2024
technicalities, Stag, Stephen McAleese, jordine, Dr. David Mathers
3mo
1
1
28
NeurIPS ML Safety Workshop 2022
Dan H
3y
1
1
25
How truthful is GPT-3? A benchmark for language models
Owain Evans
4y
18
0
17
Secret Collusion: Will We Know When to Unplug AI?
schroederdewitt, srm, Mikhail Baranchuk, Lewis Hammond, chansmi, Angira Sharma
6mo
1
0
14
New paper: Corrigibility with Utility Preservation
Koen Holtman
6y
9
0
18
Learning preferences by looking at the world
Rohin Shah
6y
4
0
26
Human-AI Collaboration
Rohin Shah
5y
4
0
19
Learning biases and rewards simultaneously
Rohin Shah
6y
3