AI ALIGNMENT FORUM
Academic Papers
Contributors
2
Kaj Sotala
Posts either linking to, or summarizing, formal papers published elsewhere.
Posts tagged Academic Papers (sorted by most relevant)
Relevance | Karma | Title | Author(s) | Age | Comments
1 | 63 | Some AI research areas and their relevance to existential safety | Andrew Critch | 3y | 37
2 | 51 | 2021 AI Alignment Literature Review and Charity Comparison | Larks | 1y | 13
1 | 26 | A technical note on bilinear layers for interpretability | Lee Sharkey | 23d | 0
1 | 19 | Formal Solution to the Inner Alignment Problem | michaelcohen | 2y | 123
1 | 21 | Why is pseudo-alignment "worse" than other ways ML can fail to generalize? (Q) | nostalgebraist, Evan Hubinger | 3y | 8
1 | 28 | NeurIPS ML Safety Workshop 2022 | Dan H | 10mo | 1
1 | 25 | How truthful is GPT-3? A benchmark for language models | Owain Evans | 2y | 18
0 | 18 | Learning preferences by looking at the world | Rohin Shah | 4y | 4
0 | 26 | Human-AI Collaboration | Rohin Shah | 4y | 4
0 | 19 | Learning biases and rewards simultaneously | Rohin Shah | 4y | 3
0 | 11 | New paper: Corrigibility with Utility Preservation | Koen Holtman | 4y | 0
0 | 7 | Implications of Quantum Computing for Artificial Intelligence Alignment Research | Jaime Sevilla, Pablo Antonio Moreno Casares | 4y | 3
0 | 10 | New paper: The Incentives that Shape Behaviour | Ryan Carey | 3y | 3
1 | 4 | Demanding and Designing Aligned Cognitive Architectures | Koen Holtman | 1y | 5