AI ALIGNMENT FORUM
Academic Papers
Contributors
2
Kaj Sotala
Posts either linking to, or summarizing, formal papers published elsewhere.
Posts tagged Academic Papers (Most Relevant)
1
64
Some AI research areas and their relevance to existential safety
Andrew Critch
3y
34
2
51
2021 AI Alignment Literature Review and Charity Comparison
Larks
2y
15
1
26
A technical note on bilinear layers for interpretability
Lee Sharkey
7mo
0
1
19
Formal Solution to the Inner Alignment Problem
michaelcohen
3y
123
1
21
Why is pseudo-alignment "worse" than other ways ML can fail to generalize?
nostalgebraist, Evan Hubinger
3y
7
1
28
NeurIPS ML Safety Workshop 2022
Dan H
1y
1
1
25
How truthful is GPT-3? A benchmark for language models
Owain Evans
2y
18
0
14
New paper: Corrigibility with Utility Preservation
Koen Holtman
4y
9
0
18
Learning preferences by looking at the world
Rohin Shah
5y
4
0
26
Human-AI Collaboration
Rohin Shah
4y
4
0
19
Learning biases and rewards simultaneously
Rohin Shah
4y
3
0
0
Let's Discuss Functional Decision Theory
Chris_Leong
5y
0
0
7
Implications of Quantum Computing for Artificial Intelligence Alignment Research
Jaime Sevilla, Pablo Antonio Moreno Casares
4y
3
0
10
New paper: The Incentives that Shape Behaviour
Ryan Carey
4y
3
1
11
VLM-RM: Specifying Rewards with Natural Language
ChengCheng, David Lindner, Ethan Perez
1mo
0