AI ALIGNMENT FORUM

Books of LessWrong

A Moderate Update to your Artificial Priors

Jan 03, 2024 by habryka
ARC's first technical report: Eliciting Latent Knowledge
paulfchristiano, Mark Xu, Ajeya Cotra
Fun with +12 OOMs of Compute
Daniel Kokotajlo
What 2026 looks like
Daniel Kokotajlo
Ngo and Yudkowsky on alignment difficulty
Eliezer Yudkowsky, Richard_Ngo
Another (outer) alignment failure story
paulfchristiano
What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)
Andrew_Critch
The Plan
johnswentworth
Finite Factored Sets
Scott Garrabrant
Selection Theorems: A Program For Understanding Agents
johnswentworth
My research methodology
paulfchristiano
larger language models may disappoint you [or, an eternally unfinished draft]
nostalgebraist
Comments on Carlsmith's “Is power-seeking AI an existential risk?”
So8res
EfficientZero: How It Works
1a3orn
Specializing in Problems We Don't Understand
johnswentworth