AI ALIGNMENT FORUM
Alignment Stream of Thought

Mar 27, 2022 by leogao

Epistemic status: statements dreamed up by the utterly Deranged

This sequence contains posts that are lower effort than my usual posts: instead of thinking things all the way through before posting something polished, I post things that are rough and in-progress as I think about them. I'm trying this because I noticed that I had lots of interesting thoughts that I didn't want to share because I hadn't totally figured them out yet, and that the process of writing things down and posting them often helps me make progress.

Anything in this sequence is at even greater risk than usual of being obsoleted or unendorsed later down the road. It will also be harder to follow than usual, because I'm putting less effort into explaining background.

I am hoping to eventually do a distillation of the important insights of this sequence into more legible post(s) once I'm less confused.

Posts in this sequence (all by leogao):

- [ASoT] Observations about ELK
- [ASoT] Some ways ELK could still be solvable in practice
- [ASoT] Searching for consequentialist structure
- [ASoT] Some thoughts about deceptive mesaoptimization
- [ASoT] Some thoughts about LM monologue limitations and ELK
- [ASoT] Some thoughts about imperfect world modeling
- [ASoT] Consequentialist models as a superset of mesaoptimizers
- Humans Reflecting on HRH
- Towards deconfusing wireheading and reward maximization
- [ASoT] Some thoughts on human abstractions