
AI ALIGNMENT FORUM



Academic Papers

Edited by Kaj_Sotala, last updated 9th Jul 2020

Posts either linking to, or summarizing, formal papers published elsewhere.


Posts tagged Academic Papers

Some AI research areas and their relevance to existential safety · 65 karma · 6y ago · 34 comments

Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training · 38 karma · 4mo ago · 0 comments

How to Control an LLM's Behavior (why my P(DOOM) went down) · 19 karma · 2y ago · 0 comments

The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem? · 5 karma · 1y ago · 0 comments

The paper that killed deep learning theory · 67 karma · 1mo ago · 5 comments

2021 AI Alignment Literature Review and Charity Comparison · 51 karma · 4y ago · 15 comments

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network · 58 karma · 2y ago · 0 comments

The other paper that killed deep learning theory · 33 karma · 1mo ago · 5 comments

A technical note on bilinear layers for interpretability · 29 karma · 3y ago · 0 comments

Formal Solution to the Inner Alignment Problem · 19 karma · 5y ago · 123 comments

Why is pseudo-alignment "worse" than other ways ML can fail to generalize? · nostalgebraist, evhub · 21 karma · 6y ago · 7 comments

Shallow review of technical AI safety, 2024 · technicalities, Stag, Stephen McAleese, jordinne, Dr. David Mathers · 75 karma · 1y ago · 1 comment

Shallow review of technical AI safety, 2025 · technicalities, Tomáš Gavenčiak, Stephen McAleese, peligrietzer, Stag, jordinne, ozziegooen, Violet Hour, lenz · 55 karma · 5mo ago · 0 comments

NeurIPS ML Safety Workshop 2022 · 28 karma · 4y ago · 1 comment

Secret Collusion: Will We Know When to Unplug AI? · schroederdewitt, srm, MikhailB, Lewis Hammond, chansmi, Angira Sharma · 19 karma · 2y ago · 1 comment

Showing 15 of 26 posts.
