AI-Assisted Alignment

Edited by habryka, niplav, et al. Last updated 20th May 2025.

AI-Assisted Alignment is a cluster of alignment plans in which AI significantly helps with alignment research. This ranges from weak tool AI to more advanced AGI doing original research.

There has been some debate about how practical this alignment approach is.

During a phase of self-improvement, AI systems will likely try to solve alignment for modified versions of themselves and/or their successors.

Other search terms for this tag: AI aligning AI, automated AI alignment, automated alignment research

Posts tagged AI-Assisted Alignment

3

17 A "Bitter Lesson" Approach to Aligning AGI and ASI

RogerDearnaley

2y

0

3

11 Grounding Value Learning in Evolutionary Psychology: an Alternative Proposal to CEV

RogerDearnaley

4mo

2

3

16 Requirements for a Basin of Attraction to Alignment

RogerDearnaley

2y

0

1

7 The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?

RogerDearnaley

1y

0

3

40 [Link] Why I’m optimistic about OpenAI’s alignment approach

janleike

3y

12

1

19 How to Control an LLM's Behavior (why my P(DOOM) went down)

RogerDearnaley

2y

0

2

72 Cyborgism

NicholasKees, janus

3y

20

1

46 Beliefs and Disagreements about Automating Alignment Research

Ian McKenzie

4y

0

3

32 [Link] A minimal viable product for alignment

janleike

4y

31

2

101 Discussion with Nate Soares on a key alignment difficulty

HoldenKarnofsky

3y

20

1

57 [Linkpost] Introducing Superalignment

beren

3y

11

2

52 Davidad's Bold Plan for Alignment: An In-Depth Explanation

Charbel-Raphaël, Gabin

3y

5

1

37 Cyborg Periods: There will be multiple AI transitions

Jan_Kulveit, rosehadshar

3y

2

26 Capabilities and alignment of LLM cognitive architectures

Seth Herd

3y

0

1

44 How might we safely pass the buck to AI?

joshc

1y

32