AI ALIGNMENT FORUM

AI-Assisted Alignment

Edited by habryka, niplav, et al. Last updated 20th May 2025.

AI-Assisted Alignment is a cluster of alignment plans in which AI significantly helps with alignment research. The AI involved can range from weak tool AI to more advanced AGI doing original research.

There has been some debate about how practical this alignment approach is.

AI systems will likely try to solve alignment for their modifications and/or successors during a phase of self-improvement.

Other search terms for this tag: AI aligning AI, automated AI alignment, automated alignment research

Posts tagged AI-Assisted Alignment
Karma | Title | Author(s) | Age | Comments
18 | A "Bitter Lesson" Approach to Aligning AGI and ASI | RogerDearnaley | 1y | 0
16 | Requirements for a Basin of Attraction to Alignment | RogerDearnaley | 2y | 0
7 | The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem? | RogerDearnaley | 3mo | 0
40 | [Link] Why I’m optimistic about OpenAI’s alignment approach | janleike | 3y | 12
72 | Cyborgism | NicholasKees, janus | 3y | 20
46 | Beliefs and Disagreements about Automating Alignment Research | Ian McKenzie | 3y | 0
19 | How to Control an LLM's Behavior (why my P(DOOM) went down) | RogerDearnaley | 2y | 0
32 | [Link] A minimal viable product for alignment | janleike | 3y | 31
95 | Discussion with Nate Soares on a key alignment difficulty | HoldenKarnofsky | 2y | 20
57 | [Linkpost] Introducing Superalignment | beren | 2y | 11
52 | Davidad's Bold Plan for Alignment: An In-Depth Explanation | Charbel-Raphaël, Gabin | 2y | 5
36 | Cyborg Periods: There will be multiple AI transitions | Jan_Kulveit, rosehadshar | 3y | 2
26 | Capabilities and alignment of LLM cognitive architectures | Seth Herd | 2y | 0
44 | How might we safely pass the buck to AI? | joshc | 6mo | 32
27 | Instruction-following AGI is easier and more likely than value aligned AGI | Seth Herd | 1y | 5