Aligned AI Proposals

Edited by Dakara, last updated 4th Jan 2025

Aligned AI Proposals are plans and research agendas aimed at ensuring that artificial intelligence systems behave in accordance with human intentions (intent alignment) or human values (value alignment).

The main goal of these proposals is to ensure that AI systems will, all things considered, benefit humanity.

Posts tagged Aligned AI Proposals (karma · title · author · age · comments):
4 · The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem? · Roger Dearnaley · 2mo · 0
18 · A "Bitter Lesson" Approach to Aligning AGI and ASI · Roger Dearnaley · 1y · 0
11 · Why Aligning an LLM is Hard, and How to Make it Easier · Roger Dearnaley · 6mo · 0
19 · How to Control an LLM's Behavior (why my P(DOOM) went down) · Roger Dearnaley · 2y · 0
8 · Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor · Roger Dearnaley · 2y · 0
16 · Requirements for a Basin of Attraction to Alignment · Roger Dearnaley · 1y · 0
13 · Interpreting the Learning of Deceit · Roger Dearnaley · 2y · 2
56 · A list of core AI safety problems and how I hope to solve them · davidad (David A. Dalrymple) · 2y · 12
48 · AI Alignment Metastrategy · Vanessa Kosoy · 2y · 1
18 · Safety First: safety before full alignment. The deontic sufficiency hypothesis. · Chris Lakin · 2y · 1
15 · We have promising alignment plans with low taxes · Seth Herd · 2y · 0
18 · The (partial) fallacy of dumb superintelligence · Seth Herd · 2y · 0
34 · An Open Agency Architecture for Safe Transformative AI · davidad (David A. Dalrymple) · 3y · 18
43 · Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem · Ansh Radhakrishnan, Buck Shlegeris, Ryan Greenblatt, Fabien Roger · 2y · 2
15 · How to safely use an optimizer · Simon Fischer · 1y · 0
(Showing 15 of 19 tagged posts.)