Aligned AI Proposals — AI Alignment Forum

Aligned AI Proposals

Edited by Dakara last updated 4th Jan 2025

Aligned AI Proposals are proposals aimed at ensuring artificial intelligence systems behave in accordance with human intentions (intent alignment) or human values (value alignment).

The main goal of these proposals is to ensure that AI systems will, all things considered, benefit humanity.

Posts tagged Aligned AI Proposals

3

7 The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?

11mo

0

3

19 How to Control an LLM's Behavior (why my P(DOOM) went down)

2y

0

3

17 A "Bitter Lesson" Approach to Aligning AGI and ASI

2y

0

3

13 Why Aligning an LLM is Hard, and How to Make it Easier

1y

0

2

8 Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

2y

0

3

16 Requirements for a Basin of Attraction to Alignment

2y

0

2

14 Interpreting the Learning of Deceit

2y

2

1

56 A list of core AI safety problems and how I hope to solve them

3y

12

1

48 AI Alignment Metastrategy

2y

1

1

38 Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training

3mo

0

2

18 Safety First: safety before full alignment. The deontic sufficiency hypothesis.

2y

1

1

16 We have promising alignment plans with low taxes

2y

0

1

18 The (partial) fallacy of dumb superintelligence

2y

0

1

34 An Open Agency Architecture for Safe Transformative AI

3y

18

1

43 Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem

Ansh Radhakrishnan, Buck, ryan_greenblatt, Fabien Roger

2y

2
