AI ALIGNMENT FORUM Tags

Research Agendas

• Applied to What and Why: Developmental Interpretability of Reinforcement Learning by Ruben Bloom 17d ago
• Applied to Labor Participation is a High-Priority AI Alignment Risk by Alexander Dean Foster 1mo ago
• Applied to What should I do? (long term plan about starting an AI lab) by not_a_cat 2mo ago
• Applied to What should AI safety be trying to achieve? by EuanMcLean 2mo ago
• Applied to Announcing Human-aligned AI Summer School by Jan_Kulveit 2mo ago
• Applied to EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 by Stephen Casper 2mo ago
• Applied to The Prop-room and Stage Cognitive Architecture by Robert Kralisch 3mo ago
• Applied to Speedrun ruiner research idea by Luke H Miles 3mo ago
• Applied to Constructability: Plainly-coded AGIs may be feasible in the near future by Charbel-Raphael Segerie 4mo ago
• Applied to Sparsify: A mechanistic interpretability research agenda by Marius Hobbhahn 4mo ago
• Applied to Gradient Descent on the Human Brain by Arun Jose 4mo ago
• Applied to Towards White Box Deep Learning by Maciej Satkiewicz 4mo ago
• Applied to Natural abstractions are observer-dependent: a conversation with John Wentworth by Martín Soto 5mo ago
• Applied to Gaia Network: An Illustrated Primer by Rafael Kaufmann Nedal 6mo ago
• Applied to Worrisome misunderstanding of the core issues with AI transition by Roman Leventov 6mo ago