AI ALIGNMENT FORUM
Inner Alignment
• Applied to Pacing Outside the Box: RNNs Learn to Plan in Sokoban by Adrià Garriga-Alonso 2d ago
• Applied to A more systematic case for inner misalignment by Ruben Bloom 7d ago
• Applied to Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent by Karolis Jucys 10d ago
• Applied to A simple case for extreme inner misalignment by quila 14d ago
• Applied to A "Bitter Lesson" Approach to Aligning AGI and ASI by Roger Dearnaley 22d ago
• Applied to Language for Goal Misgeneralization: Some Formalisms from my MSc Thesis by Giulio Starace 1mo ago
• Applied to Demystifying "Alignment" through a Comic by Milan Rosko 2mo ago
• Applied to Finding Backward Chaining Circuits in Transformers Trained on Tree Search by abhayesian 2mo ago
• Applied to minutes from a human-alignment meeting by bhauth 2mo ago
• Applied to SAE sparse feature graph using only residual layers by Jaehyuk Lim 2mo ago
• Applied to Implementing Asimov's Laws of Robotics - How I imagine alignment working. by Joshua Clancy 2mo ago
• Applied to Visualizing neural network planning by Nevan Wichers 3mo ago
• Applied to Measuring Learned Optimization in Small Transformer Models by Jonathan Bostock 4mo ago
• Applied to [Aspiration-based designs] 1. Informal introduction by Jobst Heitzig 4mo ago
• Applied to On the Confusion between Inner and Outer Misalignment by jacobjacob 4mo ago
• Applied to Invitation to the Princeton AI Alignment and Safety Seminar by Sadhika Malladi 4mo ago