AI ALIGNMENT FORUM
Mesa-Optimization
• Applied to Principled Satisficing To Avoid Goodhart by JenniferRM 3mo ago
• Applied to Pacing Outside the Box: RNNs Learn to Plan in Sokoban by Adrià Garriga-Alonso 3mo ago
• Applied to Finding Backward Chaining Circuits in Transformers Trained on Tree Search by abhayesian 5mo ago
• Applied to Inner Optimization Mechanisms in Neural Nets by ProgramCrafter 6mo ago
• Applied to The Human's Role in Mesa Optimization by silentbob 6mo ago
• Applied to Visualizing neural network planning by Nevan Wichers 6mo ago
• Applied to Measuring Learned Optimization in Small Transformer Models by Jonathan Bostock 7mo ago
• Applied to Understanding mesa-optimization using toy models by tilmanr 7mo ago
• Applied to Counting arguments provide no evidence for AI doom by Quintin Pope 8mo ago
• Applied to The Inner Alignment Problem by Jakub Halmeš 8mo ago
• Applied to Satisficers want to become maximisers by JenniferRM 1y ago
• Applied to Mesa-Optimization: Explain it like I'm 10 Edition by brook 1y ago
• Applied to Runaway Optimizers in Mind Space by silentbob 1y ago
• Applied to Disincentivizing deception in mesa optimizers with Model Tampering by martinkunev 1y ago