AI ALIGNMENT FORUM
SERI MATS
• Applied to Clarifying mesa-optimization by Marius Hobbhahn, 9d ago
• Applied to Empirical risk minimization is fundamentally confused by Jesse Hoogland, 9d ago
• Applied to Natural Abstractions: Key claims, Theorems, and Critiques by Erik Jenner, 15d ago
• Applied to [Appendix] Natural Abstractions: Key Claims, Theorems, and Critiques by Erik Jenner, 15d ago
• Applied to Understanding and controlling a maze-solving policy network by Ryan Kidd, 19d ago
• Applied to A mechanistic explanation for SolidGoldMagikarp-like tokens in GPT2 by Stefan Heimersheim, 22d ago
• Applied to Why are counterfactuals elusive? by Ryan Kidd, 1mo ago
• Applied to Predictions for shard theory mechanistic interpretability results by Alex Turner, 1mo ago
• Applied to Searching for a model's concepts by their shape – a theoretical framework by Kaarel, 1mo ago
• Applied to Intervening in the Residual Stream by Ryan Kidd, 1mo ago
• Applied to A Neural Network undergoing Gradient-based Training as a Complex System by Spencer Becker-Kahn, 1mo ago
• Applied to The shallow reality of 'deep learning theory' by Jesse Hoogland, 1mo ago
• Applied to Qualities that alignment mentors value in junior researchers by Akash, 2mo ago
• Applied to A circuit for Python docstrings in a 4-layer attention-only transformer, 2mo ago
• Applied to SERI ML Alignment Theory Scholars Program 2022 by elifland, 2mo ago
• Applied to SolidGoldMagikarp II: technical details and more recent findings by Ryan Kidd, 2mo ago
• Applied to [ASoT] Policy Trajectory Visualization by David Udell, 2mo ago
• Applied to Gradient surfing: the hidden role of regularization by Jesse Hoogland, 2mo ago