AI ALIGNMENT FORUM
AI

• Applied to "LLMs seem (relatively) safe" by JustisMills 5h ago
• Applied to "Why I stopped being into basin broadness" by TagWrong 7h ago
• Applied to "AXRP Episode 29 - Science of Deep Learning with Vikrant Varma" by TagWrong 8h ago
• Applied to "Improving Dictionary Learning with Gated Sparse Autoencoders" by Neel Nanda 9h ago
• Applied to "Cybersecurity of Frontier AI Models" by TagWrong 13h ago
• Applied to "The first future and the best future" by TagWrong 21h ago
• Applied to "At last! ChatGPT does, shall we say, interesting imitations of “Kubla Khan”" by Bill Benzon 2d ago
• Applied to "1-page outline of Carlsmith's otherness and control series" by Nathan Young 2d ago
• Applied to "How to use and interpret activation patching" by TagWrong 2d ago
• Applied to "Simple probes can catch sleeper agents" by TagWrong 2d ago
• Applied to "ProLU: A Nonlinearity for Sparse Autoencoders" by TagWrong 3d ago
• Applied to "How LLMs Work, in the Style of The Economist" by Rocket Drew 3d ago
• Applied to "AI Regulation is Unsafe" by Maxwell Tabarrok 3d ago
• Applied to "On Llama-3 and Dwarkesh Patel’s Podcast with Zuckerberg" by TagWrong 4d ago
• Applied to "Motivation gaps: Why so much EA criticism is hostile and lazy" by TagWrong 4d ago
• Applied to "Should we break up Google DeepMind?" by TagWrong 4d ago