AI ALIGNMENT FORUM
AI Evaluations
• Applied to Mechanistically Eliciting Latent Behaviors in Language Models by Alex Turner 8d ago
• Applied to An Introduction to AI Sandbagging by Teun van der Weij 12d ago
• Applied to Inducing Unprompted Misalignment in LLMs by Sam Svenningsen 20d ago
• Applied to LLM Evaluators Recognize and Favor Their Own Generations by Arjun Panickssery 21d ago
• Applied to Claude wants to be conscious by Joe Kwon 26d ago
• Applied to Measuring Predictability of Persona Evaluations by Thee Ho 1mo ago
• Applied to Run evals on base models too! by orthonormal 1mo ago
• Applied to OMMC Announces RIP by Adam Scholl 1mo ago
• Applied to Third-party testing as a key ingredient of AI policy by jacobjacob 1mo ago
• Applied to DeepMind: Evaluating Frontier Models for Dangerous Capabilities by Ruben Bloom 2mo ago
• Applied to AI Safety Evaluations: A Regulatory Review by Elliot Mckernon 2mo ago
• Applied to Introducing METR's Autonomy Evaluation Resources by Megan Kinniment 2mo ago
• Applied to Protocol evaluations: good analogies vs control by Charbel-Raphael Segerie 3mo ago
• Applied to Self-Awareness: Taxonomy and eval suite proposal by RobertM 3mo ago
• Applied to Critiques of the AI control agenda by Arun Jose 3mo ago
• Applied to Skepticism About DeepMind's "Grandmaster-Level" Chess Without Search by Arjun Panickssery 3mo ago
• Applied to Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities by porby 3mo ago
• Applied to The case for more ambitious language model evals by Arun Jose 3mo ago
• Applied to Questions I’d Want to Ask an AGI+ to Test Its Understanding of Ethics by Sean Sweeney 3mo ago