AI
• Applied to a talk with Imaginary Paul Graham by bhauth 1h ago
• Applied to Why No Toy ELK Benchmarks? by TagWrong 2h ago
• Applied to It Is Powerful, It Can't Be Aimed by Zahima 7h ago
• Applied to The King and the Golem by Raymond Arnold 8h ago
• Applied to ARENA 2.0 - Impact Report by TagWrong 10h ago
• Applied to Mechanistic Interpretability Reading group by TagWrong 11h ago
• Applied to Announcing the CNN Interpretability Competition by TagWrong 11h ago
• Applied to Making AIs less likely to be spiteful by Nicolas Macé 13h ago
• Applied to [Linkpost] Mark Zuckerberg confronted about Meta's Llama 2 AI's ability to give users detailed guidance on making anthrax - Business Insider by TagWrong 15h ago
• Applied to A few Alignment questions: utility optimizers, SLT, sharp left turn and identifiability by TagWrong 1d ago
• Applied to Impact stories for model internals: an exercise for interpretability researchers by TagWrong 1d ago
• Applied to Welcome to Apply: The 2024 Vitalik Buterin Fellowships in AI Existential Safety by FLI! by RobertM 1d ago
• Applied to Public Opinion on AI Safety: AIMS 2023 and 2021 Summary by TagWrong 1d ago
• Applied to Evaluating hidden directions on the utility dataset: classification, steering and removal by TagWrong 1d ago
• Applied to Understanding strategic deception and deceptive alignment by TagWrong 1d ago
• Applied to “X distracts from Y” as a thinly-disguised fight over group status / politics by Steve Byrnes 1d ago
• Applied to Amazon to invest up to $4 billion in Anthropic by TagWrong 2d ago
• Applied to Who determines whether an alignment proposal is the definitive alignment solution? by Miguel de Guzman 2d ago
• Applied to Automating Intelligence: A Cursory Glance at How AutoML Brings Precision to AI Development by RoscoHunter 2d ago
• Applied to RAIN: Your Language Models Can Align Themselves without Finetuning - Microsoft Research 2023 - Reduces the adversarial prompt attack success rate from 94% to 19%! by Ruben Bloom 2d ago