Threat Models
• Applied to Help me solve this problem: The basilisk isn't real, but people are by canary in the machine 15d ago
• Applied to Thoughts On (Solving) Deep Deception by Arun Jose 2mo ago
• Applied to Against Almost Every Theory of Impact of Interpretability by Charbel-Raphael Segerie 4mo ago
• Applied to Proof of posteriority: a defense against AI-generated misinformation by jchan 5mo ago
• Applied to Gearing Up for Long Timelines in a Hard World by Dalcy 5mo ago
• Applied to An Overview of AI risks - the Flyer by Charbel-Raphael Segerie 5mo ago
• Applied to Ten Levels of AI Alignment Difficulty by Samuel Dylan Martin 5mo ago
• Applied to The Main Sources of AI Risk? by elifland 5mo ago
• Applied to Challenge proposal: smallest possible self-hardening backdoor for RLHF by Christopher King 5mo ago
• Applied to Will the growing deer prion epidemic spread to humans? Why not? by Georgia Ray 6mo ago
• Applied to A Friendly Face (Another Failure Story) by Karl von Wendt 6mo ago
• Applied to Persuasion Tools: AI takeover without AGI or agency? by elifland 6mo ago
• Applied to Agentic Mess (A Failure Story) by Karl von Wendt 6mo ago
• Applied to AI interpretability could be harmful? by Roman Leventov 7mo ago
• Applied to AGI-Automated Interpretability is Suicide by __RicG__ 7mo ago
• Applied to Gradient hacking via actual hacking by Max H 7mo ago