AI ALIGNMENT FORUM
Deception
• Applied to Deceptive failures short of full catastrophe. by Alex Lawsen at 20d
• Applied to The commercial incentive to intentionally train AI to deceive us by Derek M. Jones at 1mo
• Applied to Getting up to Speed on the Speed Prior in 2022 by robertzk at 1mo
• Applied to Monitoring for deceptive alignment by Noosphere89 at 5mo
• Applied to How likely is deceptive alignment? by Raymond Arnold at 5mo
• Applied to Three scenarios of pseudo-alignment by Eleni Angelou at 5mo
• Applied to Deception as the optimal: mesa-optimizers and inner alignment by Eleni Angelou at 6mo
• Applied to Precursor checking for deceptive alignment by Raymond Arnold at 6mo
• Applied to Modelling Deception by RobertM at 7mo
• Applied to Deception?! I ain’t got time for that! by Paul Colognese at 7mo
• Applied to Latent Adversarial Training by Adam Jermyn at 7mo
• Applied to Training Trace Priors and Speed Priors by Adam Jermyn at 7mo
• Applied to Formalizing Deception by JamesH at 7mo
• Applied to Conditioning Generative Models by Adam Jermyn at 7mo
• Applied to Multigate Priors by Adam Jermyn at 8mo
• Applied to Training Trace Priors by Adam Jermyn at 8mo
• Applied to Gracefully correcting uncalibrated shame by HS2021 at 8mo
• Applied to Why I'm Worried About AI by Peter Barnett at 8mo
• Applied to The Speed + Simplicity Prior is probably anti-deceptive by Ruben Bloom at 9mo