jordine

Shallow review of technical AI safety, 2025

[Figure: Change in 18 latent capabilities between GPT-3 and o1, from Zhou et al (2025)]

This is the third annual review of what’s going on in technical AI safety. You could stop reading here and instead explore the data on the shallow review website. It’s shallow in the sense that...

Dec 17, 2025

AI ALIGNMENT FORUM

Shallow review of technical AI safety, 2025

Here’s 18 Applications of Deception Probes

Shallow review of technical AI safety, 2024

Can SAE steering reveal sandbagging?
