AI ALIGNMENT FORUM
Jacob_Hilton's Shortform

by Jacob_Hilton
1st May 2025
1 min read
Jacob_Hilton · 4mo

I recently gave this talk at the Safety-Guaranteed LLMs workshop:

The talk is about ARC's work on low probability estimation (LPE), covering:

  • Theoretical motivation for LPE and (towards the end) activation modeling approaches (both described here)
  • Empirical work on LPE in language models (described here)
  • Recent work-in-progress on theoretical results
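
As background for what "low probability estimation" means in practice, here is a minimal toy sketch of the baseline the problem is usually framed against: naive Monte Carlo fails on rare events because almost no samples hit them, while importance sampling concentrates samples on the rare region and reweights. This is my own illustration on a toy Gaussian, not ARC's method or anything from the talk; all function names and the setup are assumptions for the example.

```python
import math
import random

def naive_mc(threshold, n, rng):
    # Naive Monte Carlo: count how often a standard normal exceeds the
    # threshold. For threshold = 4 the true probability is ~3.2e-5, so
    # most runs of modest size see zero hits and estimate 0.
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n

def importance_sampling(threshold, n, rng):
    # Sample from a proposal centered on the rare region, q = N(threshold, 1),
    # and reweight each hit by the density ratio p(x)/q(x), which for two
    # unit-variance Gaussians is exp(threshold**2/2 - threshold*x).
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(threshold * threshold / 2.0 - threshold * x)
    return total / n

if __name__ == "__main__":
    rng = random.Random(0)
    t = 4.0
    true_p = 0.5 * math.erfc(t / math.sqrt(2.0))  # P(X > 4) for X ~ N(0,1), ~3.2e-5
    print("naive MC estimate:", naive_mc(t, 100_000, rng))
    print("IS estimate     :", importance_sampling(t, 100_000, rng))
    print("true probability :", true_p)
```

The point of the toy: the importance-sampling estimate lands within a small factor of the truth with 10^5 samples, while naive Monte Carlo's estimate is dominated by the handful (often zero) of raw hits. LPE for language models is much harder because no tractable proposal distribution over the rare behavior is available, which is part of what motivates the activation-modeling approaches mentioned above.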