AI ALIGNMENT FORUM
Jacob_Hilton's Shortform

by Jacob_Hilton
1st May 2025
1 min read
Jacob_Hilton · 4mo

I recently gave this talk at the Safety-Guaranteed LLMs workshop:

The talk is about ARC's work on low probability estimation (LPE), covering:

  • Theoretical motivation for LPE and (towards the end) activation modeling approaches (both described here)
  • Empirical work on LPE in language models (described here)
  • Recent work-in-progress on theoretical results
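
As background for what "low probability estimation" means in practice, here is a minimal toy sketch of the baseline the problem is usually framed against: naive Monte Carlo fails on rare events because almost no samples hit them, while importance sampling concentrates samples on the rare region and reweights. This is my own illustration on a toy Gaussian, not ARC's method or anything from the talk; all function names and the setup are assumptions for the example.

```python
import math
import random

def naive_mc(threshold, n, rng):
    # Naive Monte Carlo: count how often a standard normal exceeds the
    # threshold. For threshold = 4 the true probability is ~3.2e-5, so
    # most runs of modest size see zero hits and estimate 0.
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n

def importance_sampling(threshold, n, rng):
    # Sample from a proposal centered on the rare region, q = N(threshold, 1),
    # and reweight each hit by the density ratio p(x)/q(x), which for two
    # unit-variance Gaussians is exp(threshold**2/2 - threshold*x).
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(threshold * threshold / 2.0 - threshold * x)
    return total / n

if __name__ == "__main__":
    rng = random.Random(0)
    t = 4.0
    true_p = 0.5 * math.erfc(t / math.sqrt(2.0))  # P(X > 4) for X ~ N(0,1), ~3.2e-5
    print("naive MC estimate:", naive_mc(t, 100_000, rng))
    print("IS estimate     :", importance_sampling(t, 100_000, rng))
    print("true probability :", true_p)
```

The point of the toy: the importance-sampling estimate lands within a small factor of the truth with 10^5 samples, while naive Monte Carlo's estimate is dominated by the handful (often zero) of raw hits. LPE for language models is much harder because no tractable proposal distribution over the rare behavior is available, which is part of what motivates the activation-modeling approaches mentioned above.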