Gabriel Wu — AI Alignment Forum

Mechanistic estimation for expectations of random products

by Jacob_Hilton, George Robinson, Eric Neyman, paulfchristiano, Mikewins, Victor Lecomte, Wilson Wu, and Gabriel Wu

We have developed some relatively general methods for mechanistic estimation competitive with sampling by studying problems that are expressible as expectations of random products. This includes several different estimation problems, such as random halfspace intersections, random #3-SAT and random permanents. In this post, we will give a high-level introduction to...

May 1550

Why we are excited about confession!

by Boaz Barak, Gabriel Wu, and Manas Joglekar

Boaz Barak, Gabriel Wu, Jeremy Chen, Manas Joglekar. [Linkposting from the OpenAI alignment blog, where we post more speculative/technical/informal results and thoughts on safety and alignment.] TL;DR: We go into more detail and present some follow-up results from our paper on confessions (see the original blog post). We give...

Jan 14140

Low Probability Estimation in Language Models

ARC recently released our first empirical paper: Estimating the Probabilities of Rare Language Model Outputs. In this work, we construct a simple setting for low probability estimation — single-token argmax sampling in transformers — and use it to compare the performance of various estimation methods. ARC views low probability estimation...

Oct 18, 2024 50