If you’re working in alignment research, or would like to be, and you would like help finding researchers to collaborate with, this event is for you. The format is inspired by speed dating: participants will spend the first two hours in short one-on-one conversations with each other, changing...
This week in the key alignment group, we answered two questions, 5-minute-timer style: 1. Map out all of alignment (25 minutes). 2. Create an image/table representing alignment (10 minutes). You are free to stop here and try to answer the questions yourself first. Here is a link for a...
This idea is due to Scott Garrabrant. Suppose you have propositions φ₁, …, φₙ, and you want to form beliefs about whether they are true; specifically, you want to form a joint probability distribution P over the events φ₁, …, φₙ. But there’s a catch: these propositions might refer to the joint probability distribution...
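As a toy illustration of why self-reference is a problem (not the construction from the original post), consider a single proposition whose truth depends on the probability P assigns to it. This minimal Python sketch uses hypothetical helper names (`liar_truth`, `self_fulfilling`, `consistent_points`) and searches a grid of candidate probabilities for values p that agree with the proposition's own truth value under p. A liar-like proposition ("φ is true iff P(φ) < 1/2") has no consistent value, while a self-fulfilling one ("φ is true iff P(φ) ≥ 1/2") has two.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def liar_truth(p):
    # Hypothetical proposition: phi is true (1) iff P(phi) < 1/2.
    return 1 if p < HALF else 0


def self_fulfilling(p):
    # Hypothetical proposition: phi is true (1) iff P(phi) >= 1/2.
    return 1 if p >= HALF else 0


def consistent_points(truth_fn, steps=100):
    # Return every grid probability p in [0, 1] that matches
    # the truth value the proposition takes when P(phi) = p.
    # Fractions avoid floating-point comparison issues.
    grid = (Fraction(i, steps) for i in range(steps + 1))
    return [p for p in grid if p == truth_fn(p)]


print(consistent_points(liar_truth))       # []
print(consistent_points(self_fulfilling))  # [Fraction(0, 1), Fraction(1, 1)]
```

Checking a finite grid is enough in this toy case: because each truth value is exactly 0 or 1, only p = 0 and p = 1 can ever be consistent, so the grid search cannot miss a solution.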
Consider an AI that is trying to achieve a certain result in a toy world running on a computer. Compare two models of what the AI is and what it's trying to do: first, you could say the AI is a physical program on a computer, which is trying to...
Iterated Distillation and Amplification has often been compared to AlphaGo Zero. However, in the case of a Go-playing AI, people usually don't distinguish between alignment and capability, instead measuring success on a single axis: the ability to win games of Go. But I think a useful analogy can still...