(This is the video and transcript of a talk I gave at the UT Austin AI and Human Objectives Initiative in September 2025. The slides are also available here. The main content of the talk is based on this recent essay.) Talk Hi, everyone. Thank you for coming. I'm honored to...
This is the video and transcript of a talk I gave at Anthropic in April 2025 on automating alignment research. You can also view the slides here. The content of this talk is drawn from a longer essay on the topic, available here, and in audio form (read by the author)...
(This is the fifth essay in a series that I’m calling “How do we solve the alignment problem?”. I’m hoping that the individual essays can be read fairly well on their own, but see this introduction for a summary of the essays that have been released thus far, and for...
(Audio version here (read by the author), or search for "Joe Carlsmith Audio" on your podcast app. This is the fourth essay in a series that I’m calling “How do we solve the alignment problem?”. I’m hoping that the individual essays can be read fairly well on their own, but...
(Audio version here (read by the author), or search for "Joe Carlsmith Audio" on your podcast app. This is the third essay in a series that I’m calling “How do we solve the alignment problem?”. I’m hoping that the individual essays can be read fairly well on their own, but...
(Audio version here (read by the author), or search for "Joe Carlsmith Audio" on your podcast app. This is the second essay in a series that I’m calling “How do we solve the alignment problem?”.[1] I’m hoping that the individual essays can be read fairly well on their own, but see...
(Cross-posted from my website. Audio version here, or search for "Joe Carlsmith Audio" on your podcast app.) Researchers at Redwood Research, Anthropic, and elsewhere recently released a paper documenting cases in which the production version of Claude 3 Opus fakes alignment with a training objective in order to avoid modification...