[This post summarizes some of the work done by Owen Dudney, Roman Engeler and myself (Quintin Pope) as part of the SERI MATS shard theory stream.] TL;DR Future prosaic AIs will likely shape their own development or that of successor AIs. We're trying to make sure they don't go insane....
TL;DR: We evaluate Causal Scrubbing (CaSc) on synthetic graphs with known ground truth to determine its reliability in confirming correct hypotheses and rejecting incorrect ones. First, we show that CaSc can accurately identify true hypotheses and quantify the degree to which a hypothesis is wrong. Second, we highlight some limitations...
Update February 21st: After the initial publication of this article (January 3rd) we received a lot of feedback and several people pointed out that propositions 1 and 2 were incorrect as stated. That was unfortunate as it distracted from the broader arguments in the article and I (Jan K) take...
Meta: Over the past few months, we've held a seminar series on the Simulators theory by janus. As the theory is actively under development, the purpose of the series is to discover central structures and open problems. Our aim with this sequence is to share some of our discussions with...
Thanks to Andrei Alexandru, Joe Collman, Michael Einhorn, Kyle McDonell, Daniel Paleka, and Neel Nanda for feedback on drafts and/or conversations which led to useful insights for this work. In addition, thank you to both William Saunders and Alex Gray for exceptional mentorship throughout this project. The majority of this...
In March 22nd, 2022, we released a survey with an accompanying post for the purpose of getting more insight into what tools we could build to augment alignment researchers and accelerate alignment research. Since then, we’ve also released a dataset, a manuscript (LW post), and the (relevant) Simulators post was...
TL;DR: In this project, we collected and cataloged AI alignment research literature and analyzed the resulting dataset in an unbiased way to identify major research directions. We found that the field is growing quickly, with several subfields emerging in parallel. We looked at the subfields and identified the prominent researchers,...