This dialogue is still in progress, but due to other commitments we don't have much time to continue it. We think the content is interesting, so we decided to publish it unfinished. We may continue adding to it very slowly in the future, but can't commit to doing so....
A PDF version of this report is available here. Summary In this report we argue that AI systems capable of large-scale scientific research will likely pursue unwanted goals and that this will lead to catastrophic outcomes. We argue this is the default outcome, even with significant countermeasures, given the current...
Produced As Part Of The SERI ML Alignment Theory Scholars Program 2.1, mentored by John Wentworth. Summary We can solve Goodhart's Curse by making the agent (1) know the reliability of its goal representation and (2) cap the amount of optimization power devoted to achieving its goals, based on this...
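The excerpt above is cut off, but the mechanism it gestures at (scale back optimization pressure when the goal representation is unreliable) can be illustrated with a minimal sketch. This is not the post's actual proposal: it uses a quantilizer-style cap as a stand-in for "limiting optimization power", and the function names, the reliability estimate, and the proxy utility below are hypothetical placeholders.

```python
import random

def estimate_goal_reliability() -> float:
    """Hypothetical stand-in: the agent's estimate of how well its
    proxy utility tracks the true goal (1.0 = fully reliable)."""
    return 0.7  # placeholder value for illustration

def capped_optimize(actions, proxy_utility, reliability):
    """Mild optimization: instead of taking the argmax of the proxy,
    keep only the top-q fraction of candidate actions and pick one at
    random. Lower reliability -> larger q -> less optimization pressure."""
    q = max(0.01, 1.0 - reliability)          # cap grows as reliability falls
    ranked = sorted(actions, key=proxy_utility, reverse=True)
    top_k = max(1, int(len(ranked) * q))
    return random.choice(ranked[:top_k])

# Usage: 1000 candidate actions scored by a (possibly misspecified) proxy.
actions = [random.gauss(0, 1) for _ in range(1000)]
proxy = lambda a: -abs(a - 1.0)               # proxy utility, may be Goodhart-able
choice = capped_optimize(actions, proxy, estimate_goal_reliability())
print(choice)
```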
Produced As Part Of The SERI ML Alignment Theory Scholars Program 2022 Under John Wentworth Introduction This post works from the assumption that the first AGI comes relatively soon, and has an architecture that looks basically like EfficientZero, with a few improvements: a significantly larger world model and a significantly...
Produced As Part Of The SERI ML Alignment Theory Scholars Program 2022 Under John Wentworth Introduction When trying to tackle a hard problem, a generally effective opening tactic is to Hold Off On Proposing Solutions: to fully discuss a problem and its different facets. This is...