AI ALIGNMENT FORUM

Ben Cottier — AI Alignment Forum

At Epoch, helping to clarify when and how transformative AI capabilities will be developed.

Previously a Research Fellow on the AI Governance & Strategy team at Rethink Priorities.

Top posts

Clarifying some key hypotheses in AI alignment

We've created a diagram mapping out important and controversial hypotheses for AI alignment. We hope that this will help researchers identify and more productively discuss their disagreements.

Diagram

[Figure: a part of the diagram.]

Caveats

1. This does not decompose arguments exhaustively. It does not include every reason to favour or disfavour ideas. Rather, it is a set of key hypotheses and relationships with other hypotheses, problems, solutions, models, etc. Some examples of important but apparently uncontroversial premises within the AI safety community: orthogonality, complexity of value, Goodhart's Curse, AI being deployed in a catastrophe-sensitive context.
2. This is not a comprehensive collection of key hypotheses across the whole space of AI alignment. It focuses on a subspace that we find interesting and is relevant to more recent discussions we have encountered, but where key hypotheses seem relatively less illuminated. This includes rational agency and goal-directedness, CAIS, corrigibility, and the rationale of foundational and practical research. In hindsight, the selection criteria were something like:
   1. The idea is closely connected to the problem of artificial systems optimizing adversarially against humans.
   2. The idea must be explained sufficiently well that we believe it is plausible.
3. Arrows in the diagram indicate flows of evidence or soft relations, not absolute logical implications; please read the "interpretation" box in the diagram. Also pay attention to any reasoning written next to a Yes/No/Defer arrow: you may disagree with it, so don't blindly follow the arrow!

Background

Much has been written in the way of arguments for AI risk. Recently there have been some talks and posts that clarify different arguments, point to open questions, and highlight the need for further clarification and analysis. We largely share their assessments and echo their recommendations. One aspect of the di

Modeling Failure Modes of High-Level Machine Intelligence

Modeling the impact of safety agendas

Modeling Risks From Learned Optimization

Trends in the dollar training cost of machine learning systems

Summary

1. Using a dataset of 124 machine learning (ML) systems published between 2009 and 2022,[1] I estimate that the cost of compute in US dollars for the final training run of ML systems has grown by 0.49 orders of magnitude (OOM) per year (90% CI: 0.37 to 0.56).[2] See...

Feb 1, 2023•23
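
For readers less used to growth rates quoted in orders of magnitude per year, the figure in the excerpt above can be converted into a yearly multiplier and a doubling time. The sketch below is not from the post: the function names are hypothetical, and it assumes steady exponential growth at the quoted rate.

```python
import math


def annual_growth_factor(oom_per_year: float) -> float:
    """Yearly multiplier implied by a growth rate in orders of magnitude per year."""
    return 10.0 ** oom_per_year


def doubling_time_months(oom_per_year: float) -> float:
    """Months needed to double, assuming constant exponential growth."""
    return 12.0 * math.log10(2.0) / oom_per_year


# Rates quoted in the excerpt: central estimate and the 90% CI bounds.
quoted_rates = [("central", 0.49), ("CI lower", 0.37), ("CI upper", 0.56)]

for label, rate in quoted_rates:
    factor = annual_growth_factor(rate)
    months = doubling_time_months(rate)
    print(f"{label}: x{factor:.2f} per year, doubling every {months:.1f} months")
```

Under these assumptions, 0.49 OOM per year corresponds to costs growing roughly 3.1-fold each year, or doubling about every 7.4 months.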

Understanding the diffusion of large language models: summary

5-minute summary

| Time from GPT-3's publication (May 2020) to… | Time | Date | Model | Organizations involved |
| A better model is produced[1] | 7 months[2] | Dec 2020 | Gopher | DeepMind |
| A better or equally good model is open-sourced[3] AND a successful, explicit attempt at replicating GPT-3 is completed | 23 months (equally good model[4]) | May 2022 | OPT-175B | Meta AI Research |

Table 1: Key facts about the diffusion...

Jan 16, 2023•26

Modeling Failure Modes of High-Level Machine Intelligence

This post, which deals with some widely discussed failure modes of transformative AI, is part 7 in our sequence on Modeling Transformative AI Risk. In this series of posts, we are presenting a preliminary model of the relationships between key hypotheses in debates about catastrophic risks from AI. Previous posts...

Dec 6, 2021•54

Modeling the impact of safety agendas

This post, which deals with how safety research - that is, technical research agendas aiming to reduce AI existential risk - might impact risks from AI, is part 6 in our sequence on Modeling Transformative AI Risk. In this series of posts, we are presenting a preliminary model of the...

Nov 5, 2021•51

Modeling Risks From Learned Optimization

This post, which deals with how risks from learned optimization and inner alignment can be understood, is part 5 in our sequence on Modeling Transformative AI Risk. We are building a model to understand debates around existential risks from advanced AI. The model is made with Analytica software, and consists...

Oct 12, 2021•45

Do mesa-optimizer risk arguments rely on the train-test paradigm?

Going by the Risks from Learned Optimization sequence, it's not clear if mesa-optimization is a big threat if the model continues to be updated throughout deployment. I suspect this has been discussed before (links welcome), but I didn't find anything with a quick search. Lifelong/online/continual learning is popular and could...

Sep 10, 2020•12

Clarifying some key hypotheses in AI alignment

We've created a diagram mapping out important and controversial hypotheses for AI alignment. We hope that this will help researchers identify and more productively discuss their disagreements. Diagram A part of the diagram. Click through to see the full version. Caveats 1. This does not decompose arguments exhaustively. It does...

Aug 15, 2019•79