Noa Nabeshima


Clarifying some key hypotheses in AI alignment

What software did you use to produce this diagram?

What considerations influence whether I have more influence over short or long timelines?

How much influence and ability you expect to have as an individual in that timeline.

For example, I don't expect to have much influence/ability in extremely short timelines, so I should focus on timelines longer than 4 years, with more weight to longer timelines and some tapering off starting around when I expect to die.

How relevant thoughts and planning now will be.

If timelines are late in my life or after my death, thoughts, research, and planning now will be much less relevant to the trajectory of AI going well, so at this moment in time I should weight timelines in the 4-25 year range more.

Poll: Which variables are most strategically relevant?

Value-symmetry: "Will AI systems in the critical period be equally useful for different values?"

This could fail if, for example, we can build AI systems that are very good at optimizing for easy-to-measure values but significantly worse at optimizing for hard to measure values. It might be easy to build a sovereign AI to maximize the profit of a company, but hard to create one that cares about humans and what they want.

Evan Hubinger has some operationalizations of things like this here and  here.

Poll: Which variables are most strategically relevant?

Open / Closed: "Will transformative AI systems in the critical period be publicly available?"

A world where everyone has access to transformative AI systems, for example by being able to rent them (like GPT-3's API once it's publicly available), might be very different from one where they are kept private by one or more private organizations.

For example, if strategy stealing doesn't hold, this could dramatically change the distribution of power, because the systems might be more helpful for some tasks and values than others.

This variable could also affect timelines estimates if publicly accessible TAI systems increase GDP growth, among other effects it could have on the world.

Poll: Which variables are most strategically relevant?

Deceptive alignment: “In the critical period, will AIs be deceptive?”

Within the framework of Risks from Learned Optimization, this is when a mesa-optimizer has a different objective than the base objective, but instrumentally optimizes the base objective to deceive humans. It can refer more generally to any scenario where an AI system behaves instrumentally one way to deceive humans.

Poll: Which variables are most strategically relevant?

Alignment tax: “How much more difficult will it be to create an aligned AI vs an unaligned AI when it becomes possible to create powerful AI?”

If the alignment tax is low, people have less incentive to build an unaligned AI as they'd prefer to build a system that's trying to do what they want. Then, to increase the probability that our AI trajectory goes well, one could focus on how to reduce the alignment tax.