AI Benefits Post 2: How AI Benefits Differs from AI Alignment & AI for Good

Cullen

This is a post in a series on "AI Benefits." It is cross-posted from my personal blog. For other entries in this series, navigate to the AI Benefits Blog Series Index page.

This post is also discussed on the Effective Altruism Forum.

For comments on this series, I am thankful to Katya Klinova, Max Ghenis, Avital Balwit, Joel Becker, Anton Korinek, and others. Errors are my own.

If you are an expert in a relevant area and would like to help me further explore this topic, please contact me.

How AI Benefits Differs from AI Alignment & AI for Good

The Values Served by AI Benefits Work

Benefits plans need to optimize for a number of objectives.^[1] The foremost is simply maximizing wellbeing. But AI Benefits work has some secondary goals, too. Some of these include:

Equality: Benefits are distributed fairly and broadly.^[2]
Autonomy: AI Benefits respect and enhance end-beneficiaries’ autonomy.^[3]
Democratization: Where possible, AI Benefits decisionmakers should create, consult with, or defer to democratic governance mechanisms.
Modesty: AI benefactors should be epistemically modest, meaning that they should be very careful when predicting how plans will change or interact with complex systems (e.g., the world economy).

These secondary goals are largely inherited from the stated goals of many individuals and organizations working to produce AI Benefits.

Additionally, since the rate of improvements to wellbeing probably decreases with income, the focus on maximizing wellbeing implies a focus on the distributional aspects of Benefits.
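
To make the diminishing-returns point concrete, here is a minimal sketch. It assumes wellbeing is logarithmic in income, which is a common modeling simplification and not something this post commits to. The function name `wellbeing_gain` and the dollar figures are hypothetical, chosen only for illustration.

```python
import math


def wellbeing_gain(income: float, benefit: float) -> float:
    """Wellbeing gained from `benefit` under an assumed log-utility model.

    Assumption (not from the post): wellbeing = ln(income), so each extra
    dollar matters less as income rises.
    """
    if income <= 0 or benefit < 0:
        raise ValueError("income must be positive and benefit non-negative")
    return math.log(income + benefit) - math.log(income)


# Hypothetical figures: the same $1,000 benefit given to two recipients.
gain_low_income = wellbeing_gain(income=2_000, benefit=1_000)
gain_high_income = wellbeing_gain(income=100_000, benefit=1_000)

# Under this assumption the lower-income recipient gains far more wellbeing,
# so a wellbeing-maximizing plan has to care about who receives Benefits.
print(f"low-income gain:  {gain_low_income:.4f}")
print(f"high-income gain: {gain_high_income:.4f}")
print(f"ratio: {gain_low_income / gain_high_income:.1f}x")
```

Any other strictly concave wellbeing function would give the same qualitative result; only the size of the gap would change.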

How AI Benefits differs from AI Alignment

Another important clarification is that AI Benefits differs from AI Alignment.

Both alignment and beneficiality are ethically relevant concepts. Alignment can refer to several different things. Iason Gabriel of DeepMind provides a useful taxonomy of existing conceptions of alignment. According to Gabriel, “AI alignment” can refer to alignment with:

1. “Instructions: the agent does what I instruct it to do.”
2. “Expressed intentions: the agent does what I intend it to do.”
3. “Revealed preferences: the agent does what my behaviour reveals I prefer.”
4. “Informed preferences or desires: the agent does what I would want it to do if I were rational and informed.”
5. “Interest or well-being: the agent does what is in my interest, or what is best for me, objectively speaking.”
6. “Values: the agent does what it morally ought to do . . . .”

A system can be aligned in most of these senses without being beneficial. Being beneficial is distinct from being aligned in senses 1–4 because those deal only with the desires of a particular human principal, which may or may not be beneficial. Being beneficial is distinct from conception 5 because beneficial AI aims to benefit many or all moral patients. Only AI that is aligned in the sixth sense would be beneficial by definition. Conversely, AI need not be well-aligned to be beneficial (though it might help).

How AI Benefits differs from AI for Good

A huge number of projects exist under the banner of “AI for Good.” These projects are generally beneficial. However, AI Benefits work is different from simply finding and pursuing an AI for Good project.

AI Benefits work aims at helping AI labs craft a long-term Benefits strategy. Unlike AI for Good, which is tied to specific techniques/capabilities (e.g., NLP) in certain domains (e.g., AI in education), AI Benefits is capability- and domain-agnostic. Accordingly, the pace of AI capabilities development should not dramatically alter AI Benefits plans at the highest level (though it may of course change how they are implemented). Most of my work therefore focuses not on concrete beneficial AI applications themselves, but rather on the process of choosing between and improving possible beneficial applications. This meta-level focus is particularly useful at OpenAI, where the primary mission is to benefit the world by building AGI—a technology with difficult-to-foresee capabilities.

[1] Multi-objective optimization is a very hard problem. Managing this optimization problem both formally and procedurally is a key desideratum for Benefits plans. I do not think I have come close to solving this problem, and would love input on this point.
[2] OpenAI’s Charter commits “to us[ing] any influence we obtain over AGI’s deployment to ensure it is used for the benefit of all . . . .”
[3] OpenAI’s Charter commits to avoiding “unduly concentrat[ing] power.”

A system can be aligned in most of these senses without being beneficial. Being beneficial is distinct from being aligned in senses 1–4 because those deal only with the desires of a particular human principal, which may or may not be beneficial. Being beneficial is distinct from conception 5 because beneficial AI aims to benefit many or all moral patients. Only AI that is aligned in the sixth sense would be beneficial by definition. Conversely, AI need not be well-aligned to be beneficial (though it might help).

At some point, some particular group of humans codes the AI and presses run. If all the people who coded it are totally evil, they will make an AI that does evil things.

The only way any kind of morality can affect the AI's decisions is if the programmers are somewhat moral. Whether the programmers hard-code their morality or meta-morality, or code the AI to do what they ask and then ask for moral things, is an implementation detail. The key causal link is from the programmers' preferences (including very abstract meta-preferences for fairness, etc.) to the AI's actions.

Equality: Benefits are distributed fairly and broadly.[2]

This sounds, at best, like a consequence of the fact that human utility functions are sublinear in resources.

Democratization: Where possible, AI Benefits decisionmakers should create, consult with, or defer to democratic governance mechanisms.

Are we talking about decision-making in a pre- or post-superhuman-AI setting? In a pre-ASI setting, it is reasonable for the people building AI systems to defer somewhat to democratic governance mechanisms, where their demands are well considered and sensible. (At least some democratic leaders may be sufficiently lacking in technical understanding of AI for their requests to be impossible, nonsensical, or dangerous.)

In a post-ASI setting, you have an AI capable of tracking every neuron firing in every human brain. It knows exactly what everyone wants. Any decisions made by democratic processes will be purely entropic compared to the AI's. Just because democracy is better than dictatorship, monarchy, etc. doesn't mean we can attach positive affect to democracy and keep democracy around in the face of far better systems, like a benevolent superintelligence running everything.

Modesty: AI benefactors should be epistemically modest, meaning that they should be very careful when predicting how plans will change or interact with complex systems (e.g., the world economy).

Again, pre-ASI, this is sensible. I would expect an ASI to be very well calibrated. It will not need to be hard-coded with modesty; it can work out how modest to be by itself.
