Moral uncertainty

Edited by Eliezer Yudkowsky et al., last updated 19th Feb 2025

"Moral uncertainty" in the context of AI refers to an agent with an "uncertain utility function". That is, we can view the agent as pursuing a that takes on different values in different subsets of possible worlds.

For example, an agent might have a meta-utility function saying that eating cake has a utility of 8 in worlds where Lee Harvey Oswald shot John F. Kennedy, and a utility of 10 in worlds where it was the other way around. This agent will be motivated to inquire into political history to find out which utility function is probably the 'correct' one (relative to this meta-utility function), though it will never be absolutely sure.
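Continuing the sketch above (using the article's hypothetical numbers, with credences chosen purely for illustration), inquiring into political history amounts to updating p_world; stronger evidence moves the agent toward acting on one candidate utility function, but its credence never reaches exactly 1:

```python
# Cake example: utility 8 if Oswald shot Kennedy, utility 10 if it was the
# other way around (all other outcomes scored 0 for simplicity).
candidates = {
    "oswald_shot_kennedy": lambda outcome: 8.0 if outcome == "eat_cake" else 0.0,
    "other_way_around":    lambda outcome: 10.0 if outcome == "eat_cake" else 0.0,
}

p_world = {"oswald_shot_kennedy": 0.5, "other_way_around": 0.5}
print(meta_utility("eat_cake", candidates, p_world))  # 9.0 before any inquiry

# A (hypothetical) decisive history lesson shifts the credences, and with them
# the value the agent assigns to eating cake, but never to full certainty.
p_world = {"oswald_shot_kennedy": 0.99, "other_way_around": 0.01}
print(meta_utility("eat_cake", candidates, p_world))  # 8.02 after strong evidence
```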

Moral uncertainty must be resolvable by some conceivable observation in order to function as uncertainty. Suppose, for example, that an agent's probability distribution ΔU over the 'true' utility function U asserts a dependency on a fair quantum coin that was flipped inside a sealed box, which was then destroyed by explosives: the 'true' utility function is U1 in worlds where the coin came up heads, and U2 in worlds where it came up tails. If the agent thinks it has no way of ever figuring out what happened inside the box, it will thereafter behave as if it had a single, constant, certain utility function equal to 0.5⋅U1 + 0.5⋅U2.
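The same sketch shows this collapse: if no observation can ever move the agent's credences off 0.5/0.5, then the meta-utility of every outcome is just the fixed mixture 0.5⋅U1 + 0.5⋅U2, so the agent acts exactly as if that mixture were its single, certain utility function (U1, U2, and the outcome names below are arbitrary stand-ins):

```python
# Sealed-box case: the credence over heads/tails is fixed forever, so the
# meta-utility of every outcome equals the constant mixture 0.5*U1 + 0.5*U2.
U1 = lambda outcome: 1.0 if outcome == "press_button_A" else 0.0  # if coin was heads
U2 = lambda outcome: 1.0 if outcome == "press_button_B" else 0.0  # if coin was tails
p_coin = {"heads": 0.5, "tails": 0.5}  # unresolvable: the box was destroyed

for outcome in ("press_button_A", "press_button_B", "do_nothing"):
    mixed = 0.5 * U1(outcome) + 0.5 * U2(outcome)
    assert meta_utility(outcome, {"heads": U1, "tails": U2}, p_coin) == mixed
```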

Posts tagged Moral uncertainty

- Normativity (Abram Demski, 24 points, 9 comments, 5y)
- 2018 AI Alignment Literature Review and Charity Comparison (Larks, 40 points, 4 comments, 7y)
- 2019 AI Alignment Literature Review and Charity Comparison (Larks, 39 points, 8 comments, 6y)
- AI Alignment Podcast: An Overview of Technical AI Alignment in 2018 and 2019 with Buck Shlegeris and Rohin Shah (Lucas Perry, 24 points, 21 comments, 5y)
- AXRP Episode 3 - Negotiable Reinforcement Learning with Andrew Critch (DanielFilan, 11 points, 0 comments, 5y)
- For the past, in some ways only, we are moral degenerates (Stuart Armstrong, 6 points, 8 comments, 6y)
- Updated Deference is not a strong argument against the utility uncertainty approach to alignment (Ivan Vendrov, 13 points, 2 comments, 3y)
- Morally underdefined situations can be deadly (Stuart Armstrong, 8 points, 0 comments, 4y)
- RFC: Meta-ethical uncertainty in AGI alignment (Gordon Seidoh Worley, 4 points, 0 comments, 7y)