Thoughts from a Two Boxer

One of my biggest open questions in decision theory is where this line between fair and unfair problems should lie.

I think the current piece that points at this question most directly is Success-First Decision Theories by Preston Greene.

At this point I am not convinced that any problem where agents in the environment have access to our decision theory's source code or copies of our agent is a fair problem. But my impression from hearing and reading what people talk about is that this is a heretical position.

It seems somewhat likely to me that agents will be reasoning about each other using access to source code fairly soon (if only human operators evaluating whether or not to run intelligent programs, or what inputs to give to those programs). So then the question is something like: "what's the point of declaring a problem unfair?", to which the main answer seems to be "to spend limited 'no free lunch' points." If I perform poorly on worlds that don't exist in order to perform better on worlds that do exist, that's a profitable trade.

Which leads to this:

I disagree with this view and see Newcomb's problem as punishing rational agents.

...

My big complaint with mind reading is that there just isn't any mind reading.

One thing that seems important (for decision theories implemented by humans or embedded agents, as distinct from decision theories implemented by Cartesian agents) is whether or not the decision theory is robust to ignorance / black swans. That is, if you bake into your view of the world that mind reading is impossible, then you can be durably exploited by any actual mind reading (whereas having some sort of ontological update process or low probability on bizarre occurrences allows you to only be exploited a finite number of times).
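The "finite number of times" point can be illustrated with a small Bayesian sketch. Everything in it is an assumption made for illustration: the two hypotheses (a mind reader who predicts correctly 99% of the time versus a guesser at 50%), the function names, and the rule that the agent stops being exploitable once its posterior passes a threshold.

```python
def posterior_after(n_correct, prior, acc_reader=0.99, acc_chance=0.5):
    """Posterior on 'a mind reader exists' after n correct predictions in a row.

    acc_reader and acc_chance are illustrative assumptions, not values
    taken from the discussion.
    """
    if prior == 0.0:
        # A prior of exactly zero can never be moved by evidence.
        return 0.0
    weight_reader = prior * acc_reader ** n_correct
    weight_chance = (1 - prior) * acc_chance ** n_correct
    return weight_reader / (weight_reader + weight_chance)


def rounds_until_convinced(prior, threshold=0.5, max_rounds=1000):
    """Number of observed correct predictions before the posterior reaches
    the threshold, or None if that never happens within max_rounds."""
    for n in range(max_rounds + 1):
        if posterior_after(n, prior) >= threshold:
            return n
    return None


if __name__ == "__main__":
    for prior in (0.0, 1e-9, 1e-6, 1e-3):
        print(prior, rounds_until_convinced(prior))
```

Under these assumptions, an agent with any positive prior changes its mind after a bounded number of rounds, while an agent with a prior of exactly zero never does.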

But note the connection to the earlier bit: if something is actually impossible, then it feels costless to give up on it in order to perform better in the other worlds. (My personal resolution to counterfactual mugging, for example, seems to rest on an underlying belief that it's free to write off logically inconsistent worlds, in a way that it's not free to write off factually inconsistent worlds that could have been factually consistent / are factually consistent in a different part of the multiverse.)

jaek (6y)

Thanks for your detailed reply! I'll look into that reference.

Adele Lopez (6y)

I think people make decisions based on accurate models of other people all the time. I think of Newcomb's problem as the limiting case where Omega has extremely accurate predictions, but that the solution is still relevant even when "Omega" is only 60% likely to guess correctly. A fun illustration of a computer program capable of predicting (most) humans this accurately is the Aaronson oracle.
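For readers who have not seen it, here is a minimal sketch of how a predictor in the spirit of the Aaronson oracle might work. It assumes the predictor simply counts which of two keys ("f" or "d") followed each recent window of key presses; the window length of 5, the class name, and the tie-breaking rule are illustrative choices, not a description of the actual program.

```python
from collections import defaultdict


class NGramPredictor:
    """Guesses the next key from counts of what followed the same recent context."""

    def __init__(self, window=5):
        self.window = window
        # Maps a context string to counts of the key that came next.
        self.counts = defaultdict(lambda: {"f": 0, "d": 0})
        self.history = ""

    def predict(self):
        context = self.history[-self.window:]
        seen = self.counts[context]
        # Ties (including unseen contexts) default to "f".
        return "f" if seen["f"] >= seen["d"] else "d"

    def observe(self, key):
        context = self.history[-self.window:]
        self.counts[context][key] += 1
        self.history += key


def accuracy(keys, window=5):
    """Fraction of key presses guessed correctly before each one was revealed."""
    predictor = NGramPredictor(window)
    hits = 0
    for key in keys:
        if predictor.predict() == key:
            hits += 1
        predictor.observe(key)
    return hits / len(keys)


if __name__ == "__main__":
    print(accuracy("fd" * 50))  # a strictly alternating typist
```

A predictor like this does well against any input with repeated patterns, which is the sense in which it is relevant to imperfect versions of "Omega".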

cousin_it (6y)

Well, there are other problems besides Newcomb. Something like UDT can be motivated by simulations, or amnesia, or just multiple copies of the AI trying to cooperate with each other. All these lead to pretty much the same theory, that's why it's worth thinking about.

jaek (6y)

Thanks for your comment. I'll look into those other problems.

LukasM (6y)

I'm going to post this anyway since it's blog-day and not important-quality-writing day, but I'm not sure this blog has much of a purpose anymore.

I liked the characterization of decision theory and the comment that the problem naively seems trivial from this perspective. I also liked the description of Newcomb's problem as a version of the prisoner's dilemma. So it totally had a purpose!

LukasM (6y)

I have already stated I see the third bullet as an unfair problem.

Should this be "the first bullet"?

jaek (6y)

Fixed

Joar Skalse (6y)

As you may know, CDT has a lot of fans in academia. It might be interesting to consider what they have to say about Newcomb's Problem (and other supposed counter-examples to CDT).

In "The Foundations of Causal Decision Theory", James Joyce argues that Newcomb's Problem is "unfair" on the grounds that it treats EDT and CDT agents differently. An EDT agent is given two good choices ($1,000,000 and $1,001,000) whereas a CDT agent is given two bad choices ($0 and $1,000). If you wanted to represent Newcomb's Problem as a Markov Decision Process then you would have to put EDT and CDT agents in different MDPs. Lo and behold, the EDT agent gets more money, but this is (according to Joyce) just because it is given an unfair advantage. Hence Newcomb's Problem isn't really too different from the obviously unfair "decision" problem you gave above; the unfairness is just obfuscated. The fact that EDT outperforms CDT in a situation in which EDT agents are unconditionally given more money than CDT agents is not an interesting objection to CDT, and so Newcomb's Problem is not an interesting objection to CDT (according to Joyce).

It might be worth thinking about this argument. Note that this argument operates at the level of individual decision problems, and doesn't say anything about whether it's worth taking into account the possibility that different sorts of agents might tend to end up in different sorts of situations. It also presumes a particular way of answering the question of whether two decision problems are "the same" problem.

I also want to note that you don't need perfect predictors, or anything even close to that, to create Newcomblike situations. Even if the Predictor's accuracy is only somewhat better than a coin flip, this is sufficient to make the causal expected utility different from the evidential expected utility. The key property is that which action you take constitutes evidence about the state of the environment, which can happen in many ways.
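To make the coin-flip point concrete, here is a rough sketch of the two calculations, assuming the standard payoffs used above ($1,000 in the transparent box, $1,000,000 in the opaque one) and a predictor that is equally accurate for both kinds of agent. The function names are mine, and the formulas are the textbook ones rather than anything specific to Joyce's treatment.

```python
def evidential_eu(accuracy, small=1_000, big=1_000_000):
    """Expected payoffs when your own action is treated as evidence about the prediction.

    Returns (one_box, two_box).
    """
    one_box = accuracy * big                # predictor probably foresaw one-boxing
    two_box = (1 - accuracy) * big + small  # predictor probably foresaw two-boxing
    return one_box, two_box


def causal_eu(p_full, small=1_000, big=1_000_000):
    """Expected payoffs when the box contents are held fixed regardless of your action.

    p_full is the agent's credence that the opaque box is already full.
    Returns (one_box, two_box).
    """
    one_box = p_full * big
    two_box = p_full * big + small
    return one_box, two_box


def break_even_accuracy(small=1_000, big=1_000_000):
    """Accuracy above which one-boxing has the higher evidential expected payoff."""
    # Solve accuracy * big = (1 - accuracy) * big + small for accuracy.
    return 0.5 + small / (2 * big)


if __name__ == "__main__":
    print(break_even_accuracy())
    print(evidential_eu(0.6))
    print(causal_eu(0.6))
```

With these payoffs the evidential calculation favors one-boxing once accuracy exceeds 50.05%, while the causal calculation favors two-boxing by exactly the small prize for every value of p_full.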

This may read like I'm already explicitly guided by the false purpose Wei Dai warned against. My understanding is that the goal is to understand ideal decision making. Just not for the purposes of implementation. ↩︎
I don't really know anything but I imagine the game theory of reputation is well developed ↩︎

AI ALIGNMENT FORUM
