Abram Demski




Applying the Counterfactual Prisoner's Dilemma to Logical Uncertainty

Notice however that for Logical Counterfactual Mugging to be well defined, you need to define what Omega is doing when it is making its prediction. In Counterfactuals for Perfect Predictors, I explained that when dealing with perfect predictors, often the counterfactual would be undefined. For example, in Parfit's Hitchhiker a perfect predictor would never give a lift to someone who never pays in town, so it isn't immediately clear that predicting what such a person would do in town involves predicting something coherent.

Another approach is to change the example to remove the objection.

The poker-like game at the end of Decision Theory (I really titled that post simply "decision theory"? rather vague, past-me...) is isomorphic to counterfactual mugging, but removes some distractions, such as "how does Omega take the counterfactual".

Alice receives a High or Low card. Alice can reveal the card to Bob. Bob then states a probability p for Alice's card being Low. Bob's incentives just encourage him to report honest beliefs. Alice's loss depends on p: it is smallest when Bob is certain her card is Low.

When Alice gets a Low card, she can just reveal it to Bob, and get the best possible outcome. But this strategy means Bob will know when she has a High card, giving her the worst possible outcome in that case. In order to successfully bluff, Alice has to sometimes act like she has different cards than she has. And indeed, the optimal strategy for Alice in this case is to never show her cards.
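Here is a rough sketch of that comparison in Python. Several things are assumptions I am adding for illustration, not part of the game as stated: a 50/50 deal, Bob knowing Alice's strategy and reporting the Bayesian posterior, and a loss of (1 - p)^2 where p is Bob's stated probability of Low. That loss formula is a stand-in chosen only because it is convex and smallest when Bob is sure the card is Low.

```python
PRIOR_LOW = 0.5  # assumed: Low and High are equally likely


def alice_loss(p_low: float) -> float:
    """Hypothetical loss for Alice; the exact formula is an assumption."""
    return (1.0 - p_low) ** 2


def expected_loss(reveal_when_low: bool, reveal_when_high: bool) -> float:
    """Alice's expected loss when Bob knows her strategy and reports honestly."""
    total = 0.0
    for card, prior in (("Low", PRIOR_LOW), ("High", 1.0 - PRIOR_LOW)):
        reveals = reveal_when_low if card == "Low" else reveal_when_high
        if reveals:
            # Bob sees the card, so he is certain.
            p_low = 1.0 if card == "Low" else 0.0
        else:
            # Bob conditions on "no reveal" given Alice's known strategy.
            mass_low = PRIOR_LOW * (0.0 if reveal_when_low else 1.0)
            mass_high = (1.0 - PRIOR_LOW) * (0.0 if reveal_when_high else 1.0)
            p_low = mass_low / (mass_low + mass_high)
        total += prior * alice_loss(p_low)
    return total


never_reveal = expected_loss(False, False)
reveal_only_low = expected_loss(True, False)
print(never_reveal, reveal_only_low)
```

Under these assumptions, revealing only Low cards costs Alice more in expectation than never revealing, because hiding a card then tells Bob it must be High.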

This example will get fewer objections from people, because it is grounded in a very realistic game. Playing poker well requires this kind of reasoning. The powerful predictor is replaced with another player. We can still technically ask "how does the other player deal with undefined counterfactuals?", but we can skip over that by just reasoning about strategies in the usual game-theoretic way -- if Alice's strategy were to reveal low cards, then Bob could always call high cards.

We can then insert logical uncertainty by stipulating that Alice gets her card pseudorandomly, but neither Alice nor Bob can predict the random number generator.

Not sure yet whether you can pull a similar trick with Counterfactual Prisoner's Dilemma. 


== nitpicks ==

Applying the Counterfactual Prisoner's Dilemma to Logical Uncertainty

Why isn't the title "applying logical uncertainty to the counterfactual prisoner's dilemma"? Or "A Logically Uncertain Version of Counterfactual Prisoner's Dilemma"? I don't see how you're applying CPD to LU.

The Counterfactual Prisoner's Dilemma is a symmetric version of the original 

Symmetric? The original is already symmetric. But "symmetric" is a concept which applies to multi-player games. Counterfactual PD makes PD into a one-player game. Presumably you meant "a one-player version"?

where regardless of whether the coin comes up heads or tails you are asked to pay $100 and you are then paid $10,000 if Omega predicts that you would have paid if the coin had come up the other way. If you decide updatelessly you will always receive $9900, while if you decide updatefully, then you will receive $0.

This is only true if you use classical CDT, yeah? Whereas EDT can get $9900 in both cases, provided it believes in a sufficient correlation between what it does upon seeing heads vs tails.
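To keep the numbers straight, here is a small sketch that tabulates the quoted payoffs by policy. It assumes Omega predicts perfectly, and the function and variable names are mine. It only computes what each policy receives; it says nothing about which decision theory ends up selecting which policy.

```python
PAYMENT = 100     # what you are asked to pay in either branch
REWARD = 10_000   # paid if Omega predicts you would pay in the other branch


def payoff(policy: dict, coin: str) -> int:
    """Net payoff in the branch `coin` for a policy mapping branch -> pay?"""
    other = "tails" if coin == "heads" else "heads"
    total = 0
    if policy[coin]:
        total -= PAYMENT   # you pay in the branch you are actually in
    if policy[other]:
        total += REWARD    # perfect predictor (assumed) rewards the other branch's choice
    return total


policies = {
    "always pay": {"heads": True, "tails": True},
    "never pay": {"heads": False, "tails": False},
    "pay only on heads": {"heads": True, "tails": False},
}

table = {
    name: {coin: payoff(policy, coin) for coin in ("heads", "tails")}
    for name, policy in policies.items()
}
for name, row in table.items():
    print(name, row)
```

An agent whose heads-choice and tails-choice are linked (whether by precommitment or by believed correlation) is effectively choosing between the first two rows.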

So unlike Counterfactual Mugging, pre-committing to pay ensures a better outcome regardless of how the coin flip turns out, suggesting that focusing only on your particular probability branch is mistaken.

I don't get what you meant by the last part of this sentence. Counterfactual Mugging already suggests that focusing only on your particular branch is mistaken. If someone bought that you should pay up in this problem but not in counterfactual mugging, I expect that person to say something like "because in this case that strategy is guaranteed better even in this branch" -- hence, they're not necessarily convinced to look at other branches. So I don't think this example necessarily argues for looking at other branches.

Also, why is this posted as a question?

Comparing Utilities

I agree that this can create perverse incentives in practice, but that seems like the sort of thing that you should be handling as part of your decision theory, not your utility function.

I'm mainly worried about the perverse incentives part.

I recognize that there's some weird level-crossing going on here, where I'm doing something like mixing up the decision theory and the utility function. But it seems to me like that's just a reflection of the weird muddy place our values come from?

You can think of humans as a little like self-modifying AIs, but where the modification took place over evolutionary history. The utility function which we eventually arrived at was (sort of) the result of a bargaining process between everyone, and which took some account of things like exploitability concerns.

In terms of decision theory, I often think in terms of a generalized NicerBot: extend everyone else the same cofrence-coefficient they extend to you, plus an epsilon (to ensure that two generalized NicerBots end up fully cooperating with each other). This is a pretty decent strategy for any game, generalizing from one of the best strategies for Prisoner's Dilemma. (Of course there is no "best strategy" in an objective sense.)

But a decision theory like that does mix levels between the decision theory and the utility function!

I feel like the solution of having cofrences not count the other person's cofrences just doesn't respect people's preferences—when I care about the preferences of somebody else, that includes caring about the preferences of the people they care about.

I totally agree with this point; I just don't know how to balance it against the other point.

A crux for me is the coalition metaphor for utilitarianism. I think of utilitarianism as sort of a natural endpoint of forming beneficial coalitions, where you've built a coalition of all life.

If we imagine forming a coalition incrementally, and imagine that the coalition simply averages utility functions with its new members, then there's an incentive to join the coalition as late as you can, so that your preferences get the largest possible representation. (I know this isn't the same problem we're talking about, but I see it as analogous, and so a point in favor of worrying about this sort of thing.)

We can correct that by doing 1/n averaging: every time the coalition gains members, we make a fresh average of all member utility functions (using some utility-function normalization, of course), and everybody voluntarily self-modifies to have the new mixed utility function.
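A quick sketch of the two schemes, tracking how much weight each member's original utility function ends up with. I am assuming "simply averages" means a 50/50 mix between the existing coalition and each newcomer, and I am ignoring normalization; the function names are made up for the example.

```python
from fractions import Fraction


def incremental_weights(n: int) -> list:
    """Final weight on each member's original values, in joining order,
    if every newcomer is averaged 50/50 with the existing coalition."""
    weights = [Fraction(1)]
    for _ in range(n - 1):
        # The existing mix is halved and the newcomer takes the other half.
        weights = [w / 2 for w in weights] + [Fraction(1, 2)]
    return weights


def fresh_average_weights(n: int) -> list:
    """Final weights under 1/n averaging: everyone is re-averaged equally."""
    return [Fraction(1, n)] * n


print(incremental_weights(4))
print(fresh_average_weights(4))
```

With four members, incremental averaging leaves the last joiner with half of the total weight and the first two with an eighth each, which is the incentive to join late. The 1/n scheme gives everyone a quarter regardless of order.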

But the problem with this is, we end up punishing agents for self-modifying to care about us before joining. (This is more closely analogous to the problem we're discussing.) If they've already self-modified to care about us more before joining, then their original values just get washed out even more when we re-average everyone.
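To make the washing-out concrete, here is a toy calculation. The setup is my own simplification: each utility function is written as weights over whose original values it serves, the coalition has two members A and B, and the newcomer C's utility function is taken as whatever it is at the moment of joining.

```python
from fractions import Fraction as F


def fresh_average(functions: list) -> dict:
    """Equal-weight average of utility functions, each given as a dict
    mapping a value source (whose original values) to its weight."""
    n = len(functions)
    sources = {s for f in functions for s in f}
    return {s: sum(f.get(s, F(0)) for f in functions) / n for s in sources}


a = {"A": F(1)}
b = {"B": F(1)}

# Hypothetical newcomers: one purely selfish, one who already
# self-modified to give half its weight to A and B before joining.
selfish_c = {"C": F(1)}
altruist_c = {"C": F(1, 2), "A": F(1, 4), "B": F(1, 4)}

with_selfish = fresh_average([a, b, selfish_c])
with_altruist = fresh_average([a, b, altruist_c])
print(with_selfish["C"], with_altruist["C"])
```

In this toy case the newcomer who cared about us in advance ends up with half as much weight on its original values as the one who did not.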

So really, the implicit assumption I'm making is that there's an agent "before" altruism, who "chose" to add in everyone's utility functions. I'm trying to set up the rules to be fair to that agent, in an effort to reward agents for making "the altruistic leap".

Comparing Utilities

The problem in your example is that you failed to identify a reasonable disagreement point.

Ahh, yeahh, that's a good point.

Comparing Utilities

Yeah, I like your "consensus spoiler". Maybe needs a better name, though... "Contrarian Monster"?

having a cofrence of -1 for everyone.

This way of defining the Consensus Spoiler seems needlessly assumption-heavy, since it assumes not only that we can already compare utilities in order to define this perfect antagonism, but furthermore that we've decided how to deal with cofrences.

A similar option with a little less baggage is to define it as having the opposite of the preferences of our social choice function. They just hate whatever we end up choosing to represent the group's preferences.

A simpler option is just to define the Contrarian Monster as having opposite preferences from one particular member of the collective. (Any member will do.) This ensures that there can be no Pareto improvements.
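Here is a small check of that claim on a toy example. The options and utility numbers are invented, and I am assuming the chosen member is never exactly indifferent between two options (if they were, the Monster would be indifferent too, and the rest of the group could still find a Pareto improvement between those two).

```python
from itertools import permutations


def pareto_improvements(utilities: list) -> list:
    """All (x, y) such that moving from x to y makes nobody worse off
    and at least one agent strictly better off."""
    options = list(utilities[0])
    found = []
    for x, y in permutations(options, 2):
        nobody_worse = all(u[y] >= u[x] for u in utilities)
        someone_better = any(u[y] > u[x] for u in utilities)
        if nobody_worse and someone_better:
            found.append((x, y))
    return found


# Made-up collective whose members agree on the ranking a < b < c.
collective = [
    {"a": 0, "b": 1, "c": 2},
    {"a": 0, "b": 2, "c": 3},
]

# Contrarian Monster: exact opposite of the first member's preferences.
monster = {option: -u for option, u in collective[0].items()}

before = pareto_improvements(collective)
after = pareto_improvements(collective + [monster])
print(before, after)
```

Before the Monster joins there are Pareto improvements to be had; afterwards there are none, so every option counts as Pareto optimal.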

If you have a community of 100 agents that would agree to pick some states over others and construct a new community of 101 with the consensus spoiler, then they can't form any choice function.

Actually, the conclusion is that you can form any social choice function. Everything is "Pareto optimal".

The question whether it is warranted, allowed or forbidden that the coalition of 100 just proceeds with the policy choice that screws the spoiler over doesn't seem to be a mathematical kind of claim.

If we think of it as bargaining to form a coalition, then there's never any reason to include the Spoiler in a coalition (especially if you use the "opposite of whatever the coalition wants" version). In fact, there is a version of Harsanyi's theorem which allows for negative weights, to allow for this -- giving an ingroup/outgroup sort of thing. Usually this isn't considered very seriously for definitions of utilitarianism. But it could be necessary in extreme cases.

(Although putting zero weight on it seems sufficient, really.)

And even in the less extreme degree I don't get how you could use this setup to judge values that are in conflict. And if you encounter an unknown agent it seems it is ambiguous whether you should take heed of its values in compromise or just treat it as a possible enemy and just adhere to your personal choices.

Pareto-optimality doesn't really give you the tools to mediate conflicts, it's just an extremely weak condition on how you do so, which says essentially that we shouldn't put negative weight on anyone.

Granted, the Consensus Spoiler is an argument that Pareto-optimality may not be weak enough, in extreme situations.

Troll Bridge

If here instead means "discoverable by the agent's proof search" or something to that effect, then the logic here seems to follow through (making the reasonable assumption that if the agent can discover a proof for A=cross->U=-10, then it will set its expected value for crossing to -10).

Right, this is what you have to do.

However, that would mean we are talking about provability in a system which can only prove finitely many things, which in particular cannot contain PA and so Löb's theorem does not apply.

Hmm. So, a bounded theorem prover using PA can still prove Löb about itself. I think everything is more complicated and you need to make some assumptions (because there's no guarantee a bounded proof search will find the right Löbian proof to apply to itself, in general), but you can make it go through.

I believe the technical details you're looking for will be in Critch's paper on bounded Löb.

Comparing Utilities

Weird, given that they still look fine for me!

I'll try to fix...

Comparing Utilities

Yeah, it seems like in practice humans should be a lot more comparable than theoretical agentic entities like I discuss in the post.

Radical Probabilism

Yeah, I don't think this can be generalized to model a radical probabilist in general, but it does seem like a relevant example of "extra-bayesian" (but not totally non-bayesian) calculations which can be performed to supplement Bayesian updates in practice.

Radical Probabilism

I do not understand how Jeffrey updates lead to path dependence. Is the trick that my probabilities can change without evidence, therefore I can just update B without observing anything that also updates A, and then use that for hocus pocus? Writing that out, I think that's probably it, but as I was reading the essay I wasn't sure which bit was where the key step was happening.

hmmmm. My attempt at an English translation of my example:

A and B are correlated, so moving B to 60% (up from 50%) makes A more probable as well. But then moving A up to 60% is less of a move for A. This means that (A&¬B) ends up smaller than (B&¬A): both get dragged up and then down, but (B&¬A) was dragged up by the larger update and down by the smaller.
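The same thing with numbers, as a sketch. The joint prior below is one I made up so that A and B each start at 50% and are positively correlated; the function name is also mine.

```python
def jeffrey_update(joint: dict, event: str, new_prob: float) -> dict:
    """Jeffrey update on event 'A' or 'B': rescale the cells where the event
    holds so they sum to new_prob, and the rest so they sum to 1 - new_prob."""
    idx = 0 if event == "A" else 1
    old_prob = sum(p for cell, p in joint.items() if cell[idx])
    return {
        cell: p * (new_prob / old_prob if cell[idx]
                   else (1 - new_prob) / (1 - old_prob))
        for cell, p in joint.items()
    }


# Keys are (A, B) truth values. Assumed prior: P(A) = P(B) = 0.5, correlated.
prior = {
    (True, True): 0.4,
    (True, False): 0.1,
    (False, True): 0.1,
    (False, False): 0.4,
}

b_then_a = jeffrey_update(jeffrey_update(prior, "B", 0.6), "A", 0.6)
a_then_b = jeffrey_update(jeffrey_update(prior, "A", 0.6), "B", 0.6)

print("B then A:", b_then_a[(True, False)], b_then_a[(False, True)])
print("A then B:", a_then_b[(True, False)], a_then_b[(False, True)])
```

With this prior, updating B first and then A leaves (A&¬B) at roughly 0.086 and (B&¬A) at roughly 0.109; doing the updates in the other order swaps the two. Same pair of updates, different final distribution.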

Okay, I got tired and skipped most of the virtual evidence section (it got tough for me). You say "Exchange Virtual Evidence" and I would be interested in a concrete example of what that kind of conversation would look like. 

It would be nice to write a whole post on this, but the first thing you need to do is distinguish between likelihoods and probabilities.

The notation may look pointless at first. The main usage has to do with the way we usually regard the first argument as variable and the second as fixed. IE, "a probability function sums to one" can be understood as P(A|B)+P(¬A|B)=1; we more readily think of A as variable here. In a Bayesian update, we vary the hypothesis, not the evidence, so it's more natural to think in terms of a likelihood function, L(H|E).

In a Bayesian network, you propagate probability functions down links, and likelihood functions up links. Hence Pearl distinguished between the two strongly.

Likelihood functions don't sum to 1. Think of them as fragments of belief which aren't meaningful on their own until they're combined with a probability.
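A tiny illustration, with made-up numbers and hypothesis names: the likelihood function over hypotheses adds up to whatever it adds up to, and only becomes a probability distribution once it is multiplied by a prior and normalized.

```python
# L(H|E) = P(E|H) for three hypothetical hypotheses; numbers are invented.
likelihood = {"H1": 0.9, "H2": 0.5, "H3": 0.3}
prior = {"H1": 0.2, "H2": 0.3, "H3": 0.5}


def posterior(prior: dict, likelihood: dict) -> dict:
    """Combine a prior with a likelihood function and normalize."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnormalized.values())
    return {h: v / z for h, v in unnormalized.items()}


post = posterior(prior, likelihood)
print(sum(likelihood.values()))  # not 1: a likelihood function need not sum to 1
print(post, sum(post.values()))
```

Reading the 0.9 for H1 as "H1 is 90% probable" is the likelihood-for-probability confusion; here the posterior for H1 is well under half.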

Base-rate neglect can be thought of as confusion of likelihood for probability. The conjunction fallacy could also be explained in this way.

I wish it were feasible to get people to use "likely" vs "probable" in this way. Sadly, that's unprobable to work.

I'm imagining it's something like "I thought for ages and changed my mind, let me tell you why".

What I'm pointing at is really much more outside-view than that. Standard warnings about outside view apply. ;p 

An example of exchanging probabilities is: I assert X, and another person agrees. I now know that they assign a high probability to X. But that does not tell me very much about how to update.

Exchanging likelihoods instead: I assert X, and the other person tells me they already thought that for unrelated reasons. This tells me that their agreement is further evidence for X, and I should update up.

Or, a different possibility: I assert X, and the other person updates to X, and tells me so. This doesn't provide me with further evidence in favor of X, except insofar as they acted as a proof-checker for my argument.

"Exchange virtual evidence" just means "communicate likelihoods" (or just likelihood ratios!)

Exchanging likelihoods is better than exchanging probabilities, because likelihoods are much easier to update on.
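A sketch of why likelihoods are easy to update on, in odds form. I am assuming the reported likelihood ratios are independent given X and given not-X, which is a strong assumption in practice; the function names and numbers are illustrative.

```python
def to_odds(prob: float) -> float:
    return prob / (1.0 - prob)


def to_prob(odds: float) -> float:
    return odds / (1.0 + odds)


def update_on_likelihood_ratios(prior_prob: float, ratios: list) -> float:
    """Multiply prior odds for X by each reported likelihood ratio
    P(report | X) / P(report | not X), assuming the reports are independent."""
    odds = to_odds(prior_prob)
    for ratio in ratios:
        odds *= ratio
    return to_prob(odds)


# I start at 20% on X; two people report ratios of 3:1 and 2:1 in favor.
updated = update_on_likelihood_ratios(0.2, [3.0, 2.0])
print(updated)
```

If the same two people had only told me their probabilities for X, I would have had to guess their priors and how much of their evidence overlaps with mine before I could use the numbers at all.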

Granted, exchanging models is much better than either of those two ;3 However, it's not always feasible. There's the quick conversational examples like I gave, where someone may just want to express their epistemic state wrt what you just said in a way which doesn't interrupt the flow of conversation significantly. But we could also be in a position where we're trying to integrate many expert opinions in a forecasting-like setting. If we can't build a coherent model to fit all the information together, virtual evidence is probable to be one of the more practical and effective ways to go.
