Abram Demski



Dutch-Booking CDT: Revised Argument

I thought about these things in writing this, but I'll have to think about them again before making a full reply.

We could modify the epsilon exploration assumption so that the agent also chooses between  and  even while its top choice is . That is, there's a lower bound on the probability with which the agent takes an action in , but even if that bound is achieved, the agent still has some flexibility in distributing probability between  and .

Another, similar scenario: we assume the probability of an action is small if it's sub-optimal, and smaller still the worse it is.
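As a toy illustration of the modified assumption (the function name, numbers, and the softmax choice are all mine, not part of the formal setup): the top action gets probability 1 - ε, and the remaining ε of exploration mass is spread over sub-optimal actions so that worse actions receive less of it.

```python
import math

def exploration_distribution(utilities, epsilon=0.05, temperature=1.0):
    # Sketch only: the optimal action gets probability 1 - epsilon, and
    # the epsilon of exploration mass is distributed over sub-optimal
    # actions by a softmax, so worse actions get smaller probability.
    best = max(range(len(utilities)), key=lambda i: utilities[i])
    weights = [math.exp(u / temperature) if i != best else 0.0
               for i, u in enumerate(utilities)]
    total = sum(weights)
    probs = [epsilon * w / total for w in weights]
    probs[best] = 1.0 - epsilon
    return probs
```

This keeps the lower bound on exploration probability while leaving flexibility (via the temperature) in how that probability is distributed among the sub-optimal actions.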

Dutch-Booking CDT: Revised Argument

I agree with this, but I was assuming the CDT agent doesn't think buying B will influence the later decision. This, again, seems plausible if the payoff is made sufficiently small. I believe that there are some other points in my proof which make similar assumptions, which would ideally be made clearer in a more formal write-up.

However, I think CDT advocates will not generally take this to be a sticking point. The structure of my argument is to take a pre-existing scenario, and then add bets. For my argument to work, the bets need to be "independent" of critical things (causally and/or evidentially independent) -- in the example you point out, the action taken later needs to be causally independent of the bet made earlier (more specifically, causal-conditioning on the bet should not change beliefs about what action will be taken).

This is actually very similar to traditional Dutch-book arguments, which treat the bets as totally independent of everything. I could argue that it's just part of the thought experiment; if you concede that there could be a scenario like that, then you concede that CDT gets dutch-booked.

If you don't buy that, but you do buy Dutch Books as a methodology more generally, then I think you have to claim there's some rule which forbids "situations like this" (so CDT has to think the bets are not independent of everything else, in such a way as to spoil my argument). I would be very interested if you could propose a sensible view like this. However, I think not: there doesn't seem to be anything about the scenario which violates some principle of causality or rationality. If you forbid scenarios like this, you seem to be forbidding a very reasonable scenario, for no good reason (other than to save CDT).

My Current Take on Counterfactuals

Now I feel like I should have phrased it more modestly, since it's really "settled modulo math working out", even though I feel fairly confident some version of the math should work out.

My Current Take on Counterfactuals

Is there a way to operationalize "respecting logic"? For example, a specific toy scenario where an infra-Bayesian agent would fail due to not respecting logic?

"Respect logic" means either (a) assigning probability one to tautologies (at least, to those which can be proved in some bounded proof-length, or something along those lines), or, (b) assigning probability zero to contradictions (again, modulo boundedness). These two properties should be basically equivalent (ie, imply each other) provided the proof system is consistent. If it's inconsistent, they imply different failure modes.

My contention isn't that infra-bayes could fail due to not respecting logic. Rather, it's simply not obvious whether/how it's possible to make an interesting troll bridge problem for something which doesn't respect logic. EG, the example I mentioned of a typical RL agent -- the obvious way to "translate" Troll Bridge to typical RL is for the troll to blow up the bridge if and only if the agent takes an exploration step. But, this isn't sufficiently like the original Troll Bridge problem to be very interesting.
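Here's a minimal sketch of that "translated" version (the reward numbers and structure are my own illustrative choices), which also shows why it's uninteresting: since the troll only punishes exploration-crosses, a deliberate policy of crossing still comes out ahead.

```python
import random

def average_reward(policy_crosses, epsilon=0.1, steps=50000, seed=0):
    # Toy RL "translation" of Troll Bridge (rewards are my own picks):
    # crossing normally pays +1, but the troll blows up the bridge (-10)
    # if and only if the crossing happened on an exploration step.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        exploring = rng.random() < epsilon
        # Epsilon-greedy: exploration picks a uniformly random action.
        cross = (rng.random() < 0.5) if exploring else policy_crosses
        if cross:
            total += -10.0 if exploring else 1.0
    return total / steps
```

With these numbers the crossing policy averages roughly 0.4 per step versus roughly -0.5 for staying, so a typical RL agent simply learns to cross; none of the original problem's self-referential trap appears.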

By no means do I mean to indicate that there's an argument that agents have to "respect logic" buried somewhere in this write-up (or the original troll-bridge writeup, or my more recent explanation of troll bridge, or any other posts which I linked).

If I want to argue such a thing, I'd have to do so separately.

And, in fact, I don't think I want to argue that an agent is defective if it doesn't "respect logic". I don't think I can pull out a decision problem it'll do poorly on, or such.

I a little bit want to argue that a decision theory is less revealing if it doesn't represent an agent as respecting logic, because I tend to think logical reasoning is an important part of an agent's rationality. EG, a highly capable general-purpose RL agent should be interpretable as using logical reasoning internally, even if we can't see that in the RL algorithm which gave rise to it. (In which case you might want to ask how the RL agent avoids the troll-bridge problem, even though the RL algorithm itself doesn't seem to give rise to any interesting problem there.)

As such, I find it quite plausible that InfraBayes and other RL algorithms end up handling stuff like Troll Bridge just fine without giving us insight into the correct reasoning, because they eventually kick out any models/hypotheses which fail Troll Bridge.

Whether it's necessary to "gain insight" into how to solve Troll Bridge (as an agent which respects some logic internally), rather than merely solve it (by providing learning algorithms which have good guarantees), is a separate question. I won't claim this has a high probability of being a necessary kind of insight (for alignment). I will claim it seems like a pretty important question to answer for someone interested in counterfactual reasoning.

True, but IMO the way to incorporate "radical probabilism" is via what I called Turing RL.

I don't think Turing RL addresses radical probabilism at all, although it plausibly addresses a major motivating force for being interested in radical probabilism, namely logical uncertainty.

From a radical-probabilist perspective, the complaint would be that Turing RL still uses the InfraBayesian update rule, which might not always be required for rationality (in the same way that Bayesian updates aren't always required).

Naively, it seems very possible to combine infraBayes with radical probabilism: 

  • Starting from radical probabilism, which is basically "a dynamic market for beliefs", infra seems close to the insight that prices can have a "spread". (In the same way that interval probability is close to InfraBayes, but not all the way).
  • Starting from Infra, the question is how to add in the market aspect.

However, I'm not sure what formalism could unify these.

Reflective Bayesianism

This post seemed to be praising the virtue of returning to the lower-assumption state. So I argued that in the example given, it took more than knocking out assumptions to get the benefit.

Agreed. Simple Bayes is the hero of the story in this post, but that's more because the simple bayesian can recognize that there's something beyond.

Phylactery Decision Theory

I'm sometimes using talk about control to describe what the agent is doing from the outside, but the hypotheses it believes all have a form like "The variables such and such will be as if they were set by BDT given such and such inputs".

Right, but then, are all other variables unchanged? Or are they influenced somehow? The obvious proposal is EDT -- assume influence goes with correlation. Another possible answer is "try all hypotheses about how things are influenced."
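To make the "influence goes with correlation" option concrete, here is a minimal EDT-style evaluation (the joint distribution, action names, and utilities below are invented for illustration): each action is scored by the conditional expectation of utility given that the action is taken.

```python
def edt_choice(joint, utility):
    # EDT sketch ("influence goes with correlation"): score each action
    # by the conditional expectation of utility given that action.
    # `joint` maps (action, outcome) pairs to probabilities.
    actions = {a for a, _ in joint}

    def value(a):
        p_a = sum(p for (act, _), p in joint.items() if act == a)
        eu = sum(p * utility[o] for (act, o), p in joint.items() if act == a)
        return eu / p_a

    return max(actions, key=value)
```

For example, with `joint = {("A", "good"): 0.4, ("A", "bad"): 0.1, ("B", "good"): 0.1, ("B", "bad"): 0.4}` and `utility = {"good": 1.0, "bad": 0.0}`, action "A" is chosen because utility is correlated with it, regardless of whether that correlation reflects causal influence.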

Phylactery Decision Theory

One problem with this is that it doesn't actually rank hypotheses by which is best (in expected utility terms), just how much control is implied. So it won't actually converge to the best self-fulfilling prophecy (which might involve less control).

Another problem with this is that it isn't clear how to form the hypothesis "I have control over X".

Reflective Bayesianism

I wanted to separate what work is done by radicalizing probabilism in general, vs logical induction specifically. 

From my perspective, Radical Probabilism is a gateway drug. Explaining logical induction intuitively is hard. Radical Probabilism is easier to explain and motivate. It gives reason to believe that there's something interesting in the direction. But, as I've stated before, I have trouble comprehending how Jeffrey correctly predicted that there's something interesting here, without logical uncertainty as a motivation. In hindsight, I feel his arguments make a great deal of sense; but without the reward of logical induction waiting at the end of the path, to me this seems like a weird path to decide to go down.

That said, we can try and figure out Jeffrey's perspective, or, possible perspectives Jeffrey could have had. One point is that he probably thought virtual evidence was extremely useful, and needed to get people to open up to the idea of non-bayesian updates for that reason. I think it's very possible that he understood his Radical Probabilism purely as a generalization of regular Bayesianism; he may not have recognized the arguments for convergence and other properties. Or, seeing those arguments, he may have replied "those arguments have a similar force for a dogmatic probabilist, too; they're just harder to satisfy in that case."

That said, I'm not sure logical inductors properly have beliefs about their own (in the de dicto sense) future beliefs. It doesn't know "its" source code (though it knows that such code is a possible program) or even that it is being run with the full intuitive meaning of that, so it has no way of doing that.

I totally agree that there's a philosophical problem here. I've put some thought into it. However, I don't see that it's a real obstacle to ... provisionally ... moving forward. Generally I think of the logical inductor as the well-defined mathematical entity and the self-referential beliefs are the logical statements which refer back to that mathematical entity (with all the pros and cons which come from logic -- ie, yes, I'm aware that even if we think of the logical inductor as the mathematical entity, rather than the physical implementation, there are formal-semantics questions of whether it's "really referring to itself"; but it seems quite fine to provisionally set those questions aside).

So, while I agree, I really don't think it's cruxy. 

Reflective Bayesianism

So, let's suppose for a moment that ZFC set theory is the one true foundation of mathematics, and it has a "standard model" that we can meaningfully point at, and the question is whether our universe is somewhere in the standard model (or, rather, "perfectly described" by some element of the standard model, whatever that means).

In this case it's easy to imagine that the universe is actually some structure not in the standard model (such as the standard model itself, or the truth predicate for ZFC; something along those lines).

Now, granted, the whole point of moving from some particular system like that to the more general hypothesis "the universe is mathematical" is to capture such cases. However, the notion of "mathematics in general" or "described by some formal system" or whatever is sufficiently murky that there could still be an analogous problem -- EG, suppose there's a formal system which describes the entire activity of human mathematics. Then "the real universe" could be some object outside the domain of that formal system, EG, the truth predicate for that formal system, the intended 'standard model' of that system, etc.

I'm not confident that we should think that way, but it's a salient possibility.

Reflective Bayesianism

What is actually left of Bayesianism after Radical Probabilism? Your original post on it was partially explaining logical induction, and introduced assumptions from that in much the same way as you describe here. But without that, there doesn't seem to be a whole lot there. The idea is that all that matters is resistance to dutch books, and for a dutch book to be fair the bookie must not have an epistemic advantage over the agent. Said that way, it depends on some notion of "what the agent could have known at the time", and giving a coherent account of this would require solving epistemology in general. So we avoid this problem by instead taking "what the agent actually knew (believed) at the time", which is a subset and so also fair. But this doesn't do any work; it just offloads it to agent design. 

Part of the problem is that I avoided getting too technical in Radical Probabilism, so I bounced back and forth between different possible versions of Radical Probabilism without too much signposting.

I can distinguish at least five versions:

  1. Jeffrey's version. I don't have a good source for his full picture. I get the sense that the answer to "what is left?" is "very little!" -- EG, he didn't think agents have to be able to articulate probabilities. But I am not sure of the details.
  2. The simplification of Jeffrey's version, where I keep the Kolmogorov axioms (or the Jeffrey-Bolker axioms) but reject Bayesian updates.
  3. Skyrms' deliberation dynamics. This is a pretty cool framework and I recommend checking it out (perhaps via his book The Dynamics of Rational Deliberation). The basic idea of its non-bayesian updates is, it's fine so long as you're "improving" (moving towards something good).
  4. The version represented by logical induction.
  5. The Shafer & Vovk version. I'm not really familiar with this version, but I hear it's pretty good.

(I can think of more, but I cut myself off.)
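Since the "fine so long as you're improving" idea in the Skyrms version is easy to miss, here is a minimal sketch of deliberation-dynamics-style updating (my simplification, not Skyrms' actual formalism): probability shifts toward options whose utility beats the current average, so expected utility weakly increases at each step, even though no Bayesian conditioning happens.

```python
def deliberation_step(probs, utilities, rate=0.1):
    # One step of (simplified, replicator-style) deliberation dynamics:
    # options whose utility beats the current average gain probability.
    # Non-Bayesian, but each step weakly improves expected utility,
    # i.e. the update always "moves toward something good".
    avg = sum(p * u for p, u in zip(probs, utilities))
    new = [p * (1 + rate * (u - avg)) for p, u in zip(probs, utilities)]
    total = sum(new)
    return [p / total for p in new]
```

Iterating this from any full-support starting distribution, the probability mass concentrates on the best option, and the expected utility is monotonically non-decreasing along the way.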

Said that way, it depends on some notion of "what the agent could have known at the time", and giving a coherent account of this would require solving epistemology in general. 

Making a broad generalization, I'm going to stick things into camp #2 above or camp #4. Theories in camp #2 have the feature that they simply assume a solid notion of "what the agent could have known at the time". This allows for a nice simple picture in which we can check Dutch Book arguments. However, it does lend itself more easily to logical omniscience, since it doesn't allow a nuanced picture of how much logical information the agent can generate. Camp #4 means we do give such a nuanced picture, such as the poly-time assumption.

Either way, we've made assumptions which tell us which Dutch Books are valid. We can then check what follows.

For example with logical induction, we know that it can't be dutch booked by any polynomial-time trader. Why do we think that criterion is important? Because we think it's realistic for an agent to, in the limit, know anything you can figure out in polynomial time. And we think that because we have an algorithm that does it. Ok, but what intellectual progress does the dutch book argument make here? We had to first find out what one can realistically know, and got logical induction, from which we could make the poly-time criterion. So now we know it's fair to judge agents by that criterion, so we should find one, which fortunately we already have. But we could also just not have thought about dutch books at all, and just tried to figure out what one could realistically know, and what would we have lost? Making the dutch book here seems like a spandrel in thinking style.

I think this understates the importance of the Dutch-book idea to the actual construction of the logical induction algorithm. The criterion came first, and the construction was finished soon after. So the hard part was the criterion (which is conceived in dutch-book terms). And then the construction follows nicely from the idea of avoiding these dutch-books.

Plus, logical induction without the criterion would be much less interesting. The criterion implies all sorts of nice properties. Without the criterion, we could point to all the nice properties the logical induction algorithm has, but it would just be a disorganized mess of properties. Someone would be right to ask if there's an underlying reason for all these nice properties -- an organizing principle, rather than just a list of seemingly nice properties. The answer to that question would be "dutch books".

BTW, I believe philosophers currently look down on dutch books for being too pragmatic/adversarial a justification, and favor newer approaches which justify epistemics from a plain desire to be correct rather than a desire to not be exploitable. So by no means should we assume that Dutch Books are the only way. However, I personally feel that logical induction is strong evidence that Dutch Books are an important organizing principle.

As a side note, I reread Radical Probabilism for this, and everything in the "Other Rationality Properties" section seems pretty shaky to me. The proofs of both convergence and calibration as written depend on logical induction -- or else, on the assumption that the agent would know if it's not convergent/calibrated, in which case could orthodoxy not achieve the same? You acknowledge this for convergence in a comment but also hint at another proof. But if radical probabilism is a generalization of orthodox bayesianism, then how can it have guarantees that the latter doesn't?

You're right to call out the contradiction between calling radical probabilism a generalization, vs claiming that it implies new restrictions. I should have been more consistent about that. Radical Probabilism is merely "mostly a generalization". 

I still haven't learned about how #2-style settings deal with calibration and convergence, so I can't really comment on the other proofs I implied the existence of. But, yeah, it means there are extra rationality conditions beyond just the Kolmogorov axioms.

For the conservation of expected evidence, note that the proof here involves a bet on what the agent's future beliefs will be. This is a fragile construction: you need to make sure the agent can't troll the bookie, without assuming the accessibility of the structures you want to establish. It also assumes the agent has models of itself in its hypothesis space. And even in the weaker forms, the result seems unrealistic. There is the problem with psychedelics that the "virtuous epistemic process" is supposed to address, but this is something that the formalism allows for with a free parameter, not something it solves. The radical probabilist trusts the sequence of , but it doesn't say anything about where they come from. You can now assert that it can't be identified with particular physical processes, but that just leaves a big question mark for bridging laws. If you want to check if there are dutch books against your virtuous epistemic process, you have to be able to identify its future members. Now I can't exclude that some process could avoid all dutch books against it without knowing where they are (and without being some trivial stupidity), but it seems like a pretty heavy demand.

This part seems entirely addressed by logical induction, to me.

  1. A "virtuous epistemic process" is a logical inductor. We know logical inductors come to trust their future opinions (without knowing specifically what they will be). 
  2. The logical induction algorithm tells us where the future beliefs come from.
  3. The logical induction algorithm shows how to have models of yourself.
  4. The logical induction algorithm shows how to avoid all dutch books "without knowing where they are" (actually I don't know what you meant by this).