Abram Demski


Partial Agency

Alternate Alignment Ideas


Embedded Agency


How "honest" is GPT-3?

Thanks for running the test, much appreciated! (Also, hilarious.)

Radical Probabilism [Transcript]

Understandable questions. I hope to expand this talk into a post which will explain things more properly.

Think of the two requirements for Bayes updates as forming a 2x2 matrix. If you have both (1) all information you learned can be summarised into one proposition which you learn with 100% confidence, and (2) you know ahead of time how you would respond to that information, then you must perform a Bayesian update. If you have (2) but not (1), ie you update some X to less than 100% confidence but you knew ahead of time how you would update to changed beliefs about X, then you are required to do a Jeffrey update. But if you don't have (2), updates are not very constrained by Dutch-book type rationality. So in general, Jeffrey argued that there are many valid updates beyond Bayes and Jeffrey updates.

Jeffrey updates are a simple generalization of Bayes updates. When a Bayesian learns X, they update it to 100%, and take P(Y|X) to be the new P(Y) for all Y. (More formally, we want to update P to get a new probability measure Q. We do so by setting Q(Y)=P(Y|X) for all Y.) Jeffrey wanted to handle the case where you somehow become 90% confident of X, instead of fully confident. He thought this was more true to human experience. A Jeffrey update is just the weighted average of the two possible Bayesian updates. (More formally, we want to update P to get Q where Q(X)=c for some chosen c. We set Q(Y) = cP(Y|X) + (1-c)P(Y|~X).)

A natural response for a classical Bayesian is: where does 90% come from? (Where does c come from?) But the Radical Probabilism retort is: where do observations come from? The Bayesian already works in a framework where information comes in from "outside" somehow. The radical probabilist is just working in a more general framework where more general types of evidence can come in from outside.

Pearl argued against this practice in his book introducing Bayesian networks. But he introduced an equivalent -- but more practical -- concept which he calls virtual evidence. The Bayesian intuition freaks out at somehow updating X to 90% without any explanation. But the virtual evidence version is much more intuitive. (Look it up; I think you'll like it better.) I don't think virtual evidence goes against the spirit of Radical Probabilism at all, and in fact if you look at Jeffrey's writing he appears to embrace it. So I hope to give that version in my forthcoming post, and explain why it's nicer than Jeffrey updates in practice.

Dutch-Booking CDT: Revised Argument

Oh right, OK. That's because of the general assumption that rational agents bet according to their beliefs. If a CDT agent doesn't think of a bet as intervening on a situation, then when betting ahead of time, it'll just bet according to its probabilities. But during the decision, it is using the modified (interventional) probabilities. That's how CDT makes decisions. So any bets which have to be made simultaneously, as part of the decision, will be evaluated according to those modified beliefs.

Relating HCH and Logical Induction
HCH is about deliberation, and logical inductors are about trial and error.

I think that's true of the way I describe the relationship in the OP, but not quite true in reality. I think there's also an aspect of deliberation that's present in logical induction and not in HCH. If we think of HCH as a snapshot of a logical inductor, the logical inductor is "improving over time as a result of thinking longer". This is partly due to trial-and-error, but there's also a deliberative aspect to it.

I mean, partly what I'm saying is that it's hard to draw a sharp line between deliberation and trial-and-error. If you try to draw that line such that logical induction lands to one side, you're putting Bayes' Law on multiple hypotheses on the "trial-and-error" side. But it's obvious that one would want it to be on both sides. It's definitely sort of about trial-and-error, but we also definitely want to apply Bayes' Law in deliberation. Similarly, it might turn out that we want to apply the more general logical-induction updates within deliberation.

But part of what I'm saying is that LIC (logical induction criterion) is a theory of rational deliberation in the sense of revising beliefs over time. The LIA (logical induction algorithm) captures the trial-and-error aspect, running lots of programs without knowing which ones are actually needed to satisfy LIC. But the LIC is a normative theory of deliberation, saying that what it means for belief revisions over time to be rational is that they not be too exploitable.

The cost is that it doesn't optimize what you want (unless what you want is the logical induction criterion) and that it will generally get taken over by consequentialists who can exercise malicious influence a constant number of times before the asymptotics assert themselves.

Yeah, if you take the LIA as a design proposal, it's pretty unhelpful. But if you take the LIC as a model of rational deliberation, you get potentially useful ideas.

The benefit of deliberation is that its preferences are potentially specified indirectly by the original deliberator (rather than externally by the criterion for trial and error), and that if the original deliberator is strong enough they may suppress internal selection pressures.

For example, the LIC is a context in which we can formally establish a version of "if the deliberator is strong enough they can suppress internal selection pressures".

Dutch-Booking CDT: Revised Argument

I guess here you wanted to say something interesting about free will, but it was probably lost from the draft to the final version of the post.

Ah whoops. Fixed.

I think developing this two points would be useful to readers since, usually, the pivotal concepts behind EDT and CDT are considered to be “conditional probabilities” and “(physical) causation” respectively, while here you seem to point at something different about the times at which decisions are made.

I'm not sure what you mean here. The "two different times" are (1) just before CDT makes the decision, and (2) right when CDT makes the decision. So the two times aren't about differentiating CDT and EDT.

Dutch-Booking CDT: Revised Argument

Ah, yeah, I'll think about how to clear this up. The short answer is that, yes, I slipped up and used CDT in the usual way rather than the broader definition I had set up for the purpose of this post.

On the other hand, I also want to emphasize that EDT two-boxes (and defects in twin PD) much more easily than I see commonly supposed. And, thus, to the extent one wants to apply the arguments of this post to TDT, TDT would also. Specifically, an EDT agent can only see something as correlated with its action if that thing has more information about the action than the EDT agent itself. Otherwise, the EDT agents own knowledge about its action screens off any correlation.

This means that in Newcomb with a perfect predictor, EDT one-boxes. But in Newcomb where the predictor is only moderately good, in particular knows as much or less than the agent, EDT two-boxes. So, similarly, TDT must two-box in these situations, or be vulnerable to the Dutch Book argument of this post.

When is CDT Dutch-Bookable?

A much improved Dutch Book argument is now here.

Modeling naturalized decision problems in linear logic

I haven't tried as hard as I could have to understand, so, sorry if this comment is low quality.

But I currently don't see the point of employing linear logic in the way you are doing it.

The appendix suggests that the solution to spurious counterfactuals here is the same as known ideas for resolving that problem. Which seems right to me. So solving spurious counterfactuals isn't the novel aspect here.

But then I'm confused why you focus on 5&10 in the main body of the post, since that's the main point of the 5&10 problem.

Maybe 5&10 is just a super simple example to illustrate things. But then, I don't know what it is you are illustrating. What is linear logic doing for you that you could not do some other way?

I have heard the suggestion that linear logic should possibly be used to aid in the difficulties of logical counterfactuals, before. But (somehow) those suggestions seemed to be doing something more radical. Spurious counterfactuals were supposed to be blocked by something about the restrictive logic. By allowing the chosen action to be used only once (since it gets consumed when used), something nicer is supposed to happen, perhaps avoiding problematic self-referential chains of reasoning.

(As I see it at the moment, linear logic seems to -- if anything -- work against the kind of thing we typically want to achieve. If you can use "my program, when executed, outputs 'one-box'" only once, you can't re-use the result both within Omega's thinking and within the physical choice of box. So linear logic would seem to make it hard to respect logical correlations. Of course this doesn't happen for your proposal here, since you treat the program output as classical.)

But your use (iiuc!) seems less radical. You are kind of just using linear logic as a way to specify a world model. But I don't see what this does for you. What am I missing?

An Orthodox Case Against Utility Functions

Of course, this interpretation requires a fair amount of reading between the lines, since the Jeffrey-Bolker axioms make no explicit mention of any probability distribution, but I don’t see any other reasonable way to interpret them,

Part of the point of the JB axioms is that probability is constructed together with utility in the representation theorem, in contrast to VNM, which constructs utility via the representation theorem, but takes probability as basic.

This makes Savage a better comparison point, since the Savage axioms are more similar to the VNM framework while also trying to construct probability and utility together with one representation theorem.

VNM does not start out with a prior, and allows any probability distribution over outcomes to be compared to any other, and Jeffrey-Bolker only allows comparison of probability distributions obtained by conditioning the prior on an event.

As a representation theorem, this makes VNM weaker and JB stronger: VNM requires stronger assumptions (it requires that the preference structure include information about all these probability-distribution comparisons), where JB only requires preference comparison of events which the agent sees as real possibilities. A similar remark can be made of Savage.

Starting with a prior that gets conditioned on events that correspond to the agent’s actions seems to build in evidential decision theory as an assumption, which makes me suspicious of it.

Right, that's fair. Although: James Joyce, the big CDT advocate, is quite the Jeffrey-Bolker fan! See Why We Still Need the Logic of Decision for his reasons.

I don’t think the motivation for this is quite the same as the motivation for pointless topology, which is designed to mimic classical topology in a way that Jeffrey-Bolker-style decision theory does not mimic VNM-style decision theory. [...] So a similar thing here would be to treat a utility function as a function from some lattice of subsets of (the Borel subsets, for instance) to the lattice of events.

Doesn't pointless topology allow for some distinctions which aren't meaningful in pointful topology, though? (I'm not really very familiar, I'm just going off of something I've heard.)

Isn't the approach you mention pretty close to JB? You're not modeling the VNM/Savage thing of arbitrary gambles; you're just assigning values (and probabilities) to events, like in JB.

Setting aside VNM and Savage and JB, and considering the most common approach in practice -- use the Kolmogorov axioms of probability, and treat utility as a random variable -- it seems like the pointless analogue would be close to what you say.

This can be resolved by defining worlds to be minimal non-zero elements of the completion of the Boolean algebra of events, rather than a minimal non-zero event.

Yeah. The question remains, though: should we think of utility as a function of these minimal elements of the completion? Or not? The computability issue I raise is, to me, suggestive of the negative.

Load More