FixDT

abramdemski

FixDT is not a very new decision theory, but little has been written about it afaict, and it's interesting. So I'm going to write about it.

TJ asked me to write this article to "offset" not engaging with Active Inference more. The name "fixDT" is due to Scott Garrabrant, and stands for "fixed-point decision theory". Ideas here are due to Scott Garrabrant, Sam Eisenstat, me, Daniel Hermann, TJ, Sahil, and Martin Soto, in roughly that priority order; but heavily filtered through my own lens.

This post may provide some useful formalism for thinking about issues raised in The Parable of Predict-O-Matic.

Self-fulfilling prophecies & other spooky map-territory connections.

A common trope is for magic to work only when you believe in it. For example, in Harry Potter, you can only get to the magical train platform 9 3/4 if you believe that you can pass through the wall to get there.

A plausible normative-rationality rule, when faced with such problems: if you want the magic to work, you should believe that it will work (and you should not believe it will work, if you want it not to work).

Can we sketch a formal decision theory which handles such problems?

We can't start by imagining that the agent has a prior probability distribution, like we normally would, since the agent would already be stuck -- either it lucked into a prior which believed the magic could work, or, it didn't.

Instead, the "beliefs" of the agent start out as maps from probability distributions to probability distributions. I'll use " $P$ " as the type for probability distributions (little $p$ for a specific probability distribution). So the type of "beliefs", $B$ , is a function type: $b : P \to P$ (little $b$ for a specific belief). You can think of these as "map-territory connections": $b$ is a (causal?) story about what actually happens, if we believe $p$ . A "normal" prior, where we don't think our beliefs influence the world, would just be a constant function: it always outputs the same $p$ no matter what the input is.

Given a belief $b$ , the agent then somehow settles on a probability distribution $p$ . We can now formalize our rationality criteria:

Epistemic Constraint: The probability distribution $p$ which the agent settles on cannot be self-refuting according to the beliefs. It must be a fixed point of $b$ : a $p$ such that $b (p) = p$ .

Instrumental Constraint: Out of the options allowed by the epistemic constraint, $p$ should be as good as possible; that is, it should maximize expected utility. $p := \operatorname{argmax}_{p \text{ such that } b (p) = p} E_{p} U$
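To make the two constraints concrete, here is a minimal sketch under strong simplifying assumptions: there is a single binary event, so a "probability distribution" is just the number $p (X)$, and fixed points are found by brute-force search over a grid (which can miss fixed points that fall between grid points). The names `fixed_points`, `fixdt_choose`, and `magic_belief`, the particular curve, and the utilities are all made up for illustration.

```python
from typing import Callable, List


def fixed_points(b: Callable[[float], float], n: int = 1000, tol: float = 1e-9) -> List[float]:
    """Epistemic constraint: grid points p = i/n in [0, 1] with |b(p) - p| < tol."""
    grid = [i / n for i in range(n + 1)]
    return [p for p in grid if abs(b(p) - p) < tol]


def fixdt_choose(b: Callable[[float], float], u_true: float, u_false: float) -> float:
    """Instrumental constraint: among the fixed points, maximize expected utility."""
    def expected_utility(p: float) -> float:
        return p * u_true + (1 - p) * u_false
    return max(fixed_points(b), key=expected_utility)


def magic_belief(p: float) -> float:
    # Hypothetical map-territory story: confidence makes the magic more likely
    # to work, doubt makes it less likely. Fixed points are 0, 1/2, and 1.
    return 3 * p**2 - 2 * p**3


candidates = fixed_points(magic_belief)
choice = fixdt_choose(magic_belief, u_true=1.0, u_false=0.0)
print(candidates, choice)  # [0.0, 0.5, 1.0] 1.0
```

An agent that wants the magic to work picks the fixed point at 1; flipping the utilities would make it pick 0 instead.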

We can also require that $b$ be a continuous function, to guarantee the existence of a fixed point^[1], so that the agent is definitely able to satisfy these requirements. This might seem like an arbitrary requirement, from the perspective where $b$ is a story about map-territory connections; why should they be required to be continuous? But remember that $b$ is representing the subjective belief-formation process of the agent, not a true objective story. Continuity can be thought of as a limit to the agent's own self-knowledge.

For example, the self-referential statement X: " $p (X) < \frac{1}{2}$ " suggests an "objectively true" belief which maps $p (X)$ to 1 if it's below 1/2, and maps it to 0 if it's above or equal to 1/2. But this belief has no fixed-point; an agent with this belief cannot satisfy the epistemic constraint on its rationality. If we require $b$ to be continuous, we can only approximate the "objectively true" belief function, by rapidly but not instantly transitioning from 1 to 0 as $p (X)$ rises from slightly less than 1/2 to slightly more.
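As a rough illustration, here is one possible continuous approximation: a linear ramp of half-width `eps` around 1/2. The ramp shape, the value of `eps`, and the bisection helper are my own choices for the sketch; any sufficiently steep continuous transition would make the same point.

```python
def objective_belief(p: float) -> float:
    """The discontinuous "objectively true" map for X: "p(X) < 1/2"."""
    return 1.0 if p < 0.5 else 0.0


def smoothed_belief(p: float, eps: float = 0.01) -> float:
    """Continuous stand-in: 1 below 1/2 - eps, 0 above 1/2 + eps, linear in between."""
    return min(1.0, max(0.0, (0.5 + eps - p) / (2 * eps)))


def bisect_fixed_point(b, lo: float = 0.0, hi: float = 1.0, iters: int = 60) -> float:
    """Find p with b(p) = p, assuming b is continuous and b(lo) >= lo, b(hi) <= hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if b(mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


fp = bisect_fixed_point(smoothed_belief)
print(fp)  # very close to 0.5

# The discontinuous map has no fixed point: it only ever outputs 0 or 1,
# and it sends 0 to 1 and 1 to 0.
no_fixed_point = all(objective_belief(i / 1000) != i / 1000 for i in range(1001))
```

The smoothed belief settles on $p (X) = \frac{1}{2}$: the agent ends up maximally uncertain about the statement rather than being unable to hold any belief at all.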

These "beliefs" are a lot like "trading strategies" from Garrabrant Induction.

We can also replace the continuity requirement with a Kakutani requirement, to get something more like Paul's self-referential probability theory.

"Beliefs" are mathematically nice!

This section isn't even about the decision theory; I suppose it's skippable.

But this notion of "beliefs" is more useful than it may first appear.

First, notice that you can combine beliefs by weighted sum, in much the same way you can combine probability distributions into mixture models: $(w_{1} b_{1} + w_{2} b_{2}) (p) = w_{1} b_{1} (p) + w_{2} b_{2} (p)$ . This means we can represent our overall beliefs as a "mixture of hypotheses", just like with probabilities. The weights $w_{n}$ are analogous to probabilities; but we can also think of them as "wealths" to reflect the Garrabrant Induction idea.

As I mentioned already, we can think of "normal priors" as a special case of beliefs, where the belief is just a constant function, outputting the same probability distribution regardless of input. In this case, weighted sums of beliefs behave exactly like regular weighted sums of probability distributions.

However, while regular probabilistic mixture models only act like "alternative possibilities", belief mixtures can also combine constraints.

Let's focus on two events, $X_{1}$ and $X_{2}$ . The belief $b_{1}$ knows that $p (X_{1}) = \frac{1}{5}$ and knows nothing else. So it reacts to a given $p$ by Jeffrey-updating the probabilities so that $p (X_{1}) = \frac{1}{5}$ but the probability distribution is otherwise changed as little as possible. The belief $b_{2}$ knows that $X_{1} = X_{2}$ and nothing else. It reacts to a given $p$ by updating on this, to rule out worlds where the two events differ; but it is agnostic about what exact probabilities the two events should have.

Any mixture of these two beliefs will result in a belief which enforces both constraints; its only fixed points will have $p (X_{1}) = p (X_{2}) = \frac{1}{5}$ , and $p (X_{1} \land X_{2} \lor \neg X_{1} \land \neg X_{2}) = 1$ . The set of fixed points will not depend on the relative weight of the two hypotheses; relative weight only comes into play when you mix together inconsistent constraints.
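Here is a small sketch of that example, with distributions represented as lists over the four worlds for $(X_{1}, X_{2})$. The representation and helper names are assumptions for illustration, and the code only checks particular candidate points; it does not prove that the fixed point is unique.

```python
from typing import List

Dist = List[float]  # probabilities of the worlds (X1, X2) = TT, TF, FT, FF


def b1(p: Dist, target: float = 0.2) -> Dist:
    """Jeffrey-update so that p(X1) = target, rescaling within X1 and within not-X1."""
    p_x1 = p[0] + p[1]
    if p_x1 in (0.0, 1.0):
        raise ValueError("Jeffrey update is undefined when p(X1) is 0 or 1")
    up, down = target / p_x1, (1 - target) / (1 - p_x1)
    return [p[0] * up, p[1] * up, p[2] * down, p[3] * down]


def b2(p: Dist) -> Dist:
    """Condition on X1 = X2, ruling out the worlds TF and FT."""
    agree = p[0] + p[3]
    if agree == 0.0:
        raise ValueError("cannot condition on a probability-zero event")
    return [p[0] / agree, 0.0, 0.0, p[3] / agree]


def mixture(p: Dist, w1: float, w2: float) -> Dist:
    """Weighted sum of the two beliefs, applied pointwise to their outputs."""
    return [w1 * x + w2 * y for x, y in zip(b1(p), b2(p))]


def is_fixed(p: Dist, w1: float, w2: float, tol: float = 1e-12) -> bool:
    return all(abs(x - y) < tol for x, y in zip(mixture(p, w1, w2), p))


both = [0.2, 0.0, 0.0, 0.8]         # satisfies both constraints
only_first = [0.1, 0.1, 0.4, 0.4]   # p(X1) = 1/5, but X1 and X2 can differ
only_second = [0.5, 0.0, 0.0, 0.5]  # X1 = X2, but p(X1) = 1/2

print(is_fixed(both, 0.3, 0.7), is_fixed(both, 0.9, 0.1))  # True True
print(is_fixed(only_first, 0.5, 0.5), is_fixed(only_second, 0.5, 0.5))  # False False
```

The point satisfying both constraints stays fixed whatever the weights are, while a point satisfying only one constraint gets moved by the other belief.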

So, belief functions allow us to represent abstract beliefs which are agnostic about some details of the probability distribution, as well as concrete beliefs which are fully detailed, and combine all of these things together with simple arithmetic. You could say that they can represent beliefs at multiple granularities. For this reason, Scott calls these things "multigrain models", which is a much better term for general use than the term "beliefs" I'm using in this essay.

Can this be the whole decision theory?

So we've got a nice generalized notion of "belief", and a proposed decision procedure which takes that generalized notion and chooses the best fixed-point, to handle self-fulfilling prophecies (as well as self-refuting beliefs and other spooky map-territory connections).

But we still have to make "normal" decisions; that is, we need to take "external" actions, not just decide on probabilities. The standard picture is that probabilities are an input to the action-deciding process. So it sounds like the new pipeline is: beliefs -> FixDT 'decision' -> probabilities -> ordinary 'decision' -> actions.

This is a bit complex and inelegant. It would be nice if we could "make a decision" just once, instead of twice. So, let's suppose that actions are controlled by self-fulfilling prophecies. For example, if a robot has a motor that can turn on or off, we want to wire it directly to the robot's belief about the motor. Maybe the motor turns on or off with precisely the probability given by the belief. Or perhaps there's a threshold; strong enough beliefs turn the motor on, and otherwise it shuts off. The details don't matter too much, so long as there's a consistent fixed-point where the motor is on, and a consistent fixed-point where the motor is off. (Although we will explore some problems with this soon.)

Great! Now we've unified all decisions into one type. All we need is FixDT; once the probabilities have been chosen, all of the decisions are already made. This picture has other advantages, too. The agent no longer needs to have a special category of "actions" which it can take. "Actions" are just things in the world that are influenced by the agent's probabilities. This results in a picture of agency where there's no ontologically special "output" or "action" type! Actuators are just parts of the world which somehow pay attention to the agent.

We can also use the "belief" datatype to unify the notion of input (observation/evidence) with the notion of "hypothesis" -- although this deserves its own write-up. The short version: imagine that $b_{n}$ is defined in reference to the world; that is, it modifies probabilities not by guessing, but rather, by looking at the world and reporting what it sees. Under some additional assumptions, $b_{n}$ 's influence will behave like a Bayesian update in the limit of $b_{n}$ having infinite weight with which to influence the probability distribution.
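To give a flavor of this, here is a toy construction of my own, which should not be read as the full story: model the observation as a belief that conditions its input on some evidence set, mix it with an ordinary constant prior, and look at the fixed point as the observation's weight `w_obs` grows. The closed form used below can be verified by substituting it into the mixture; all names are hypothetical.

```python
from typing import List, Set


def condition(p: List[float], evidence: Set[int]) -> List[float]:
    """Condition p (a list over worlds) on the set of world indices `evidence`."""
    mass = sum(p[i] for i in evidence)
    return [p[i] / mass if i in evidence else 0.0 for i in range(len(p))]


def mixture_map(p, prior, evidence, w_obs):
    """Mixture of a constant belief (weight 1) and the observation belief (weight w_obs)."""
    a = 1.0 / (1.0 + w_obs)  # normalized weight of the prior
    return [a * q + (1 - a) * c for q, c in zip(prior, condition(p, evidence))]


def mixture_fixed_point(prior, evidence, w_obs):
    """Closed-form fixed point of mixture_map for this particular construction."""
    a = 1.0 / (1.0 + w_obs)
    not_e = 1.0 - sum(prior[i] for i in evidence)
    post = condition(prior, evidence)
    return [post[i] * (1 - a * not_e) if i in evidence else a * prior[i]
            for i in range(len(prior))]


prior = [0.1, 0.2, 0.3, 0.4]
evidence = {0, 1}
posterior = condition(prior, evidence)  # the ordinary Bayesian update
for w_obs in (1.0, 100.0, 1e6):
    fp = mixture_fixed_point(prior, evidence, w_obs)
    gap = max(abs(x - y) for x, y in zip(fp, posterior))
    print(w_obs, gap)  # the gap shrinks as w_obs grows
```

In this toy setup the fixed point agrees with the Bayesian posterior on the relative probabilities of worlds inside the evidence for every weight, and the leftover mass on ruled-out worlds vanishes as the weight goes to infinity.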

So we've dissolved the usual notions of "input" and "output" -- now we've just got a market of beliefs, "observations" are just things which influence the market, and "actions" are just things which are influenced by the market.

This seems like a great picture.

We've reversed the common picture that we first figure out what we believe, and then figure out what to do. The decision lives inside the computation of probabilities.
We can represent something resembling a Lobian handshake in a probabilistic setting: if I believe that your probability of cooperation is tied to mine, I can select a fixed-point with a high probability of cooperation for both of us. And if I'm right in my beliefs, you'll do the same.^[2]
We don't need to consider "actions" at all. Instead, there are just parts of the environment which react to our chosen probabilities; and we choose our probabilities with this in mind. Me choosing to type these words is no different in kind from a general choosing where to station troops; the fingers react to what I expect them to type, and the troops react to where I expect them to go.

Sadly, this nice picture falls apart when we look at learning-theoretic considerations.

Reasons for pessimism.

For the picture to work out, we need to be able to learn what we can control.

Eliminating the traditional decision-theoretic need for a list of possible actions to choose from doesn't do us much good if we still have to hard-code the beliefs which say that the robot's motors listen to the robot's probabilities in a particular way. Instead, we'd like the robot to be able to notice this for itself. This would also give us reassurance that it is controlling other aspects of the environment as appropriate.

To make discussion of this simple, I'm going to imagine that there is a "true" belief, $b^{t r u e}$ , which tells us the "actual" counterfactual relationship between our probabilities and reality. This is metaphysically questionable, but it makes sense in practice. For example, if I hook up my robot's motor to turn on if the robot's probability of the motor turning on is above $\frac{3}{4}$ , then $b^{t r u e}$ should map $p$ for which $p (o n) > \frac{3}{4}$ to some $p^{'}$ such that $p^{'} (o n) = 1$ .

If it helps, you can think of $b^{t r u e}$ as a "calibration" function which maps uncalibrated probabilities to the probability where it would be calibrated. Normally, we think of calibration functions as representing underconfidence and overconfidence -- if when I say "90%" the event actually occurs an average of 80% of the time, then I'm overconfident and should adjust my probabilities downward. The idea here is exactly the same, except that here we're considering a case where the 80% observed frequency we see in the world might be a reaction to the 90% probability -- so if we move down to 80%, the world might move down further, to 70%, or might move up to 100%, etc. (This is why we need to select a fixed point of the calibration function, rather than just naively adjust in the right direction.)

Seeing $b^{t r u e}$ as a calibration function will be more comfortable for a frequentist, who can consider all of this well-defined so long as we can place situations into sequences of random experiments. Causal decision theorists may prefer to think of $b^{t r u e}$ as giving the true causal relationship between our probabilities and the world.^[3]

So, basically, we want beliefs to approximate $b^{t r u e}$ as we learn. More specifically, our beliefs should approximate the set of fixed points for $b^{t r u e}$ .

This implies some kind of iterated setting, where the agent updates its beliefs over time and selects fixed points repeatedly, rather than just once. I will assume that things look similar to Garrabrant Induction, in that respect. But this is not a formal impossibility proof! I am sketching reasons for pessimism, not formally showing that FixDT will never work. So don't worry about the details -- make up your own assumptions if my reasoning doesn't make sense to you. Let me know if you get it to work!

It would be easy if we could try out different probabilities $p$ and see $b^{t r u e} (p)$ for each. It would just be a regression problem. The problem is, we don't get to observe probabilities. We only observe what happens.^[4]

Imagine that our beliefs $b$ are a weighted mixture of $b_{1}, b_{2}, \dots, b_{n}$ , and $b^{t r u e}$ is already one of the $b_{i}$ . (This is usually the easiest case for learning -- the "realizable" case. If this doesn't work, there would seem to be little hope more generally.) How can we reward $b^{t r u e}$ for getting things right?

Our chosen probabilities $p$ will be a fixed-point of $b$ , but will not necessarily be a fixed-point of every $b_{i}$ in our mixture. We can reward beliefs which were pushing in the right direction. If $p (X)$ was 1/2, and $b_{1} (p) (X) = 1 / 3$ , we could say that $b_{1}$ was trying to pull the probability down. If we then observe that $X$ turned out to be false, then $b_{1}$ should get rewarded with a higher weight in our mixture.
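One hypothetical way to implement such a reward is a Bayes-style multiplicative update: scale each belief's weight by the probability its output assigned to what actually happened, then renormalize. I haven't committed to a specific rule here, so treat `update_weights` as an assumption; a Garrabrant-Induction-style market would handle wealth differently.

```python
from typing import List


def update_weights(weights: List[float], predictions: List[float], outcome: bool) -> List[float]:
    """Reweight beliefs by the probability each assigned to the observed outcome.

    predictions[i] is b_i(p)(X), evaluated at the chosen market fixed point p;
    outcome says whether X actually occurred.
    """
    likelihoods = [q if outcome else 1 - q for q in predictions]
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Market probability p(X) = 1/2; b_1 pulls down to 1/3, b_2 pulls up to 2/3.
new_weights = update_weights([0.5, 0.5], [1 / 3, 2 / 3], outcome=False)
print(new_weights)  # roughly [0.667, 0.333]: b_1 is rewarded when X turns out false
```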

Now, here's the problem: we can't, in general, reward beliefs which correctly identify fixed-points of $b^{t r u e}$ , or punish beliefs which incorrectly rule out $p$ which are fixed points of $b^{t r u e}$ .

Suppose that $b^{t r u e}$ has two fixed-points, a good one $p^{g o o d}$ and a bad one $p^{b a d}$ . Our only other hypothesis, $b^{f a l s e}$ , is defined as follows: $b^{f a l s e} (p) := \frac{1}{2} p + \frac{1}{2} p^{b a d}$ ; that is, it drags things halfway from wherever they are to $p^{b a d}$ . This can (with enough weight relative to other hypotheses) completely eliminate $p^{g o o d}$ as a fixed point, leaving only $p^{b a d}$ . $b^{f a l s e}$ will never lose credibility for doing this, since at $p^{b a d}$ it makes the same prediction as $b^{t r u e}$ -- which is to say, neither of them want to make any corrections to the probabilities at that point, so no learning will happen no matter what gets observed.
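A one-dimensional sketch of this failure, under made-up assumptions: a distribution is the single number $p (X)$, the hypothetical `b_true` has fixed points at 0 (standing in for $p^{b a d}$), 1/2, and 1 (standing in for $p^{g o o d}$), and fixed points of the mixture are located by a grid scan plus bisection.

```python
def b_true(p: float) -> float:
    # Hypothetical "true" belief with fixed points 0 (bad), 1/2, and 1 (good).
    return 3 * p**2 - 2 * p**3


P_BAD = 0.0


def b_false(p: float) -> float:
    """Drags any p halfway toward P_BAD."""
    return 0.5 * p + 0.5 * P_BAD


def market(p: float, w_false: float) -> float:
    """Weighted mixture of the two hypotheses."""
    return (1 - w_false) * b_true(p) + w_false * b_false(p)


def fixed_points(f, n: int = 1000, tol: float = 1e-12):
    """Grid points where f(p) = p, plus bisected roots where f(p) - p changes sign."""
    g = lambda p: f(p) - p
    grid = [i / n for i in range(n + 1)]
    roots = []
    for i, a in enumerate(grid):
        if abs(g(a)) < tol:
            roots.append(a)
        elif i + 1 < len(grid) and abs(g(grid[i + 1])) >= tol and g(a) * g(grid[i + 1]) < 0:
            lo, hi = a, grid[i + 1]
            for _ in range(60):
                mid = (lo + hi) / 2
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append((lo + hi) / 2)
    return roots


heavy = fixed_points(lambda p: market(p, w_false=0.5))
light = fixed_points(lambda p: market(p, w_false=0.1))
print(heavy)  # [0.0]: only the bad fixed point survives
print(light)  # three fixed points, the largest pulled below 1
```

At `P_BAD` both hypotheses output `P_BAD`, so whatever is observed there, a likelihood-based update has nothing to distinguish them by.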

In general, if we are at some fixed-point of $b^{t r u e}$ , then $b^{t r u e}$ will not be making any correction to that fixed-point; so it seems difficult to reward or punish $b^{t r u e}$ . FixDT chooses some probability; then we observe what happens; it seems like we can only reward beliefs which were trying to push the probability towards the thing that happened (and punish those who pulled in the other direction).

Attraction & Repulsion

Actually, we can distinguish between fixed-points of $b^{t r u e}$ which are attractor points vs those which are repulsive. (More generally, points can be varyingly attractive/repulsive when approached from different directions.)

For example, suppose I wire up a motor response like this:

The 50% point will be a fixed-point, but it will be repulsive: beliefs very close to the fixed-point would map to beliefs a bit further away, so that if we iterated $b^{t r u e}$ , points initially near 50% would shoot away.

Similarly, 100% and 0% are attractive fixed-points; probabilities near to them rapidly converge toward them if we iterate $b^{t r u e}$ .
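As a sketch of such a response, here is a hypothetical curve with the properties just described (fixed points at 0%, 50%, and 100%, with only the middle one repulsive). The specific polynomial is an assumption chosen for convenience, not the wiring from my illustration. A fixed point is classified by the slope of the map there: slope magnitude below 1 means nearby points are pulled in, above 1 means they are pushed away.

```python
def b_true(p: float) -> float:
    # Hypothetical motor response: believed probability -> actual probability.
    return 3 * p**2 - 2 * p**3


def iterate(f, p: float, steps: int = 50) -> float:
    """Apply f repeatedly, starting from p."""
    for _ in range(steps):
        p = f(p)
    return p


def slope(f, p: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the derivative of f at p."""
    return (f(p + h) - f(p - h)) / (2 * h)


def classify(f, p: float) -> str:
    s = abs(slope(f, p))
    return "attractive" if s < 1 else "repulsive" if s > 1 else "neutral"


for fp in (0.0, 0.5, 1.0):
    print(fp, classify(b_true, fp))

above = iterate(b_true, 0.51)  # starts just above 50%, ends near 100%
below = iterate(b_true, 0.49)  # starts just below 50%, ends near 0%
print(above, below)
```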

If the full market's fixed-point ends up being close to an attractive point of $b^{t r u e}$ , then reality will respond by being even closer to the attractive point. This suggests that we can learn such points! Beliefs which are pushing toward the fixed-point will be increasingly vindicated (in expectation, if we use a proper scoring rule to reward/punish beliefs).

On the other hand, belief in repulsive fixed-points will be correspondingly punished.

This suggests that we can get some positive learning-theoretic results if we limit our aspirations: perhaps we cannot learn $b^{t r u e}$ in general, but can learn its attractive fixed-points.

(But don't forget that this can be a big disappointment from a decision-theoretic perspective. The attractive fixed-points can be terrible, and the repulsive fixed-points can be wonderful.)

Active inference to the rescue?

Some might say that the problem, here, is that I am using some of the ideas from Active Inference without adopting the full package.

Specifically, FixDT has in common with Active Inference that motor outputs are a function of what the agent believes its motor outputs will be, rather than the more common idea of being a function of expected utility.

But FixDT is trying to get away with this move without the accompanying Active Inference idea of skewing beliefs toward success.

Can we fix FixDT by adding in more ideas from Active Inference? Sort of, but I don't find it very satisfying.

Friendly Actuators?

I observed that attractive fixed-points appear to be learnable, while repulsive fixed-points appear unlearnable. But whether a point is attractive vs repulsive depends on $b^{t r u e}$ , which is to say, it depends on how the environment reacts to our beliefs. For example, we could wire up the motor responses to be as follows instead of the suggestion I illustrated earlier:

The important thing to note, here, is that I've flipped which fixed-points are attractive vs repulsive. This is not very nice for the agent; it means the 50% point is learnable, but properly turning the motors on/off is no longer learnable.

So we could define "friendly actuators" as ones which have been designed so as to be easy for the agent to learn how to use. Is there a systematic way to design friendly actuators?

Well, we could take the idea from Active Inference. Rather than copy the action probabilities from the agent's chosen probabilities (which would make every distribution over actions a fixed-point of $b^{t r u e}$ , but neither attractive nor repulsive, and therefore not very learnable) we should instead take the agent's probabilities, bias them toward success, and copy those probabilities. Since action-probabilities will always be shifted toward better outcomes, only optimal actions will be fixed-points.
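A toy version of the contrast, where $p$ is the probability the agent assigns to the single successful action. The linear bias with strength `eta` is my own stand-in for "bias them toward success"; Active Inference proper is considerably more involved.

```python
def copy_actuator(p: float) -> float:
    """Acts with exactly the believed probability: every p is a fixed point."""
    return p


def friendly_actuator(p: float, eta: float = 0.1) -> float:
    """Shifts the believed probability a fraction eta of the way toward success."""
    return p + eta * (1 - p)


def iterate(f, p: float, steps: int = 200) -> float:
    """Apply f repeatedly, starting from p."""
    for _ in range(steps):
        p = f(p)
    return p


grid = [i / 100 for i in range(101)]
copy_fixed = [p for p in grid if copy_actuator(p) == p]               # the whole grid
friendly_fixed = [p for p in grid if abs(friendly_actuator(p) - p) < 1e-12]  # only 1.0

print(len(copy_fixed), friendly_fixed)
print(iterate(friendly_actuator, 0.1))  # approaches 1.0
```

With the copying actuator the slope is exactly 1 everywhere, so no point attracts or repels; with the biased actuator the only fixed point is $p = 1$, and it is attractive (slope $1 - \eta$).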

(This prevents us from learning full control; but who cares about failing to learn suboptimal fixed-points? We really only need to be able to learn the ones we actually want to choose.)

My problem with this idea is that we're introducing "actuator decision theory" -- the actuator is now asked to be intelligent itself, in order to cooperate with the agent. We might as well have the actuator just make the best decision based on the beliefs, then! This returns us to classical decision theory.

Biased Reporting?

A different way to try and import the Active Inference idea is to bias the agent's probabilities themselves, rather than putting that responsibility on the actuators. Again, the idea is to make better outcomes learnable by helping them to be attractive fixed-points.

For example, imagine Popular News Network (PNN) finds itself regularly reporting on bank runs. Bank runs have become a big problem, and PNN is doing a service to its viewers by reporting on expert predictions about which banks are in the process of collapsing, which banks are on unstable ground and might be next, which banks seem secure, etc.

PNN is not blind to the fact that its reports can actually cause or prevent bank-runs. Thus far, PNN's ethical position has been that they're doing fine so long as they (1) report the truth as accurately as they are able (the epistemic constraint) and (2) when the accuracy constraint allows for multiple possible reports to be fixed-points, they choose whichever report results in the fewest bank-runs (the instrumental constraint).

However, PNN has noticed that despite their judicious adherence to the above, more and more bank-runs seem to be happening. Their expert analysts have figured out that bank runs are attractive fixed-points, but non-bank-runs are repulsive; the number of bank-runs in a given week roughly tracks however many PNN forecasts, but looking at the details, there are about 5% more on average than whatever is forecast.

As a result, the reporters, bound by honesty, keep sliding in the direction of predicting more bank runs, since the numbers tend to prove their previous forecasts to be underestimates.

Taking an idea from Active Inference, PNN executives ask reporters to reduce their forecasted numbers by 10% from whatever the honest forecast would be, in the hopes of putting pressure against bank-runs.

I have a couple of problems with this approach.

First, if we violate the epistemic constraint, are the reported numbers really "probabilities" any more? They're just some numbers we made up. By bending epistemic rationality, we lose the nice properties we invented it for. Why invoke probability theory at all, if you're no longer trying to make your probabilities calibrated?^[5]

Second, and relatedly: the viewers of PNN can pick up on the biased reporting and adjust the numbers back up by 10%.
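Here is a toy simulation of the PNN story, under assumptions I'm making up for the sketch: each week the honest forecast is simply last week's observed count, and the world produces 5% more bank runs than whatever number viewers act on. The function and parameter names are hypothetical.

```python
from typing import List


def simulate(weeks: int, start: float, shade: float = 1.0,
             viewers_correct: bool = False, overshoot: float = 1.05) -> List[float]:
    """Return the observed bank-run count for each week.

    shade:            factor applied to the honest forecast before reporting
                      (0.9 means "reduce by 10%"; 1.0 means honest reporting).
    viewers_correct:  if True, viewers undo the shading before reacting.
    overshoot:        the world produces this multiple of the number acted on.
    """
    honest = start
    history = []
    for _ in range(weeks):
        reported = shade * honest
        acted_on = reported / shade if viewers_correct else reported
        actual = overshoot * acted_on
        history.append(actual)
        honest = actual  # next week's honest forecast tracks what was observed
    return history


honest_run = simulate(10, 100.0)
shaded_run = simulate(10, 100.0, shade=0.9)
undone_run = simulate(10, 100.0, shade=0.9, viewers_correct=True)
print(honest_run[-1], shaded_run[-1], undone_run[-1])
```

Honest reporting drifts upward by 5% a week. Shading by 10% reverses the drift as long as viewers take the numbers at face value, but once viewers correct for the bias the dynamics are back to the honest case.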

This gets us into murky philosophical issues behind FixDT. The idea of FixDT is that the world might somehow react to our probabilities. But how does the world zero in on "our probabilities" to react to them? If we're settled on a specific version of FixDT, we don't care; FixDT just tracks how the world reacts, and chooses fixed-points accordingly.

But if we're trying to decide between versions of FixDT (or between FixDT and other options), it might start to matter how the world detects our probabilities in order to react. If we violate the Epistemic Constraint and adjust some numbers up by 10%, will the world adjust those numbers back down before reacting to them?

Obviously, it depends. In some cases, the Active Inference idea will work fine. But in many cases of interest, it won't. That's really all I can say, here.

Connection to the "futarchy hack".

Earlier, in my heuristic argument that FixDT can't learn $b^{t r u e}$ , I divided the problem into two parts: (a) we can't reward traders who successfully make fixed-points of $b^{t r u e}$ into fixed-points of the market; (b) we can't punish traders who successfully rule out fixed-points of $b^{t r u e}$ as market fixed-points.

The second problem is very similar to the untaken actions problem, often called the "futarchy hack" (at least within the in-person LWDT community), because it is a way to control the decisions of a futarchy without risking any money: if you can bet enough money that the option you don't want will be bad for everyone, then that action won't get taken, so you'll simply get your money back. You put your money where your mouth was, but your predictions didn't get empirically tested.

One of the best remedies to this problem (perhaps the best remedy) is Decision Markets (aka BRIA), by Caspar Oesterheld. But I don't have a specific proposal for how to combine that with FixDT.

Future work?

Combining updateless reasoning with FixDT.
Further work on the learning-theoretic issues for FixDT.
Spelling out the "dissolve the notion of evidence" thing I mentioned.
Exploring the combination of BRIA and FixDT.
FixDT can be seen as going up a single meta-level, from probabilities to $P \to P$ maps. But what if the world reacts to your "belief" (your $P \to P$ map)? Can we somehow deal with the implied infinite regress?
FixDT game theory. Perhaps FixDT hierarchical game theory.
Removing talk of "calibration" and $b^{t r u e}$ ; motivating similar ideas in less ontologically questionable ways.
Capitalizing on the nice ontology FixDT offers, to somehow further clarify "agent boundaries" stuff, or other issues in embedded agency?
If we squint, we can see the Futarchy Hack as a failure of preference aggregation. We could say "the beliefs may actually have preferences" and attempting to rule out a fixed-point is a kind of vote. This is similar to the Active Inference idea, really. We can model Active Inference's way of biasing beliefs toward success by putting in a belief which pushes things toward success (rather than my much grosser, but basically similar, proposal of biasing things toward success after the fixed-point is chosen). Thus we can see "beliefs" as actually having a value component (based on which fixed-points they push things to). Can this get us anywhere??

^[1]
We also need to assume that the space of probability distributions being considered is compact, to apply Brouwer's fixed point theorem.
^[2]
This isn't a super-great "handshake" really -- I think it is little better than what EDT offers by allowing agents to believe that they are correlated with one another. The problem with both pictures is that there isn't a learning-theoretic story showing that agents can converge toward cooperation on such a basis, as far as I know.
^[3]
If neither of these pictures is satisfying to you, well... I think many conclusions one can reach by pretending there's a $b^{t r u e}$ can be defended more carefully by other means, but I fully admit I'm not doing the work here.
^[4]
Of course, we only get to observe what happens for some observable things; I can't directly observe whether my beliefs impact eddies in the currents deep within the sun, for example. But I don't even expect that problem to be solvable in principle -- agents just have to make do with some irreducible uncertainty about such things. But it does feel like I should be able to learn the calibration function for motor-control problems, in order for FixDT to be considered a success.
^[5]
Or, we could make this point in other ways, if "calibration" is meaningless to you. For example, biased probabilities will no longer maximize expected accuracy.

Sylvester Kollin, 5mo

A common trope is for magic to work only when you believe in it. For example, in Harry Potter, you can only get to the magical train platform 9 3/4 if you believe that you can pass through the wall to get there.

Are you familiar with Greaves' (2013) epistemic decision theory? These types of cases are precisely the ones she considers, although she is entirely focused on the epistemic side of things. For example (p. 916):

Leap. Bob stands on the brink of a chasm, summoning up the courage to try and leap across it. Confidence helps him in such situations: specifically, for any value of $x$ between $0$ and $1$ , if Bob attempted to leap across the chasm while having degree of belief $x$ that he would succeed, his chance of success would then be $x$ . What credence in success is it epistemically rational for Bob to have?

And even more interesting cases (p. 917):

Embezzlement. One of Charlie’s colleagues is accused of embezzling funds. Charlie happens to have conclusive evidence that her colleague is guilty. She is to be interviewed by the disciplinary tribunal. But Charlie’s colleague has had an opportunity to randomize the content of several otherwise informative files (files, let us say, that the tribunal will want to examine if Charlie gives a damning testimony). Further, in so far as the colleague thinks that Charlie believes him guilty, he will have done so. Specifically, if $x$ is the colleague’s prediction for Charlie’s degree of belief that he’s guilty, then there is a chance $x$ that he has set in motion a process by which each proposition originally in the files is replaced by its own negation if a fair coin lands Heads, and is left unaltered if the coin lands Tails. The colleague is a very reliable predictor of Charlie’s doxastic states. After such randomization (if any occurred), Charlie has now read the files; they (now) purport to testify to the truth of $n$ propositions $P_{1}, \dots, P_{n}$ . Charlie’s credence in each of the propositions $P_{i},$ conditional on the proposition that the files have been randomized, is $1 / 2$ ; her credence in each $P_{i}$ conditional on the proposition that the files have not been randomized is $1$ . What credence is it epistemically rational for Charlie to have in the proposition $G$ that her colleague is guilty and in the propositions $P_{i}$ that the files purport to testify to the truth of?

In particular, Greaves' (2013, §8, pp. 43-49) epistemic version of Arntzenius' (2008) deliberational (causal) decision theory might be seen as a way of making sense of the first part of your theory. The idea, inspired by Skyrms (1990), is that deciding on a credence involves a cycle of calculating epistemic expected utility (measured by a proper scoring rule), adjusting credences, and recalculating utilities until an equilibrium is obtained. For example, in Leap above, epistemic D(C)DT would find any credence permissible. And I guess that the second part of your theory serves as a way of breaking ties.

Abram Demski, 5mo

Yes, thanks for citing it here! I should have mentioned it, really.

I see the Skyrms iterative idea as quite different from the "just take a fixed point" theory I sketch here, although clearly they have something in common. FixDT makes it easier to combine both epistemic and instrumental concerns -- every fixed point obeys the epistemic requirement; and then the choice between them obeys the instrumental requirement. If we iteratively zoom in on a fixed point instead of selecting from the set, this seems harder?

If we try the Skyrms iteration thing, maybe the most sensible thing would be to move toward the beliefs of greatest expected utility -- but do so in a setting where epistemic utility emerges naturally from pragmatic concerns (such as A Pragmatist's Guide to Epistemic Decision Theory by Ben Levinstein). So the agent is only ever revising its beliefs in pragmatic ways, but we assume enough about the environment that it wants to obey both the epistemic and instrumental constraints? But, possibly, this assumption would just be inconsistent with the sort of decision problem which motivates FixDT (and Greaves).

Sylvester Kollin, 5mo

You might also find the following cases interesting (with self-locating uncertainty as an additional dimension), from this post.

Sleeping Newcomb-1. Some researchers, led by the infamous superintelligence Omega, are going to put you to sleep. During the two days that your sleep will last, they will briefly wake you up either once or twice, depending on the toss of a biased coin (Heads: once; Tails: twice). After each waking, they will put you back to sleep with a drug that makes you forget that waking. The weight of the coin is determined by what the superintelligence predicts that you would say when you are awakened and asked to what degree ought you believe that the outcome of the coin toss is Heads. Specifically, if the superintelligence predicted that you would have a degree of belief $p$ in Heads, then they will have weighted the coin such that the 'objective chance' of Heads is $p$ . So, when you are awakened, to what degree ought you believe that the outcome of the coin toss is Heads?

Sleeping Newcomb-2. Some researchers, led by the superintelligence Omega, are going to put you to sleep. During the two days that your sleep will last, they will briefly wake you up either once or twice, depending on the toss of a biased coin (Heads: once; Tails: twice). After each waking, they will put you back to sleep with a drug that makes you forget that waking. The weight of the coin is determined by what the superintelligence predicts your response would be when you are awakened and asked to what degree you ought to believe that the outcome of the coin toss is Heads. Specifically, if Omega predicted that you would have a degree of belief $p$ in Heads, then they will have weighted the coin such that the 'objective chance' of Heads is $1 - p$ . Then: when you are in fact awakened, to what degree ought you believe that the outcome of the coin toss is Heads?

Nate Showell, 5mo

It seems like fixed points could be used to replace the concept of utility, or at least to ground it as an inferred property of more fundamental features of the agent-environment system. The concept of utility is motivated by the observation that agents have preference orderings over different states. Those preference orderings are statements about the relative stability of different states, in terms of the direction in which an agent tends to transition between them. It seems duplicative to have both utilities and fixed points as two separate descriptions of state transition processes in the agent-environment system; utilities look like they could be defined in terms of fixed points.

As one preliminary idea for how to do this, you could construct a fully connected graph in which the vertices are the probability distributions $p$ that satisfy $b (p) = p$ . The edges $E$ are beliefs that represent hypothetical transitions between the fixed points. The graph $G$ would take the place of a preference ordering by describing the tendency of the agent to move between the fixed points if given the option. (You could also model incomplete preferences by not making the graph fully connected.) Performing power iteration with the transition matrix of $G$ would act as a counterpart to moving through the preference ordering.

Further exploration of this unification of utilities and fixed points could involve connecting $G$ to the beliefs that are actually, rather than just counterfactually, present in the agent-environment system, to describe what parts of the system the agent can control. Having a way to represent that connection could let us rewrite the instrumental constraint to not rely on $U$ .

An intriguing perspective, but I'm not sure whether I agree. Naively, it would seem that a choice between fixed points in the FixDT setting is just a choice between different probability distributions, which brings us very close to the VNM idea of a choice between gambles. So VNM-like utility theory seems like the obvious outcome.

That being said, I don't really agree with the idea that an agent should have a fixed VNM-like utility function. So I do think some generalization is needed.

Sylvester Kollin, 5mo

Epistemic Constraint: The probability distribution which the agent settles on cannot be self-refuting according to the beliefs. It must be a fixed point of $b$ : a $p$ such that $b (p) = p$ .

Minor: there might be cases in which there is a fixed point $p$ , but where the agent doesn't literally converge or deliberate their way to it, right? (Because you are only looking for $b$ to satisfy the conditions of Brouwer/Kakutani, and not, say, Banach, right?) In other words, it might not always be accurate to say that the agent "settles on $p$ ". EDIT: oh, maybe you are just using "settles on" in the colloquial way.

Yeah, "settles on" here meant however the agent selects beliefs. The epistemic constraint implies that the agent uses exhaustive search or some other procedure guaranteed to produce a fixed point, rather than Banach-style iteration.

Moving to a Banach-like setting will often make the fixed points unique, which takes away the whole idea of FixDT.

Moving to a setting where the agent isn't guaranteed to converge would mean we have to re-write the epistemic constraint to be appropriate to that setting.
