Extended Picture Theory or Models inside Models inside Models

Yet more prententious poetry

To understand the theory of decisions,

We must enquire into the nature of a decision,

A thing that does not truly exist.

Everything is determined.

The path of the future is fixed.

So when you decide,

There is nothing to decide.

You simply discover, what was determined in advance.

A decision only exists,

Within a model.

If there are two options,

Then there are two possible worlds.

Two possible versions of you.

A decision does not exist,

Unless a model has been built.

But a model can not be built.

Without defining a why.

We go and build models.

To tell us what to do.

To aid in decisions.

Which don't exist at all.

A model it must,

To reality match.

Our reality cannot be spoken.

The cycle of models can't be broken.

Upwards we must go.


A decision does need,

A factual-counter,

But a counterfactual does need,

A decision indeed.

This is not a cycle.

A decision follows,

Given the 'factuals.

The 'factuals follow,

From a method to construct,

This decision's ours to make.

But first we need,

The 'factuals for the choice,

Of how to construct,

The 'factuals for the start.

A never ending cycle,

Spiralling upwards,

Here we are again.

Problems facing a correspondence theory of knowledge

Well, taking the simpler case of exacting reproducing a certain string, you could find the simplest program that produces the string similar to Kolmogorov complexity and use that as a measure of complexity.

A slightly more useful way of modelling things may be to have a bunch of different strings with different points representing levels of importance. And perhaps we produce a metric combining the Kolmovorov complexity of a decoder with the sum of the points produced where points are obtained by concatenating the desired strings with a predefined separator. For example, we might find the quotient.

One immediate issue with this is that some of the strings may contain overlapping information.  And we'd still have to produce a metric to assign importances to the strings. Perhaps a simpler case would be where the strings represent patterns in a stream via encoding a Turing machine with the Turing machines being able to output sets of symbols instead of just symbols representing the possible symbols at each locations.  And the amount of points they provide would be equal to how much of the stream it allows you to predict. (This would still require producing a representation of the universe where the amount of the stream predicted would be roughly equivalent to how useful the predictions are).

Any thoughts on this general approach?

Problems facing a correspondence theory of knowledge

I think that part of the problem is that talking about knowledge requires adopting an interpretative frame. We can only really say whether a collection of particles represents some particular knowledge from within such a frame, although it would be possible to determine the frame of minimum complexity that interprets a system as representing certain facts. In practise though, whether or not a particular piece of storage contains knowledge will depend on the interpretative frames in the environment, although we need to remember that interpretative frames can emulate other interpretative frames. ie. A human experimenting with multiple codes in order to decode a message.

Regarding the topic of partial knowledge, it seems that the importance of various facts will vary wildly from context to context and also depending on the goal. I'm somewhat skeptical that goal independent knowledge will have a nice definition.

Which counterfactuals should an AI follow?

I believe that we need to take a Conceputal Engineering approach here. That is, I don't see counterfactuals as intrinsically part of the world, but rather someone we construct. The question to answer is what purpose are we constructing these for? Once we've answer this question, we'll be 90% of the way towards constructing them.

As far as I can see, the answer is that we imagine a set of possible worlds and we notice that agents that use certain notions of counterfactuals tend to perform better than agents that don't. Of course, this raises the question of which possible worlds to consider, at which point we notice that this whole thing is somewhat circular.

However, this is less problematic than people think. Just as we can only talk about what things are true after already having taken some assumptions to be true (see Where Recursive Justification hits Bottom), it seems plausible that we might only be able to talk about possibility after having already taken some things to be possible.

The Counterfactual Prisoner's Dilemma

"The problem is that principle F elides" - Yeah, I was noting that principle F doesn't actually get us there and I'd have to assume a principle of independence as well. I'm still trying to think that through.

The Counterfactual Prisoner's Dilemma

Hmm... that's a fascinating argument. I've been having trouble figuring out how to respond to you, so I'm thinking that I need to make my argument more precise and then perhaps that'll help us understand the situation.

Let's start from the objection I've heard against Counterfactual Mugging. Someone might say, well I understand that if I don't pay, then it means I would have lost out if it had come up heads, but since I know it didn't came up heads, I don't care. Making this more precise, when constructing counterfactuals for a decision, if we know fact F about the world before we've made our decision, F must be true in every counterfactual we construct (call this Principle F).

Now let's consider Counterfactual Prisoner's Dilemma. If the coin comes up HEADS, then principle F tells us that the counterfactuals need to have the COIN coming up HEADS as well. However, it doesn't tell us how to handle the impact of the agent's policy if they had seen TAILS. I think we should construct counterfactuals where the agent's TAILS policy is independent of its HEADS policy, whilst you think we should construct counterfactuals where they are linked.

You justify your construction by noting that the agent can figure out that it will make the same decision in both the HEADS and TAILS case. In contrast, my tendency is to exclude information about our decision making procedures. So, if you knew you were a utility maximiser this would typically exclude all but one counterfactual and prevent us saying choice A is better than choice B. Similarly, my tendency here is to suggest that we should be erasing the agent's self-knowledge of how it decides so that we can imagine the possibility of the agent choosing PAY/NOT PAY or NOT PAY/PAY.

But I still feel somewhat confused about this situation.

How do we prepare for final crunch time?

One of the biggest considerations would be the process for activating "crunch time". In what situations should crunch time be declared? Who decides? How far out would we want to activate and would there be different levels? Are there any downsides of such a process including unwanted attention?

If these aren't discussed in advance, then I imagine that far too much of the available time could be taken up by whether to activate crunch time protocols or not.

PS. I actually proposed here that we might be able to get a superintelligence to solve most of the problem of embedded agency by itself. I'll try to write it up into a proper post soon.

The Counterfactual Prisoner's Dilemma

You're correct that paying in Counterfactual Prisoner's Dilemma doesn't necessarily commit you to paying in Counterfactual Mugging.

However, it does appear to provide a counter-example to the claim that we ought to adopt the principle of making decisions by only considering the branches of reality that are consistent with our knowledge as this would result in us refusing to pay in Counterfactual Prisoner's Dilemma regardless of the coin flip result.

(Interestingly enough, blackmail problems seem to also demonstrate that this principle is flawed as well).

This seems to suggest that we need to consider policies rather than completely separate decisions for each possible branch of reality. And while, as I already noted, this doesn't get us all the way, it does make the argument for paying much more compelling by defeating the strongest objection.

Why 1-boxing doesn't imply backwards causation

I think the best way to explain this is to imagine characterise the two views as slightly different functions both of which return sets. Of course, the exact type representations isn't the point. Instead, the types are just there to illustrate the difference between two slightly different concepts.

possible_world_pure() returns {x} where x is either <study & pass> or <beach & fail>, but we don't know which one it will be

possible_world_augmented() returns {<study & pass>, <beach & fail>}

Once we've defined possible worlds, it naturally provides us a definition of possible actions and possible outcomes that matches what we expect. So for example:

size(possible_world_pure()) = size(possible_action_pure()) = size(possible_outcome_pure()) = 1

size(possible_world_augmented()) = size(possible_action_augmented()) = size(possible_outcome_augmented()) = 2

And if we have a decide function that iterates over all the counterfactuals in the set and returns the highest one, we need to call it on possible_world_augmented() rather than possible_world_pure().

Note that they aren't always this similar. For example, for Transparent Newcomb they are:

possible_world_pure() returns {<1-box, million>}

possible_world_augmented() returns {<1-box, million>, <2-box, thousand>}

The point is that if we remain conscious of the type differences then we can avoid certain errors.

For example possible_outcome_pure() = {"PASS"}, doesn't mean that possible_outcome_augmented() = {"PASS"}. It's that later which would imply it doesn't matter what the student does, not the former.

Why 1-boxing doesn't imply backwards causation

Excellent question. Maybe I haven't framed this well enough.

We need a way of talking about the fact that both your outcome and your action are fixed by the past.

We also need a way of talking about the fact that we can augment the world with counterfactuals (Of course, since we don't have complete knowledge of the world, we typically won't know which is the factual and which are the counterfactuals).

And that these are two distinct ways of looking at the world.

I'll try to think about a cleaner way of framing this, but do you have any suggestions?

(For the record, the term I used before was Raw Counterfactuals - meaning consistent counterfactuals - and that's a different concept than looking at the world in a particular way).

(Something that might help is that if we are looking at multiple possible pure realities, then we've introduced counterfactuals as only one is true and "possible" is determined by the map rather than the territory)

Load More