Clarifying Consequentialists in the Solomonoff Prior

[-]eric_langlois7y40

The part about the reasoners having an arbitrary amount of time to think wasn't obvious to me. The TM can run for arbitrarily long but if it is simulating a universe and using the universe to determine its output then the TM needs to specify a system for reading from the universe.

If that system involves a start-to-read time that is long enough for the in-universe life to reason about the universal prior then that time specification alone would take a huge number of bits.

On the other hand, I could imagine a scheme that looks for a specific short trigger sequence at a particular spatial location then starts reading out. If this trigger sequence is unlikely to occur naturally then the civilization would have as long as they want to reason about the prior. So overall it does seem plausible to me now to allow for arbitrarily long in-universe time.

[-]Vlad Mikulik7y20

The trigger sequence is a cool idea.

I want to add that the intended generator TM also needs to specify a start-to-read time, so there is symmetry there. Whatever method a TM needs to use to select the camera start time in the intended generator for the real world samples, it can also use in the simulated world with alien life, since for the scheme to work only the difference in complexity between the two matters.

There is additional flex in that unlike the intended generator, the reasoner TM can sample its universe simulation at any cheaply computable interval, giving the civilisation the option of choosing any amount of thinking they can perform between outputs, if they so choose.

[-]AlexMennen7y20

I'm not convinced that the probability of S' could be pushed up to anything near the probability of S. Specifying an agent that wants to trick you into predicting S' rather than S with high probability when you see their common prefix requires specifying the agency required to plan this type of deception (which should be quite complicated), and specifying the common prefix of S and S' as the particular target for the deception (which, insofar as it makes sense to say that S is the "correct" continuation of the prefix, should have about the same "natural" complexity as S). That is, specifying such an agent requires all the information required to specify S, plus a bunch of overhead to specify agency, which adds up to much more complexity than S itself.

[-]paulfchristiano7y30

specifying the agency required to plan this type of deception (which should be quite complicated)

Suppose that I just specify a generic feature of a simulation that can support life + expansion (the complexity of specifying "a simulation that can support life" is also paid by the intended hypothesis, so we can factor it out). Over a long enough time such a simulation will produce life, that life will spread throughout the simulation, and eventually have some control over many features of that simulation.

And specifying the common prefix of S and S' as the particular target for the deception (which, insofar as it makes sense to say that S is the "correct" continuation of the prefix, should have about the same "natural" complexity as S)

Once you've specified the agent, it just samples randomly from the distribution of "strings I want to influence." That has a way lower probability than the "natural" complexity of a string I want to influence. For example, if 1/quadrillion strings are important to influence, then the attackers are able to save log(quadrillion) bits.

[-]AlexMennen7y10

Suppose that I just specify a generic feature of a simulation that can support life + expansion (the complexity of specifying "a simulation that can support life" is also paid by the intended hypothesis, so we can factor it out). Over a long enough time such a simulation will produce life, that life will spread throughout the simulation, and eventually have some control over many features of that simulation.

Oh yes, I see. That does cut the complexity overhead down a lot.

Once you've specified the agent, it just samples randomly from the distribution of "strings I want to influence." That has a way lower probability than the "natural" complexity of a string I want to influence. For example, if 1/quadrillion strings are important to influence, then the attackers are able to save log(quadrillion) bits.

I don't understand what you're saying here.

[-]Vlad Mikulik7y10

I agree that this probably happens when you set out to mess with an arbitrary particular S, I.e. try to make some S’ that shares a prefix with S as likely as S.

However, some S are special, in the sense that their prefixes are being used to make very important decisions. If you, as a malicious TM in the prior, perform an exhaustive search of universes, you can narrow down your options to only a few prefixes used to make pivotal decisions, selecting one of those to mess with is then very cheap to specify. I use S to refer to those strings that are the ‘natural’ continuation of those cheap-to-specify prefixes.

There are, it seems to me, a bunch of other equally-complex TMs that want to make other strings that share that prefix more likely, including some that promote S itself. What the resulting balance looks like is unclear to me, but what’s clear is that the prior is malign with respect to that prefix - conditioning on that prefix gives you a distribution almost entirely controlled by these malign TMs. The ‘natural’ complexity of S, or of other strings that share the prefix, play almost no role in their priors.

The above is of course conditional on this exhaustive search being possible, which also relies on there being anyone in any universe that actually uses the prior to make decisions. Otherwise, we can’t select the prefixes that can be messed with.

[-]AlexMennen7y10

This reasoning seems to rely on there being such strings S that are useful to predict far out of proportion to what you would expect from their complexity. But a description of the circumstance in which predicting S is so useful should itself give you a way of specifying S, so I doubt that this is possible.

[-]Vlad Mikulik7y10

I agree. That’s what I meant when I wrote there will be TMs that artificially promote S itself. However, this would still mean that most of S’s mass in the prior would be due to these TMs, and not due to the natural generator of the string.

Furthermore, it’s unclear how many TMs would promote S vs S’ or other alternatives. Because of this, I don’t now whether the prior would be higher for S or S’ from this reasoning alone. Whichever is the case, the prior no longer reflects meaningful information about the universe that generates S and whose inhabitants are using the prefix to choose what to do; it’s dominated by these TMs that search for prefixes they can attempt to influence.

[-]AlexMennen7y20

I didn't mean that an agenty Turing machine would find S and then decide that it wants you to correctly predict S. I meant that to the extent that predicting S is commonly useful, there should be a simple underlying reason why it is commonly useful, and this reason should give you a natural way of computing S that does not have the overhead of any agency that decides whether or not it wants you to correctly predict S.

[-]paulfchristiano7y10

How many bits do you think it takes to specify the property "people's predictions about S, using universal prior P, are very important"?

(I think you'll need to specify the universal prior P by reference to the universal prior that is actually used in the world containing the string S, if you spell out the prior P explicitly you are already sunk just from the ambiguity in the choice of language.)

It seems relatively unlikely to me that this will be cheaper than specifying some arbitrary degree of freedom in a computationally rich universe that life can control (+ the extra log(fraction of degrees of freedom the consequentialists actually choose to control)). Of course it might.

I agree that the entire game is in the constants---what is the cheapest way to pick out important strings.

[-]AlexMennen7y10

I don't think that specifying the property of importance is simple and helps narrow down S. I think that in order for predicting S to be important, S must be generated by a simple process. Processes that take large numbers of bits to specify are correspondingly rarely occurring, and thus less useful to predict.

[-]paulfchristiano7y30

I don't buy it. A camera that some robot is using to make decisions is no simpler than any other place on Earth, just more important.

(This already gives the importance-weighted predictor a benefit of ~log(quadrillion))

Clearly you need to e.g. make the anthropic update and do stuff like that before you have any chance of competing with the consequentialist. This might just be a quantitative difference about how simple is simple---like I said elsewhere, all the action is in the additive constants, I agree that the important things are "simple" in some sense.

[-]AlexMennen7y10

Ok, I see what you're getting at now.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

10

Clarifying Consequentialists in the Solomonoff Prior

10

i. A brief sketch of the Solomonoff prior

ii. The Malign Prior Argument

iii. The support for premiss 1

iv. Alien consequentialists

v. Conclusion. Do these arguments actually work?