Algorithmic Similarity

A few comments:

Regarding algorithmic similarity. This is an idea I thought of just now, so I'm not sure how solid it is. Given Turing machines $M_{1}$ and $M_{2}$ that compute the same function, we want to say whether, in some sense, they do it "in the same way". For any input $x$, consider the entire histories of intermediate states of the computations $M_{1}(x)$ and $M_{2}(x)$. Call them $h_{1}(x)$ and $h_{2}(x)$. We then say that $M_{1}$ and $M_{2}$ are "algorithmically equivalent" when there is a low-complexity algorithm $A$ that, given access to $h_{1}(x)$, can produce any given part of $h_{2}(x)$, and vice versa. In particular, the complexity of $A$ must be much lower than the complexity of running $M_{i}(x)$ from the beginning. Here, it seems useful to play with different types of complexity bounds (time and space, for example).
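To make the definition a bit more concrete, here is a toy sketch in Python, with plain functions standing in for Turing machines. Every name in it (`history_m1`, `history_m2`, `CountingOracle`, `translate_1_to_2`, `translate_2_to_1`) is hypothetical and only for illustration, and counting oracle reads is a crude stand-in for a real complexity bound on $A$.

```python
from typing import Callable, List


def history_m1(xs: List[int]) -> List[int]:
    """Toy machine 1: scan left to right; the state after step t is the partial sum."""
    states, total = [], 0
    for x in xs:
        total += x
        states.append(total)
    return states


def history_m2(xs: List[int]) -> List[int]:
    """Toy machine 2: same scan, but the state holds the negated partial sum
    (it would negate once more at the end to output the same sum)."""
    states, negated = [], 0
    for x in xs:
        negated -= x
        states.append(negated)
    return states


class CountingOracle:
    """Random access to a history that counts how many entries were read."""

    def __init__(self, history: List[int]) -> None:
        self.history = history
        self.queries = 0

    def __call__(self, t: int) -> int:
        self.queries += 1
        return self.history[t]


def translate_1_to_2(h1: Callable[[int], int], t: int) -> int:
    """Hypothetical translator A: one read of h1 plus constant extra work."""
    return -h1(t)


def translate_2_to_1(h2: Callable[[int], int], t: int) -> int:
    """The 'vice versa' direction, equally cheap."""
    return -h2(t)


xs = [3, 1, 4, 1, 5]
h1, h2 = history_m1(xs), history_m2(xs)
oracle = CountingOracle(h1)
recovered = [translate_1_to_2(oracle, t) for t in range(len(xs))]
assert recovered == h2
assert oracle.queries == len(xs)  # one read per requested entry of h2
```

Each entry of $h_{2}(x)$ costs one read of $h_{1}(x)$, whereas rerunning the second machine up to step $t$ costs $t$ additions. That gap is what the definition asks for, although in this toy case the two machines are equivalent almost by construction.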
Regarding waterfalls and human beings. I think that a waterfall is not simulating a human being, because there is no algorithm with both low description complexity and low computational complexity that can decode a human being from a waterfall. Of course, it is not a binary distinction but a fuzzy one (the simpler the decoding algorithm is, the more reasonable it is to say a human being is there).
Regarding diamond optimizers. I think that the right way to design such an optimizer would be to use an instrumental reward function. We are then left with the problem of how to specify this function. We could start with some ontology or class of ontologies that can reasonably be said to contain diamonds, and for which we can define the reward function unambiguously. These ontologies are then mapped into the space of instrumental states, giving us a partial specification of the instrumental reward function (it is specified on the affine span of the images of the ontologies). Then, there is the question of how to extend the reward function to the entire instrumental state space. I wrote a few thoughts about that in the linked essay, but another approach we can take is to consider all extensions that have the same range of values. These form a convex set that can be interpreted as Knightian uncertainty regarding the reward function. We can then consider maximin policies for this set to be "diamond maximizers". In other words, we want the maximizer to be cautious/conservative about judging the number of diamonds on states that lie outside the ontologies.
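The maximin idea can be sketched with a heavily simplified toy model in Python. The assumptions here are mine, not part of the proposal: states are a finite set, each policy is summarized by a probability distribution over final states, and the reward on states outside the ontology may independently take any value within the range of the known rewards. This drops the affine structure of the instrumental state space entirely. All names (`worst_case_value`, `maximin_policy`, the example states) are hypothetical.

```python
from typing import Dict


def worst_case_value(
    state_dist: Dict[str, float],
    known_reward: Dict[str, float],
    lowest: float,
) -> float:
    """Expected reward under the most pessimistic extension.

    Probabilities are nonnegative, so the minimum over all extensions
    assigns the lowest allowed reward to every out-of-ontology state.
    """
    return sum(p * known_reward.get(s, lowest) for s, p in state_dist.items())


def maximin_policy(
    policies: Dict[str, Dict[str, float]],
    known_reward: Dict[str, float],
) -> str:
    """Pick the policy whose worst-case expected reward is largest."""
    lowest = min(known_reward.values())  # extensions keep the same range
    return max(
        policies,
        key=lambda name: worst_case_value(policies[name], known_reward, lowest),
    )


# Reward is only defined on states the ontology can describe.
known_reward = {"no_diamonds": 0.0, "one_diamond": 1.0, "two_diamonds": 2.0}

policies = {
    # Stays inside the ontology.
    "safe": {"one_diamond": 1.0},
    # Sometimes ends in a state the ontology says nothing about.
    "gamble": {"two_diamonds": 0.4, "weird_state": 0.6},
}

best = maximin_policy(policies, known_reward)
assert best == "safe"
```

Here the gamble has a worst-case value of about 0.8, so the maximizer prefers the policy that stays inside the ontology. This is the cautious behavior described above, in miniature.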

I definitely think the computational complexity approach is worth looking into, though I think computational complexity behaves kind of weirdly at low complexities.

I like the view that waterfalls are at least a bit conscious! Definitely goes against my own intuition.

I'm a bit worried that whether or not there is an algorithm of low description complexity and low computational complexity that decodes a human from a waterfall might depend heavily on how we encode the waterfall as a mathematical object. Although it would be clear for "natural" encodings that the waterfall was unlike a human, we might need a theory to tell us which encodings are natural and which are not.

Vanessa Kosoy (6y)

Not sure what you mean by "computational complexity behaves kind of weirdly at low complexities"? In this case, I would be tempted to try the complexity class $L$ (logarithmic space complexity).

The most natural encoding is your "qualia", your raw sense data. This still leaves some freedom in how you represent it, but this freedom has only a very minor effect.

Daniel Kokotajlo (6y)

Thanks, this is a good write-up!

Many years ago I wrote my undergraduate thesis on the waterfall problem (though it went by another name to me). Basically, I painstakingly and laboriously transformed an arbitrary human into an arbitrary rock of sufficient size, via a series of imperceptibly tiny steps, none of which can be felt by the human. (I did this in imagination, not in reality, to be clear.) The point was to see if any of the steps seemed like good places to draw a line and say "Here, consciousness is starting to go out; the system is starting to be less of a person." As a result I became fairly convinced that there aren't any good places to draw the line. So I guess I'm a waterfall apologist now!

LukasM (6y)

Thanks a lot for the link. I'll put it in the reading list (if you don't mind).

I would be interested to hear what you think about the more technical version of the problem. Do you also think that that can have no good solution, or do you think that a solution just won't have the nice philosophical consequences?

Also, I'm excited to know a smart waterfall apologist and if you're up for it I would really like to talk more with you about the argument in your thesis when I have thought about it a bit more.

Daniel Kokotajlo (6y)

I'm glad you are interested, and I'd love to hear your thoughts on the paper if you read it. I'd love to talk with you too; just send me an email when you'd like and we can Skype or something.

What do you mean by "the more technical version of the problem" exactly?

My take right now is that algorithmic similarity (and instantiation), or at least the versions of it relevant for consciousness, decision theory, and epistemology, will have to be either a brute empirical fact about the world or a subjective fact about the mind of the agent reasoning about it (like priors and utility functions). What it will not be is some reasonably non-arbitrary property or relation with interesting and useful properties (like Nash equilibria, centers of mass, and temperature).

Matthew Barnett (6y)

Another reason why algorithmic similarity would be useful relates to a line of thinking I've been exploring recently. Specifically, the question is how we could regularize neural networks to make their computations more interpretable. A theory of algorithmic similarity would help because we could apply a penalty to a neural network whose internal operations are too dissimilar to some understandable algorithm. This would encourage the neural network to mirror an interpretable computation, which makes it easier for us to look inside and see what it's doing.

Ideally, this would provide us the performance gains of neural networks while keeping the interpretability of GOFAI algorithms, like tree search.
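As a rough illustration of what such a penalty might look like, here is a minimal Python sketch. It assumes we can already line up the network's intermediate activations with the intermediate states of a reference algorithm, which is exactly the part a real theory of algorithmic similarity would have to supply. The mean squared difference is only a placeholder for that missing measure, and all names (`trace_dissimilarity`, `regularized_loss`, `weight`) are hypothetical.

```python
from typing import Sequence


def trace_dissimilarity(
    model_trace: Sequence[float],
    reference_trace: Sequence[float],
) -> float:
    """Placeholder dissimilarity: mean squared gap between aligned traces."""
    if len(model_trace) != len(reference_trace):
        raise ValueError("traces must be aligned step by step")
    if not model_trace:
        return 0.0
    squared_gaps = [(m - r) ** 2 for m, r in zip(model_trace, reference_trace)]
    return sum(squared_gaps) / len(squared_gaps)


def regularized_loss(
    task_loss: float,
    model_trace: Sequence[float],
    reference_trace: Sequence[float],
    weight: float = 0.1,
) -> float:
    """Task loss plus a penalty for drifting away from the reference computation."""
    return task_loss + weight * trace_dissimilarity(model_trace, reference_trace)


# A network whose intermediate values match the reference pays no penalty.
matching = regularized_loss(0.5, [1.0, 3.0], [1.0, 3.0])
# One that reaches similar outputs by a different route pays for the detour.
drifting = regularized_loss(0.5, [1.0, 5.0], [1.0, 3.0])
assert matching < drifting
```

The `weight` parameter trades off task performance against staying close to the interpretable computation. Setting it to zero recovers ordinary training.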

Charlie Steiner (6y)

Hey, this is well written.

Out of curiosity, how do you feel about Rice's Theorem?

LukasM (6y)

Thank you!

I hadn't thought about Rice's Theorem in this context before but it makes a lot of sense.

I guess I would say that Rice's Theorem tells us that you can't computably categorize Turing machines based on the functions they compute, but since algorithmic similarity calls for a much finer classification, I don't immediately see how it would apply.

And even if we had an impossibility result of this kind, I don't think it would actually be a deal breaker, since we don't need the classification to be computable in general to be enlightening.
