Vojtech Kovarik

Background in mathematics (descriptive set theory, Banach spaces) and game theory (mostly zero-sum, imperfect-information games). CFAR mentor. Usually doing alignment research.

Vojtech Kovarik's Comments

New paper: (When) is Truth-telling Favored in AI debate?

I guess that on a first reading, you can cheat by reading just the introduction, Section 2 right after it, and the conclusion. One level above that is reading the whole text but skipping the more technical sections (4 and 5). Or possibly reading Sections 4 and 5 as well, but focusing only on the informal meaning of the formal results.

Regarding the background knowledge required for the paper: it uses some game theory (Nash equilibria, extensive-form games) and probability theory (expectations, probability measures, conditional probability). Strictly speaking, you can get all of this by looking up the relevant keywords on Wikipedia. I think that all of the concepts used there are basic in the corresponding fields, and in particular no special knowledge of measure theory is required. However, I studied both game theory and measure theory, so I am biased, and you shouldn't trust me. (Moreover, there is a difference between "strictly speaking, only this is needed" and "my intuitions are informed by X, Y, and Z".)

Another thing is that the AAAI workshop where this will appear has a page limit, which means that some explanations might have gotten less space than they deserve. In particular, the arguments in Section 4 are much easier to digest if you can draw the functions that the text talks about. To understand the formal results, I visualized two-dimensional slices of the "world space" (i.e., squares) and assumed that the value of the function is 0 by default, except for being 1 on some selected subset of the square. This lets you compute all the expectations and conditionals visually.
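
To make that trick concrete, here is a minimal sketch of the kind of picture I mean (my own illustration, not from the paper; the particular subset and conditioning event are arbitrary):

```python
# Treat the "world space" as the unit square, let a function be 0 by default and 1
# on a chosen subset, and read expectations/conditionals off as areas.
import numpy as np

n = 200                                    # grid resolution
xs, ys = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))

# f is 1 on a chosen subset of the square (here: the lower-left quarter), 0 elsewhere.
f = ((xs < 0.5) & (ys < 0.5)).astype(float)

# With a uniform distribution over the square, the expectation is just the average,
# i.e., the area of the region where f = 1.
print("E[f] =", f.mean())                  # roughly 0.25

# Conditioning on an event A (here: the left half of the square) restricts the
# average to the cells where A holds.
A = xs < 0.5
print("E[f | A] =", f[A].mean())           # roughly 0.5
```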

Deconfuse Yourself about Agency

First off, while I feel somewhat deconfused about X-like behavior, I don't feel very confident about X-like architectures. Maybe the meaning is somewhat clear on higher levels of abstraction (e.g., if my brain goes "realize I want to describe a concept --> visualize several explanations and judge each for suitability --> pick the one that seems best --> send a signal to start typing it out", then this would be a kind of search/optimization-thingy). But on the level of physics, I don't really know what an architecture means. So take this with a grain of salt.

Maybe the term "physical structure" is misleading. The thing I was trying to point at is the distinction between being able to accurately model Y using model X, and Y actually being X. In the sense that there might be a giant look-up table (GLUT) that accurately predicts your behavior, but on no level of abstraction is it correct to say that you actually are a GLUT. Whereas modelling you as having some goals, planning, etc. might be less accurate but somewhat more, hm, true. I realize this isn't very precise, but I guess you can see what I mean.

That being said, I suppose that what I meant by "optimization architecture" is, for example, stochastic gradient descent, with the emphasis on "this is the input", "this is the part of the algorithm that does the calculation", and "this is the output". An "implementation of an optimization architecture" would be... well, the atoms of your computer that perform SGD, or maybe a simple bacterium that moves in the direction where the concentration of whatever-it-likes is highest (not that anything I know of implements precisely SGD, but still).
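
As a toy sketch of that emphasis (the one-dimensional objective, the noise level, and the step size below are made up purely for illustration):

```python
import random

def noisy_gradient(x):
    # Input: the current point x. The underlying objective here is (x - 3)^2,
    # whose gradient 2 * (x - 3) is observed with noise (the "stochastic" part).
    return 2 * (x - 3) + random.gauss(0, 0.1)

def sgd_step(x, lr=0.05):
    # The part of the algorithm that does the calculation: step against the gradient.
    return x - lr * noisy_gradient(x)

x = 0.0                       # starting point
for _ in range(200):
    x = sgd_step(x)           # output of each step: the updated point
print("final x:", x)          # ends up near the minimum at 3
```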

Ad "interesting physical structure" behind the ant-colony: If by "evolution" we mean the atoms that the world is made of, as they changed over time until your ant colony emerged...then yeah, this is a physical structure causally upstream of the ant colony, and one that is responsible for the ant colony behaving the way it does. I wouldn't say it is interesting (to me, and w.r.t. the ant colony) though, since it is totally incomprehensible to me. (But maybe "interestingness" doesn't really make sense on the level of physics, and is only relevant in relation to our abstract world-models and their understanding.)

Finally, the ideal thing an "X-like behavior ==> Y-like architecture" theorem would cash out into is a criterion that you can actually check, and that lets you say with certainty that the thing will not exhibit X-like behavior. (Whether this is reasonable to hope for is another matter.) So even if everything I have written in this comment turns out to be nonsense, getting such a criterion is what we are after :-).

Deconfuse Yourself about Agency

I agree with your summary :). The claim was that humans often predict behavior by assuming that something has a particular architecture.

(And some confusions about agency seem to appear precisely because of not making the architecture/behavior distinction.)

Problems with AI debate

Intuitively, I agree that the vacation question is under-defined / has too many "right" answers. On the other hand, I can also imagine a world where you can develop some objective fun theory, or just something that actually makes such questions well-posed. And the AIs could use this fact in the debate:

Bob: "Actually, you can derive a well-defined fun theory and use it to answer this question. And then Bali clearly wins."

Alice: "There could never be any such thing!"

Bob: "Actually, there indeed is such a theory, and its central idea is [...]."

[They go on like this for a bit, and eventually, Bob wins.]

Indeed, this seems like the kind of move you could make (by explaining that integration is a thing) if somebody tried to convince you that there is no principled way to measure the area of a circle.
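
For concreteness, the integral in question:

\[
\text{Area of a circle of radius } r \;=\; \int_{-r}^{r} 2\sqrt{r^2 - x^2}\,\mathrm{d}x \;=\; \pi r^2 .
\]

The "principled way" is exactly this well-defined limit of approximations, which is, I suppose, the role the hypothetical fun theory would play for the vacation question.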

However -- if true -- this only shows that there are fewer under-defined questions than we think. The "Ministry of Ambiguity versus the Department of Clarity" fight is still very much a thing, as are the incentives to manipulate the human. And perhaps most importantly, routinely holding debates where the AI "explains to you how to think about something" seems extremely dangerous...

Deconfuse Yourself about Agency

I have a sense that (formalized) versions of A(Θ)-morphism are going to be more useful (or easier?) for the behavioral side, though it isn't really clear.

I think A(Θ)-morphisation is primarily useful for describing what we often mean when we say "agency". In particular, I view this as distinct from the question of which concepts we should be thinking about in this space. (I think the promising candidates include the notion of learning that Vanessa points to in her comment, optimization, search, and the concepts in the second part of my post.)

However, I think it might also serve as a useful part of the language for describing (non-)agent-like behavior. For example, we might want to SGD-morphise an E. coli bacterium independently of whether it actually implements some form of stochastic gradient descent w.r.t. the concentration of some chemicals in the environment.
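
As a toy illustration of that point (my own sketch, not from the post; the "concentration" field and step sizes are invented), here is a run-and-tumble process that computes no gradients internally, yet whose behavior can reasonably be SGD-morphised as noisy gradient ascent on the concentration:

```python
import random

def concentration(x):
    # Hypothetical one-dimensional chemical field, peaked at x = 5.
    return -(x - 5) ** 2

def run_and_tumble(x, step=0.1, n_steps=500):
    direction = 1
    last = concentration(x)
    for _ in range(n_steps):
        x += direction * step                    # "run" in the current direction
        now = concentration(x)
        if now < last:                           # things got worse:
            direction = random.choice([-1, 1])   # "tumble" to a random direction
        last = now
    return x

print("final position:", run_and_tumble(0.0))    # tends to hover near the peak at 5
```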

You mention the distinction between agent-like architecture and agent-like behavior (which I find similar to my distinction between selection and control), but how does the concept of A(Θ)-morphism account for this distinction?

I think of agent-like architectures as something objective, or related to the territory. In contrast, agent-like behavior is something subjective, something in the map. Importantly, the agent-like behavior, or the lack of it, of some system is something that exists in the map of some entity, and that entity is often not the system itself.

The selection/control distinction seems related, but not quite the same, to me. Am I missing something there?

Deconfuse Yourself about Agency

I am not even sure what the input/output channels of a rock are supposed to be

I guess you imagine that the input is the physical forces affecting the ball, and the output is the forces the ball exerts on the environment. Obviously, this is very much not useful for anything. But it suddenly becomes non-trivial if you consider something like a billiard-ball computer (which seems to be a theoretical construct, and I am not sure whether anybody has actually built one, but it seems like a relevant example anyway).

Deconfuse Yourself about Agency

Yep, that totally makes sense.

An observation inspired by your comment: while this shouldn't necessarily be so, it seems that the particular formulation makes a lot of difference when it comes to exchanging ideas. If I read your comment without the

(although maybe "intelligence" would be a better word?)

bracket, I immediately go "aaa, this is so wrong!". And if I substitute "intelligent" for "agent", I totally agree with it. Not sure whether this is just me, or whether it generalizes to other people.

More specifically, I agree that among the different concepts in the vicinity of "agency", "the ability to learn the environment and exploit this knowledge towards a certain goal" seems particularly important for AI alignment. I think the word "agency" is perhaps not well suited to this particular concept, since it comes with so many other connotations. But "intelligence" seems quite right.

Towards an Intentional Research Agenda

(I don't have much experience thinking in these terms, so maybe the question is dumb/already answered in the post. But anyway: )

Do you have some more-detailed (and stupidly explicit) examples of the intentional and algorithmic views on the same thing, and how to translate between them?

Vaniver's View on Factored Cognition

That is, I can easily see how factored cognition allows you to stick to cognitive strategies that definitely solve a problem in a safe way, but don't see how it does that and allows you to develop new cognitive strategies to solve a problem that doesn’t result in an opening for inner optimizers--not within units, but within assemblages of units.

Do you have some intuition for how inner optimizers would arise within assemblages of units, without being initiated by some unit higher in the hierarchy? Or is that what you are pointing at?