Vladimir Nesov

Comments

Saving Time

My impression is that all this time business in decision making is more of an artifact of computing solutions to constraint problems (unlike in physics, where it's actually an important concept). There is a process of computation that works with things such as propositions about the world, which are sometimes events in the physical sense, and the process goes through these events in the world in some order, often against physical time. But it's more like construction of Kleene fixpoints or some more elaborate thing like tracing statements in a control flow graph or Abstracting Abstract Machines, a particular way of solving a constraint problem that describes the situation, than anything characterising the phenomenon of finding solutions in general. Or perhaps just going up in some domain in something like Scott semantics of a computation, for whatever reason, getting more detailed information about its behavior. "The order in domains" seems like the most relevant meaning for time in decision making, which isn't a whole lot like time.

Stable Pointers to Value: An Agent Embedded in Its Own Utility Function

The problem of figuring out preference without wireheading seems very similar to the problem of maintaining factual knowledge about the world without suffering from appeals to consequences. In both cases a specialized part of agent design (model of preference or model of a fact in the world) has a purpose (accurate modeling of its referent) whose pursuit might be at odds with consequentialist decision making of the agent as a whole. The desired outcome seems to involve maintaining integrity of the specialized part, resisting corruption of consequentialist reasoning.

With this analogy, it might be possible to transfer lessons from the more familiar problem of learning facts about the world, to the harder problem of protecting preference.

Why You Should Care About Goal-Directedness

Yeah, that was sloppy of the article. In context, the quote makes a bit of sense, and the qualifier "in every detail" does useful work (though I don't see how to make the argument clear just by defining what these words mean), but without context it's invalid.

Why You Should Care About Goal-Directedness

Having an exact model of the world that contains the agent doesn't require any explicit self-references or references to the agent. For example, if there are two programs whose behavior is equivalent, A and A', and the agent correctly thinks of itself as A, then it can also know the world to be a program W(A') with some subexpressions A', but without subexpression A. To see the consequences of its actions in this world, it would be useful for the agent to figure out that A is equivalent to A', but it is not necessary that this is known to the agent from the start, so any self-reference in this setting is implicit. Also, A' can't have W(A') as a subexpression, for reasons that do admit an explanation given in the quote that started this thread, but at the same time A can have W(A') as a subexpression. What is smaller here, the world or the agent?

(What's naive Gödelian self-reference? I don't recall this term, and googling didn't help.)

Dealing with self-reference in definitions of agents and worlds does not require (or even particularly recommend) non-realizability. I don't think it's an issue specific to embedded agents, probably all puzzles that fall within this scope can be studied while requiring the world to be a finite program. It might be a good idea to look for other settings, but it's not forced by the problem statement.

non-realizability (the impossibility of the agent to contain an exact model of the world, because it is inside the world and thus smaller)

Being inside the world does not make it impossible for the agent to contain the exact model of the world, does not require non-realizability in its reasoning about the world. This is the same error as in the original quote. In what way are quines not an intuitive counterexample to this reasoning? Specifically, the error is in saying "and thus smaller". What does "smaller" mean, and how does being a part interact with it? Parts are not necessarily smaller than the whole, they can well be larger. Exact descriptions of worlds and agents are not just finite expressions, they are at least equivalence classes of expressions that behave in the same way, and elements of those equivalence classes can have vastly different syntactic size.

(Of course in some settings there are reasons for non-realizability to be necessary or to not be a problem.)

Why You Should Care About Goal-Directedness

The quote sounds like an argument for non-existence of quines or of the context in which things like the diagonalization lemma are formulated. I think it obviously sounds like this, so raising nonspecific concern in my comment above should've been enough to draw attention to this issue. It's also not a problem Agent Foundations explores, but it's presented as such. Given your background and effort put into the post this interpretation of the quote seems unlikely (which is why I didn't initially clarify, to give you the first move). So I'm confused. Everything is confusing here, including your comment above not taking the cue, positive voting on it, and negative voting on my comment. Maybe the intended meanings of "model" and "being exact" and "representation" are such that the argument makes sense and becomes related to Agent Foundations?

Why You Should Care About Goal-Directedness

Trouble comes from self-reference: since the agent is part of the world, so is its model, and thus a perfect model would need to represent itself, and this representation would need to represent itself, ad infinitum. So the model cannot be exact.

???

Vanessa Kosoy's Shortform

I agree. But GPT-3 seems to me like a good estimate for how much compute it takes to run stream of consciousness imitation learning sideloads (assuming that learning is done in batches on datasets carefully prepared by non-learning sideloads, so the cost of learning is less important). And with that estimate we already have enough compute overhang to accelerate technological progress as soon as the first amplified babbler AGIs are developed, which, as I argued above, should happen shortly after babblers actually useful for automation of human jobs are developed (because generation of stream of consciousness datasets is a special case of such a job).

So the key things to make imitation plateau last for years are either sideloads requiring more compute than it looks like (to me) they require, or amplification of competent babblers into similarly competent AGIs being a hard problem that takes a long time to solve.

Vanessa Kosoy's Shortform

I was arguing that near human level babblers (including the imitation plateau you were talking about) should quickly lead to human level AGIs by amplification via stream of consciousness datasets, which doesn't pose new ML difficulties other than design of the dataset. Superintelligence follows from that by any of the same arguments as for uploads leading to AGI (much faster technological progress; if amplification/distillation of uploads is useful straight away, we get there faster, but it's not necessary). And amplified babblers should be stronger than vanilla uploads (at least implausibly well-educated, well-coordinated, high IQ humans).

For your scenario to be stable, it needs to be impossible (in the near term) to run the AGIs (amplified babblers) faster than humans, and for the AGIs to remain less effective than very high IQ humans. Otherwise you get acceleration of technological progress, including ML. So my point is that feasibility of imitation plateau depends on absence of compute overhang, not on ML failing to capture some of the ingredients of human general intelligence.

Vanessa Kosoy's Shortform

To me this seems to be essentially another limitation of the human Internet archive dataset: reasoning is presented in an opaque way (most slow/deliberative thoughts are not in the dataset), so it's necessary to do a lot of guesswork to figure out how it works. A better dataset both explains and summarizes the reasoning (not to mention gets rid of the incoherent nonsense, but even GPT-3 can do that to an extent by roleplaying Feynman).

Any algorithm can be represented by a habit of thought (Turing machine style if you must), and if those are in the dataset, they can be learned. The habits of thought that are simple enough to summarize get summarized and end up requiring fewer steps. My guess is that the human faculties needed for AGI can be both represented by sequences of thoughts (probably just text, stream of consciousness style) and easily learned with current ML. So right now the main obstruction is that it's not feasible to build a dataset with those faculties represented explicitly that's good enough and large enough for current sample-inefficient ML to grok. More compute in the learning algorithm is only relevant for this to the extent that we get a better dataset generator that can work on the tasks before it more reliably.

Vanessa Kosoy's Shortform

This seems similar to gaining uploads prior to AGI, and opens up all those superorg upload-city amplification/distillation constructions which should get past human level shortly after. In other words, the limitations of the dataset can be solved by amplification as soon as the AIs are good enough to be used as building blocks for meaningful amplification, and something human-level-ish seems good enough for that. Maybe even GPT-n is good enough for that.

Load More