Thane Ruthenis

Do you have any cached thoughts on the matter of "ontological inertia" of abstract objects? That is:

  • We usually think about abstract environments in terms of DAGs. In particular, ones without global time, and with no situations where we update a variable in place. A node in a DAG is a one-off.
  • However, that's not faithful to reality. In practice, objects have a continued existence, and a good abstract model should have a way to track e. g. the state of a particular human across "time"/the process of system evolution. But if "Alice" is a variable/node in our DAG, she only exists for an instant...
  • The model in this post deals with this by assuming that the entire causal structure is "copied" every timestep. So every timestep has an "Alice" variable, and $Alice_{t+1}$ is a function of $Alice_t$ plus some neighbours...
  • But that's not right either. Structure does change; people move around (acquire new causal neighbours and lose old ones) and are born (new variables are introduced), etc.

I think we want our model of the environment to be "flexible" in the sense that it doesn't assume the graph structure gets copied over fully every timestep, but that it has some language for talking about "ontological inertia"/one variable being an "updated version" of another variable. But I'm not quite sure how to describe this relationship.

At the bare minimum, $Alice_{t+1}$ has to be of the same "type" as $Alice_t$ (e. g., "human"), be directly causally connected to it, and $Alice_{t+1}$'s value has to be largely determined by $Alice_t$'s value... But that's not enough, because by this definition Alice's newborn child will probably also count as Alice.

Or maybe I'm overcomplicating this, and every variable in the model would just have an "identity" signifier baked-in? Such that $identity(Alice_{t+1}) = identity(Alice_t)$?
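The "baked-in identity signifier" idea can be made concrete with a toy sketch. All names and fields here are hypothetical, purely to illustrate the shape of the proposal:

```python
from dataclasses import dataclass

# Toy sketch (hypothetical names) of a time-unrolled causal DAG whose
# variables carry an explicit "identity" signifier, so that Alice at step
# t+1 can be recognized as an updated version of Alice at step t even when
# her causal neighbourhood changes.

@dataclass(frozen=True)
class Variable:
    identity: str   # persistent signifier, e.g. "Alice"
    timestep: int
    var_type: str   # e.g. "human"

def is_updated_version(later: Variable, earlier: Variable) -> bool:
    """`later` is the "same object one step on" iff types and identities
    match and it lives exactly one timestep later. Direct causal
    connectivity would be an additional check against the graph's edge set
    (omitted in this sketch)."""
    return (
        later.var_type == earlier.var_type
        and later.identity == earlier.identity
        and later.timestep == earlier.timestep + 1
    )

alice_t = Variable("Alice", 3, "human")
alice_t1 = Variable("Alice", 4, "human")
newborn = Variable("Alice's child", 4, "human")

print(is_updated_version(alice_t1, alice_t))  # True
print(is_updated_version(newborn, alice_t))   # False: same type, new identity
```

Note that the newborn-child counterexample is excluded here only because identity is taken as a primitive label, which is exactly the move whose legitimacy is in question.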

Going up or down the abstraction levels doesn't seem to help either. ($Alice_{t+1}$ isn't necessarily an abstraction over the same set of lower-level variables as $Alice_t$, nor does she necessarily have the same relationship with the higher-level variables.)

Back to my question: do you have any cached thoughts on that?

A human is not well modelled as a wrapper mind; do you disagree?

Certainly agree. That said, I feel the need to lay out my broader model here. The way I see it, a "wrapper-mind" is a general-purpose problem-solving algorithm hooked up to a static value function. As such:

  • Are humans proper wrapper-minds? No, certainly not.
  • Do humans have the fundamental machinery to be wrapper-minds? Yes.
  • Is any individual run of a human's general-purpose problem-solving algorithm essentially equivalent to wrapper-mind-style reasoning? Yes.
  • Can humans choose to act as wrapper-minds on longer time scales? Yes, approximately, subject to constraints like force of will.
  • Do most humans, in practice, choose to act as wrapper-minds? No; we switch our targets all the time, and value drift is ubiquitous.
  • Is it desirable for a human to act as a wrapper-mind? That's complicated.
    • On the one hand, yes because consistent pursuit of instrumentally convergent goals would lead to you having more resources to spend on whatever values you have.
    • On the other hand, no, because we terminally value this sort of value-drift and self-inconsistency; it's part of "being human".
    • In sum, for humans, there's a tradeoff between approximating a wrapper-mind and being an incoherent human, and different people weight it differently in different contexts. E. g., if you really want to achieve something (earning your first million dollars, averting extinction), and you value it more than having fun being a human, you may choose to act as a wrapper-mind in the relevant context/at the relevant scale.

As such: humans aren't wrapper-minds, but they can act like them, and it's sometimes useful to act as one.

It's not a binary. You can perform explicit optimization over high-level plan features, then hand off detailed execution to learned heuristics. "Make coffee" may be part of an optimized stratagem computed via consequentialism, but you don't have to consciously optimize every single muscle movement once you've decided on that goal.

Essentially, what counts as "outputs" or "direct actions" relative to the consequentialist-planner is flexible, and any sufficiently reliable learned heuristic (or chain of heuristics) can be put in that category, with choosing to execute one of them available to the planner algorithm as a basic output.

In fact, I'm pretty sure that's how humans work most of the time. We use the general-intelligence machinery to "steer" ourselves at a high level, and the rest of the time, we operate on autopilot.
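A minimal sketch of that "planner over learned heuristics" picture, where explicit optimization happens only at the level of choosing which heuristic to run (all action names and utility numbers below are made up for illustration):

```python
# Hedged sketch: a consequentialist planner optimizes only over high-level
# macro-actions, each of which is a learned heuristic executed "on
# autopilot" without further deliberate optimization.

def make_coffee() -> int:
    # Learned heuristic: no conscious optimization of individual muscle
    # movements happens inside here. Returns a stand-in predicted utility.
    return 5

def answer_email() -> int:
    return 3

MACRO_ACTIONS = {"make_coffee": make_coffee, "answer_email": answer_email}

def plan(available: dict) -> str:
    # Explicit optimization lives only at this level: pick the macro-action
    # with the best predicted outcome; execution details are delegated.
    return max(available, key=lambda name: available[name]())

print(plan(MACRO_ACTIONS))  # "make_coffee" under these stand-in utilities
```

The point of the sketch is just that "basic output" is a movable boundary: anything reliable enough can be wrapped up and handed to the planner as a single action.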

I'm still not quite sure why the lightcone theorem is a "foundation" for natural abstraction (it looks to me like a nice concrete example on which you could apply techniques) 

My impression is that it being a concrete example is the why. "What is the right framework to use?" and "what is the environment-structure in which natural abstractions can be defined?" are core questions of this research agenda, and this sort of multi-layer locality-including causal model is one potential answer.

The fact that it loops-in the speed of causal influence is also suggestive — it seems fundamental to the structure of our universe and crops up in a lot of places, so the proposition that natural abstractions are somehow downstream of it is interesting.

Sure, but isn't the goal of the whole agenda to show that $X$ does have a certain correct factorization, i. e. that abstractions are convergent?

I suppose it may be that any choice of low-level boundaries results in the same $\Lambda$, but the $\Lambda$ itself has a canonical factorization, and going from $\Lambda$ back to $X$ reveals the corresponding canonical factorization of $X$? And then depending on how close the initial choice of boundaries was to the "correct" one, $\Lambda$ is easier or harder to compute (or there's something else about the right choice that makes it nice to use).

Almost. The hope/expectation is that different choices yield approximately the same $\Lambda$, though still probably modulo some conditions (like e.g. sufficiently large $T$).

Can you elaborate on this expectation? Intuitively, $\Lambda$ should consist of a number of higher-level variables as well, and each of them should correspond to a specific set of lower-level variables: abstractions and the elements they abstract over. So for a given $\Lambda$, there should be a specific "correct" way to draw the boundaries in the low-level system.

But if ~any way of drawing the boundaries yields the same $\Lambda$, then what does this mean?

Or perhaps the "boundaries" in the mesoscale-approximation approach represent something other than the factorization of $\Lambda$ into individual abstractions?

Yup, that's basically it. And I agree that it's pretty obvious once you see it - the key is to notice that distance $2T$ implies that nothing other than $X_0$ could have affected both of them. But man, when I didn't know that was what I should look for? Much less obvious.

... I feel compelled to note that I'd pointed out a very similar thing a while ago.

Granted, that's not exactly the same formulation, and the devil's in the details.

By the way, do we need the proof of the theorem to be quite this involved? It seems we can just note that for any two (sets of) variables separated by distance $D$, the earliest sampling-step at which their values can intermingle (= their lightcones intersect) is $D/2$ (since even in the "fastest" case, they can't do better than moving towards each other at 1 variable per 1 sampling-step).
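That counting argument is easy to sanity-check numerically. A sketch on a 1-D chain of variables, assuming influence spreads at most one variable per sampling step (for odd distances the exact first step is the ceiling of $D/2$):

```python
import math

def lightcone(center: int, t: int) -> set:
    """Indices a variable at `center` can have influenced after t steps,
    with influence spreading one variable per step in each direction."""
    return set(range(center - t, center + t + 1))

def earliest_intersection(distance: int) -> int:
    """First sampling step at which the lightcones of two variables
    separated by `distance` overlap."""
    t = 0
    while not (lightcone(0, t) & lightcone(distance, t)):
        t += 1
    return t

# The two lightcones first meet at step ceil(D/2); in particular, variables
# at distance >= 2T stay out of each other's lightcones for the first T-1
# steps, matching the theorem's separation condition.
for d in [1, 2, 5, 10]:
    assert earliest_intersection(d) == math.ceil(d / 2)
print("lightcones at distance D first meet at step ceil(D/2)")
```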

Hmm. I may be currently looking at it from the wrong angle, but I'm skeptical that it's the right frame for defining abstractions. It seems to group low-level variables based on raw distance, rather than the detailed environment structure? Which seems like a very weak constraint. That is,

By further iteration, we can conclude that any number of sets of variables which are all separated by a distance of at least $2T$ are independent given $X_0$. That’s the full Lightcone Theorem.

We can make literally any choice of those sets subject to this condition: we can draw the boundaries any way we want. Which means the abstractions we'd recover are not going to be convergent: there's a free parameter of the boundary choice.

Ah, no, I suppose that part is supposed to be handled by whatever approximation process we define for $\Lambda$? That is, the "correct" definition of the "most minimal approximate summary" would implicitly constrain the possible choices of boundaries for which the approximate summary is equivalent to the exact $\Lambda$?

The eigendecomposition/mesoscale-approximation/gKPD approaches seem like they might move in that direction, though I admit I don't see their implications at a first glance.

If we ignore the sketchy part - i.e. pretend that the regions cover all of $X$ and are all independent given $\Lambda$ - then gKPD would say roughly: if $\Lambda$ can be represented as $k$ dimensional or smaller

What's the  here? Is it meant to be ?

While it's true, there's something about making this argument that I don't like. It's like it's setting you up for moving goalposts if you succeed with it? It makes it sound like the core issue is people giving AIs power, with the solution to that issue — and, implicitly, to the whole AGI Ruin thing — being to ban that.

Which is not going to help, since the sort of AGI we're worried about isn't going to need people to naively hand it power. I suppose "not proactively handing power out" somewhat raises the bar for the level of superintelligence necessary, but is that going to matter much in practice?

I expect not. Which means the natural way to assuage this fresh concern would do ~nothing to reduce the actual risk. Which means if we make this argument a lot, and get people to listen to it, and they act in response... We're then going to have to say that no, actually that's not enough, actually the real threat is AIs plotting to take control even if we're not willing to give it.

And I'm not clear on whether using the "let's at least not actively hand over power to AIs, m'kay?" argument is going to act as a foot in the door and make imposing more security easier, or whether it'll just burn whatever political capital we have on fixing a ~nonissue.
