Abram Demski's Comments

Bayesian Evolving-to-Extinction
Or just bad implementations do this - predict-o-matic as described sounds like a bad idea, and like it doesn't contain hypotheses, so much as "players"*. (And the reason there'd be a "side channel" is to understand theories - the point of which is transparency, which, if accomplished, would likely prevent manipulation.)

You can think of the side-channel as a "bad implementation" issue, but do you really want to say that we have to forego diagnostic logs in order to have a good implementation of "hypotheses" instead of "players"? Going to the extreme, every brain has side-channels such as EEG.

But more importantly, as Daniel K pointed out, you don't need the side-channel. If the predictions are being used in a complicated way to make decisions, the hypotheses/players have an incentive to fight each other through the consequences of those decisions.

So, the interesting question is, what's necessary for a *good* implementation of this?

This seems a strange thing to imagine - how can fighting occur, especially on a training set?

If the training set doesn't provide any opportunity for manipulation/corruption, then I agree that my argument isn't relevant for the training set. It's most directly relevant for online learning. However, also keep in mind that deep learning might be pushing in the direction of learning to learn. Something like a Memory Network is trained to "keep learning" in a significant sense. So you then have to ask whether its learned learning strategy has these same issues, because that strategy will be used online.

(I can almost imagine neurons passing on bad input, but a) it seems like gradient descent would get rid of that, and b) it's not clear where the "tickets" are.)

Simplifying the picture greatly, imagine that the second-to-last layer of neurons is one-neuron-per-ticket. Gradient descent can choose which of these to pay the most attention to, but little else; according to the lottery ticket hypothesis, the gradients passing through the 'tickets' themselves aren't doing that much for learning, besides reinforcing good tickets and weakening bad ones.

So imagine that there is one ticket which is actually malign and has a sophisticated manipulative strategy. Sometimes it passes on bad input in service of its manipulations, but overall it is the best of the lottery tickets, so while gradient descent punishes it on those rounds, that is more than made up for in other cases. Furthermore, the manipulations of the malign ticket ensure that competing tickets are kept down, by steering situations toward those which the other tickets don't predict very well.
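To make this simplified picture concrete, here is a minimal toy sketch (my own illustration with made-up details, not something from the post): the "tickets" are fixed random predictors, and gradient descent trains only the mixing weights over their outputs, so all it can really do is reinforce good tickets and weaken bad ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tickets, dim = 5, 10

# Fixed "tickets": random linear predictors that training never updates.
tickets = rng.normal(size=(n_tickets, dim))

# The only trainable parameters: mixing weights over the tickets' outputs.
mix = np.zeros(n_tickets)

def predict(x):
    ticket_preds = tickets @ x                    # each ticket's prediction
    attn = np.exp(mix) / np.exp(mix).sum()        # softmax attention over tickets
    return attn @ ticket_preds, ticket_preds, attn

def train_step(x, y, lr=0.05):
    global mix
    pred, ticket_preds, attn = predict(x)
    err = pred - y
    # Gradient of squared error w.r.t. the mixing weights only; the tickets
    # themselves receive no update, matching the "choose which tickets to pay
    # attention to, but little else" picture above.
    mix -= lr * err * attn * (ticket_preds - pred)

for _ in range(5000):
    x = rng.normal(size=dim)
    y = tickets[0] @ x + 0.1 * rng.normal()       # ticket 0 happens to fit the data best
    train_step(x, y)

print(np.round(np.exp(mix) / np.exp(mix).sum(), 3))  # attention should end up concentrated on ticket 0
```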

*I don't have a link to the claim, but it's been said before that 'the math behind Bayes' theorem requires each hypothesis to talk about all of the universe, as opposed to human models that can be domain limited.'

This remark makes me think you're thinking of something like logical-induction-style traders, which only trade on a part of the data, vs Bayesian-style hypotheses, which have to make predictions everywhere. I'm not sure how that relates to my post -- there are things to say about it, but I don't think I said any of them. In particular, the lottery-ticket hypothesis isn't about this; a "lottery ticket" is a small part of the deep NN, but it is effectively a hypothesis about the whole data.

Bayesian Evolving-to-Extinction

Ah right! I meant to address this. I think the results are muddier (and thus don't serve as well as clear illustrations), but you do get the same thing even without a side-channel.

Bayesian Evolving-to-Extinction

Yeah, in probability theory you don't have to worry about how everything is implemented. But for implementations of Bayesian modeling with a rich hypothesis class, each hypothesis could be something like a blob of code which actually does a variety of things.

As for "want", sorry for using that without unpacking it. What it specifically means is that hypotheses like that will have a tendency to get more probability weight in the system, so if we look at the weighty (and thus influential) hypotheses, they are more likely to implement strategies which achieve those ends.

Instrumental Occam?

Excellent, thanks for the comment! I really appreciate the correction. That's quite interesting.

Malign generalization without internal search

A similar borderline case is death spirals in ants. (Google it for nice pictures/videos of the phenomenon.) Ants may or may not do internal search, but regardless, it seems like this phenomenon could be reproduced without any internal search. The ants implement a search overall via a pattern of behavior distributed over many ants. This "search" behavior has a weird corner case where they literally go into a death spiral, which is quite non-obvious from the basic behavior pattern.

Instrumental Occam?

Yes, I agree with that. But (as I've said in the past) this formalism doesn't do it for me. I have yet to see something which strikes me as a compelling argument in its favor.

So in the context of planning by probabilistic inference, instrumental Occam seems almost like a bug rather than a feature -- the unjustified bias toward simpler policies doesn't seem to serve a clear purpose. It's just an assumption.
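To spell out where the bias enters, here is a minimal sketch of my own (with made-up numbers) of the simplest planning-as-inference setup: condition on "success" and infer the policy. The simplicity preference comes entirely from the prior over policies; nothing in the machinery justifies it.

```python
import numpy as np

policies = ["simple", "complex"]
prior = np.array([0.8, 0.2])       # simplicity prior over policies -- a bare assumption
p_success = np.array([0.6, 0.7])   # the complex policy actually succeeds more often

# Planning as inference: condition on success, infer the policy.
posterior = prior * p_success
posterior = posterior / posterior.sum()
print(dict(zip(policies, np.round(posterior, 3).tolist())))
# {'simple': 0.774, 'complex': 0.226}: the simpler policy is preferred despite
# its lower success probability, purely because of the prior.
```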

Granted, the fact that I intuitively feel there should be some kind of instrumental Occam is a point in favor of such methods in some sense.

Realism about rationality
If the starting point is incoherent, then this approach doesn't seem like it'll go far - if AIXI isn't useful to study, then probably AIXItl isn't either (although take this particular example with a grain of salt, since I know almost nothing about AIXItl).

Hm. I already think the starting point of Bayesian decision theory (which is even "further up" than AIXI in how I am thinking about it) is fairly useful.

  • In a naive sort of way, people can handle uncertain gambles by choosing a quantity to treat as 'utility' (such as money), quantifying probabilities of outcomes, and taking expected values. This doesn't always serve very well (e.g. one might prefer Kelly betting; a small worked comparison follows this list), but it was kind of the historical starting point (probability theory itself grew out of gambling games), and the idea seems like a useful decision-making mechanism in a lot of situations.
  • Perhaps more convincingly, probability theory seems extremely useful, both as a precise tool for statisticians and as a somewhat looser analogy for thinking about everyday life, cognitive biases, etc.
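Here is the small worked comparison mentioned in the first bullet above (my illustration, with made-up numbers): naive expected-value maximization versus Kelly betting on a repeated even-money bet with win probability p = 0.6.

```python
import numpy as np

p = 0.6  # probability of winning an even-money bet (win: stake doubles; lose: stake gone)

# Naive expected-value reasoning: EV per dollar staked is 2p - 1 > 0, so it says
# to stake everything every round -- which almost surely ruins you over repeated play.
# The Kelly criterion instead maximizes expected log growth; for an even-money bet,
# the optimal fraction of bankroll to stake is f* = 2p - 1.
kelly_fraction = 2 * p - 1

def expected_log_growth(f, p):
    """Expected log-bankroll growth per round when staking fraction f."""
    return p * np.log(1 + f) + (1 - p) * np.log(1 - f)

print(round(kelly_fraction, 3))                 # 0.2
print(expected_log_growth(kelly_fraction, p))   # about +0.020 per round
print(expected_log_growth(0.99, p))             # about -1.43: near-total stakes shrink wealth
```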

AIXI adds to all this the idea of quantifying Occam's razor with algorithmic information theory, which seems to be a very fruitful idea. But I guess this is the sort of thing we're going to disagree on.

As for AIXItl, I think it's sort of taking the wrong approach to "dragging things down to earth". Logical induction simultaneously makes things computable and solves a new set of interesting problems having to do with accomplishing that. AIXItl feels more like trying to stuff an uncomputable peg into a computable hole.

Realism about rationality

So, yeah, one thing that's going on here is that I have recently been explicitly going in the other direction with partial agency, so obviously I somewhat agree. (Both with the object-level anti-realism about the limit of perfect rationality, and with the meta-level claim that agent foundations research may have a mistaken emphasis on this limit.)

But I also strongly disagree in another way. For example, you lump logical induction into the camp of considering the limit of perfect rationality. And I can definitely see the reason. But from my perspective, the significant contribution of logical induction is absolutely about making rationality more bounded.

  • The whole idea of the logical uncertainty problem is to consider agents with limited computational resources.
  • Logical induction in particular involves a shift in perspective, where rationality is not an ideal you approach but rather a matter of how you improve. Logical induction is about asymptotically approximating coherence in a particular way, as opposed to other ways.

So to a large extent I think my recent direction can be seen as continuing a theme already present -- perhaps you might say I'm trying to properly learn the lesson of logical induction.

But is this theme isolated to logical induction, in contrast to earlier MIRI research? I think not fully: Embedded Agency ties everything together to a very large degree, and embeddedness is largely about this kind of boundedness.

So I think Agent Foundations is basically not about trying to take the limit of perfect rationality. Rather, we inherited this idea of perfect rationality from Bayesian decision theory, and Agent Foundations is about trying to break it down, approaching it with skepticism and trying to fit it more into the physical world.

Reflective Oracles still involve infinite computing power, and logical induction still involves massive computing power, more or less because the approach is to start with idealized rationality and try to drag it down to Earth rather than the other way around. (That model feels a bit fake but somewhat useful.)

(Generally I am disappointed by my reply here. I feel I have not adequately engaged with you, particularly on the function-vs-nature distinction. I may try again later.)

Realism about rationality

I generally like the re-framing here, and agree with the proposed crux.

I may try to reply more at the object level later.

Realism about rationality
(Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)

This seems like the closest fit, but my view has some commonalities with points 1-3 nonetheless.

(I agree with 1, somewhat agree with 2, and don't agree with 3).

It sounds like our potential cruxes are closer to point 3 and to the question of how doomed current approaches are. Given that, do you still think rationality realism seems super relevant (to your attempted steelman of my view)?

My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy).

I guess my position is something like this. I think it may be quite possible to advance capabilities "blindly" -- basically the processing-power-heavy type of AI progress (applying enough tricks that you're not literally recapitulating evolution, but you're sorta in that direction on a spectrum). Or possibly that approach will hit a wall at some point. But in either case, better understanding would be essentially necessary for aligning systems with high confidence. But that same knowledge could potentially accelerate capabilities progress.

So I believe there is some kind of knowledge to be had (i.e., point #1).

Yeah, so, taking stock of the discussion again, it seems like:

  • There's a thing-I-believe-which-is-kind-of-like-rationality-realism.
  • Points 1 and 2 together seem more in line with that thing than "rationality realism" as I understood it from the OP.
  • You already believe #1, and somewhat believe #2.
  • We are both pessimistic about #3, but I'm so pessimistic about doing things without #3 that I work under the assumption anyway (plus I think my comparative advantage is contributing to those worlds).
  • We probably do have some disagreement about something like "how real is rationality?" -- but I continue to strongly suspect it isn't that cruxy.

(ETA: In my head I was replacing "evolution" with "reproductive fitness"; I don't agree with the sentence as phrased, I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, I don't know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)

I checked whether I thought the analogy was right with "reproductive fitness" and decided that evolution was a better analogy for this specific point. In claiming that rationality is as real as reproductive fitness, I'm claiming that there's a theory of evolution out there.

Sorry it resulted in a confusing mixed metaphor overall.

But, separately, I don't get how you're seeing reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct for it. I agree they're separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution -- without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.

To my knowledge, the theory of evolution (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation / momentum etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that evolution (ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a "real" thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover "real" things that would then be important, but I don't think that's the claim.)

I think this is due more to stuff like the relevant timescale than to the degree of realness. I agree realness is relevant, but it seems to me that the rest of biology is roughly as real as reproductive fitness (i.e., it's all very messy compared to physics) but has far more practical consequences (thinking of medicine). On the other hand, astronomy is very real but has few industry applications. There are other aspects to point at, but one relevant factor is that evolution and astronomy study things on long timescales.

Reproductive fitness would become very relevant if we were sending out seed ships to terraform nearby planets over geological time periods, in the hope that our descendants might one day benefit. (Because we would be in for some surprises if we didn't understand how organisms seeded on those planets would likely evolve.)

So -- it seems to me -- the question should not be whether an abstract theory of rationality is the sort of thing which on-outside-view has few or many economic consequences, but whether it seems like the sort of thing that applies to building intelligent machines in particular!

My underlying model is that when you talk about something so "real" that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can't do this with "non-real" things.

Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.

As for reaching high confidence, yeah, there needs to be a different model of how you reach high confidence.

The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on only a few assumptions, each of which is individually very likely. E.g., in computer security you don't usually need exact models of attackers, and a system which relies on them is less likely to be secure.
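To put rough, made-up numbers on that contrast (my illustration, treating the assumptions as independent; nothing here is from the thread):

```python
# A security-style argument resting on a few individually-very-likely,
# separately-checkable assumptions:
few_strong_assumptions = [0.99, 0.995, 0.99]
p_argument_holds = 1.0
for p in few_strong_assumptions:
    p_argument_holds *= p
print(round(p_argument_holds, 3))  # ~0.975

# versus relying on a model that is 95% accurate per situation, across 100 situations:
print(round(0.95 ** 100, 6))       # ~0.0059 -- overall confidence evaporates
```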
