Abram Demski

Abram Demski's Comments

Realism about rationality

So, yeah, one thing that's going on here is that I have recently been explicitly going in the other direction with partial agency, so obviously I somewhat agree. (Both with the object-level anti-realism about the limit of perfect rationality, and with the meta-level claim that agent foundations research may have a mistaken emphasis on this limit.)

But I also strongly disagree in another way. For example, you lump logical induction into the camp of considering the limit of perfect rationality. And I can definitely see the reason. But from my perspective, the significant contribution of logical induction is absolutely about making rationality more bounded.

  • The whole idea of the logical uncertainty problem is to consider agents with limited computational resources.
  • Logical induction in particular involves a shift in perspective, where rationality is not an ideal you approach but rather directly about how you improve. Logical induction is about asymptotically approximating coherence in a particular way as opposed to other ways.

So to a large extent I think my recent direction can be seen as continuing a theme already present -- perhaps you might say I'm trying to properly learn the lesson of logical induction.

But is this theme isolated to logical induction, in contrast to earlier MIRI research? I think not fully: Embedded Agency ties everything together to a very large degree, and embeddedness is about this kind of boundedness to a large degree.

So I think Agent Foundations is basically not about trying to take the limit of perfect rationality. Rather, we inherited this idea of perfect rationality from Bayesian decision theory, and Agent Foundations is about trying to break it down, approaching it with skepticism and trying to fit it more into the physical world.

Reflective Oracles still involve infinite computing power, and logical induction still involves massive computing power, more or less because the approach is to start with idealized rationality and try to drag it down to Earth rather than the other way around. (That model feels a bit fake but somewhat useful.)

(Generally I am disappointed by my reply here. I feel I have not adequately engaged with you, particularly on the function-vs-nature distinction. I may try again later.)

Realism about rationality

I generally like the re-framing here, and agree with the proposed crux.

I may try to reply more at the object level later.

Realism about rationality
(Another possibility is that you think that building AI the way we do now is so incredibly doomed that even though the story outlined above is unlikely, you see no other path by which to reduce x-risk, which I suppose might be implied by your other comment here.)

This seems like the closest fit, but my view has some commonalities with points 1-3 nonetheless.

(I agree with 1, somewhat agree with 2, and don't agree with 3).

It sounds like our potential cruxes are closer to point 3 and to the question of how doomed current approaches are. Given that, do you still think rationality realism seems super relevant (to your attempted steelman of my view)?

My current best argument for this position is realism about rationality; in this world, it seems like truly understanding rationality would enable a whole host of both capability and safety improvements in AI systems, potentially directly leading to a design for AGI (which would also explain the info hazards policy).

I guess my position is something like this. I think it may be quite possible to make capabilities progress "blindly" -- basically the processing-power-heavy type of AI progress (applying enough tricks so you're not literally recapitulating evolution, but you're sorta in that direction on a spectrum). Or possibly that approach will hit a wall at some point. In either case, though, better understanding would be essentially necessary for aligning systems with high confidence. But that same knowledge could potentially accelerate capabilities progress.

So I believe in some kind of knowledge to be had (ie, point #1).

Yeah, so, taking stock of the discussion again, it seems like:

  • There's a thing-I-believe-which-is-kind-of-like-rationality-realism.
  • Points 1 and 2 together seem more in line with that thing than "rationality realism" as I understood it from the OP.
  • You already believe #1, and somewhat believe #2.
  • We are both pessimistic about #3, but I'm so pessimistic about doing things without #3 that I work under the assumption anyway (plus I think my comparative advantage is contributing to those worlds).
  • We probably do have some disagreement about something like "how real is rationality?" -- but I continue to strongly suspect it isn't that cruxy.
(ETA: In my head I was replacing "evolution" with "reproductive fitness"; I don't agree with the sentence as phrased, I would agree with it if you talked only about understanding reproductive fitness, rather than also including e.g. the theory of natural selection, genetics, etc. In the rest of your comment you were talking about reproductive fitness, I don't know why you suddenly switched to evolution; it seems completely different from everything you were talking about before.)

I checked whether I thought the analogy was right with "reproductive fitness" and decided that evolution was a better analogy for this specific point. In claiming that rationality is as real as reproductive fitness, I'm claiming that there's a theory of evolution out there.

Sorry it resulted in a confusing mixed metaphor overall.

But, separately, I don't see why you treat reproductive fitness and evolution as having radically different realness, such that you wanted to systematically correct my wording. I agree they're separate questions, but in fact I see the realness of reproductive fitness as largely a matter of the realness of evolution -- without the overarching theory, reproductive fitness functions would be a kind of irrelevant abstraction and therefore less real.

To my knowledge, the theory of evolution (ETA: mathematical understanding of reproductive fitness) has not had nearly the same impact on our ability to make big things as (say) any theory of physics. The Rocket Alignment Problem explicitly makes an analogy to an invention that required a theory of gravitation / momentum etc. Even physics theories that talk about extreme situations can enable applications; e.g. GPS would not work without an understanding of relativity. In contrast, I struggle to name a way that evolution (ETA: insights based on reproductive fitness) affects an everyday person (ignoring irrelevant things like atheism-religion debates). There are lots of applications based on an understanding of DNA, but DNA is a "real" thing. (This would make me sympathetic to a claim that rationality research would give us useful intuitions that lead us to discover "real" things that would then be important, but I don't think that's the claim.)

I think this is due more to stuff like the relevant timescale than the degree of real-ness. I agree real-ness is relevant, but it seems to me that the rest of biology is roughly as real as reproductive fitness (ie, it's all very messy compared to physics) but has far more practical consequences (thinking of medicine). On the other side, astronomy is very real but has few industry applications. There are other aspects to point at, but one relevant factor is that evolution and astronomy study things on long timescales.

Reproductive fitness would become very relevant if we were sending out seed ships to terraform nearby planets over geological time periods, in the hope that our descendants might one day benefit. (Because we would be in for some surprises if we didn't understand how organisms seeded on those planets would likely evolve.)

So -- it seems to me -- the question should not be whether an abstract theory of rationality is the sort of thing which on-outside-view has few or many economic consequences, but whether it seems like the sort of thing that applies to building intelligent machines in particular!

My underlying model is that when you talk about something so "real" that you can make extremely precise predictions about it, you can create towers of abstractions upon it, without worrying that they might leak. You can't do this with "non-real" things.

Reproductive fitness does seem to me like the kind of abstraction you can build on, though. For example, the theory of kin selection is a significant theory built on top of it.

As for reaching high confidence, yeah, there needs to be a different model of how you reach high confidence.

The security mindset model of reaching high confidence is not that you have a model whose overall predictive accuracy is high enough, but rather that you have an argument for security which depends on few assumptions, each of which is individually very likely. E.G., in computer security you don't usually need exact models of attackers, and a system which relies on those is less likely to be secure.
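To make the arithmetic behind "few assumptions, each individually very likely" concrete (this bound is an illustration added here, not part of the original argument): if an argument rests on assumptions A_1, ..., A_n and assumption i fails with probability at most ε_i, the union bound gives a lower bound on all of them holding at once, with no independence assumption needed.

```latex
P\Big(\bigcap_{i=1}^{n} A_i\Big) \;\ge\; 1 - \sum_{i=1}^{n} \epsilon_i,
\qquad \text{e.g. } n = 3,\ \epsilon_i = 0.01 \;\Rightarrow\; P \ge 0.97.
```

Each extra assumption adds another failure term, which is one way to see why an argument that needs an accurate model of many details (such as an exact model of attackers) tends to support less confidence than one resting on a short list of robust assumptions.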

Realism about rationality
I was thinking of the difference between the theory of electromagnetism vs the idea that there's a reproductive fitness function, but that it's very hard to realistically mathematise or actually determine what it is. The difference between the theory of electromagnetism and mathematical theories of population genetics (which are quite mathematisable but again deal with 'fake' models and inputs, and which I guess is more like what you mean?) is smaller, and if pressed I'm unsure which theory rationality will end up closer to.

[Spoiler-boxing the following response not because it's a spoiler, but because I was typing a response as I was reading your message and the below became less relevant. The end of your message includes exactly the examples I was asking for (I think), but I didn't want to totally delete my thinking-out-loud in case it gave helpful evidence about my state.]

I'm having trouble here because yes, the theory of population genetics factors in heavily to what I said, but to me reproductive fitness functions (largely) inherit their realness from the role they play in population genetics. So the two comparisons you give seem not very different to me. The "hard to determine what it is" from the first seems to lead directly to the "fake inputs" from the second.

So possibly you're gesturing at a level of realness which is "how real fitness functions would be if there were not a theory of population genetics"? But I'm not sure exactly what to imagine there, so could you give a different example (maybe a few) of something which is that level of real?

Separately, I feel weird having people ask me about why things are 'cruxy' when I didn't initially say that they were and without the context of an underlying disagreement that we're hashing out. Like, either there's some misunderstanding going on, or you're asking me to check all the consequences of a belief that I have compared to a different belief that I could have, which is hard for me to do.

Ah, well. I interpreted this earlier statement from you as a statement of cruxiness:

If I didn't believe the above, I'd be less interested in things like AIXI and reflective oracles. In general, the above tells you quite a bit about my 'worldview' related to AI.

And furthermore the list following this:

Searching for beliefs I hold for which 'rationality realism' is crucial by imagining what I'd conclude if I learned that 'rationality irrealism' was more right:

So, yeah, I'm asking you about something which you haven't claimed is a crux of a disagreement which you and I are having, but, I am asking about it because I seem to have a disagreement with you about (a) whether rationality realism is true (pending clarification of what the term means to each of us), and (b) whether rationality realism should make a big difference for several positions you listed.

I confess to being quite troubled by AIXI's language-dependence and the difficulty in getting around it. I do hope that there are ways of mathematically specifying the amount of computation available to a system more precisely than "polynomial in some input", which should be some input to a good theory of bounded rationality.

Ah, so this points to a real and large disagreement between us about how subjective a theory of rationality should be (which may be somewhat independent of just how real rationality is, but is related).

I think I was imagining an alternative world where useful theories of rationality could only be about as precise as theories of liberalism, or current theories about why England had an industrial revolution when it did, and no other country did instead.

Ok. Taking this as the rationality irrealism position, I would disagree with it, and also agree that it would make a big difference for the things you said rationality-irrealism would make a big difference for.

So I now think we have a big disagreement around point "a" (just how real rationality is), but maybe not so much around "b" (what the consequences are for the various bullet points you listed).

Realism about rationality
Although in some sense I also endorse the "strawman" that rationality is more like momentum than like fitness (at least some aspects of rationality).

How so?

I think that ricraz claims that it's impossible to create a mathematical theory of rationality or intelligence, and that this is a crux, not so? On the other hand, the "momentum vs. fitness" comparison doesn't make sense to me.

Well, it's not entirely clear. First there is the "realism" claim, which might even be taken in contrast to mathematical abstraction; EG, "is IQ real, or is it just a mathematical abstraction"? But then it is clarified with the momentum vs fitness test, which makes it seem like the question is the degree to which accurate mathematical models can be made (where "accurate" means, at least in part, helpfulness in making real predictions).

So the idea seems to be that there's a spectrum with physics at one extreme end. I'm not quite sure what goes at the other extreme end. Here's one possibility:

  • Physics
  • Chemistry
  • Biology
  • Psychology
  • Social Sciences
  • Humanities

A problem I have is that (almost) everything on the spectrum is real. Tables and chairs are real, despite not coming with precise mathematical models. So (arguably) one could draw two separate axes, "realness" vs "mathematical modelability". Well, it's not clear exactly what that second axis should be.

Anyway, to the extent that the question is about how mathematically modelable agency is, I do think it makes more sense to expect "reproductive fitness" levels rather than "momentum" levels.

Hmm, actually, I guess there's a tricky interpretational issue here, which is what it means to model agency exactly.

  • On the one hand, I fully believe in Eliezer's idea of understanding rationality so precisely that you could make it out of pasta and rubber bands (or whatever). IE, at some point we will be able to build agents from the ground up. This could be seen as an entirely precise mathematical model of rationality.
  • But the important thing is a theoretical understanding sufficient to understand the behavior of rational agents in the abstract, such that you could predict in broad strokes what an agent would do before building and running it. This is a very different matter.

I can see how Ricraz would read statements of the first type as suggesting very strong claims of the second type. I think models of the second type have to be significantly more approximate, however. EG, you cannot be sure of exactly what a learning system will learn in complex problems.

Realism about rationality
ETA: I also have a model of you being less convinced by realism about rationality than others in the "MIRI crowd"; in particular, selection vs. control seems decidedly less "realist" than mesa-optimizers (which didn't have to be "realist", but was quite "realist" the way it was written, especially in its focus on search).

Just a quick reply to this part for now (but thanks for the extensive comment, I'll try to get to it at some point).

It makes sense. My recent series on myopia also fits this theme. But I don't get much* push-back on these things. Some others seem even less realist than I am. I see myself as trying to carefully deconstruct my notions of "agency" into component parts that are less fake. I guess I do feel confused why other people seem less interested in directly deconstructing agency the way I am. I feel somewhat like others kind of nod along to distinctions like selection vs control but then go back to using a unitary notion of "optimization". (This applies to people at MIRI and also people outside MIRI.)

*The one person who has given me push-back is Scott.

Realism about rationality

How critical is it that rationality is as real as electromagnetism, rather than as real as reproductive fitness? I think the latter seems much more plausible, but I also don't see why the distinction should be so cruxy.

My suspicion is that Rationality Realism would have captured a crux much more closely if the line weren't "momentum vs reproductive fitness", but rather, "momentum vs the bystander effect" (ie, physics vs social psychology). Reproductive fitness implies something that's quite mathematizable, but with relatively "fake" models -- e.g., evolutionary models tend to assume perfectly separated generations, perfect mixing for breeding, etc. It would be absurd to model the full details of reality in an evolutionary model, although it's possible to get closer and closer.
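As an illustration of the kind of deliberately "fake" but useful model meant here (a sketch added for illustration, not taken from the comment), below is the textbook deterministic haploid selection recursion. It bakes in exactly the idealizations mentioned: discrete, non-overlapping generations, an infinitely large well-mixed population (so no drift), and fitnesses that are fixed constants rather than properties of each organism's particular environment. The function names and fitness numbers are hypothetical.

```python
from typing import List


def next_frequency(p: float, w_a: float, w_b: float) -> float:
    """One generation of selection on two haploid types A and B.

    Idealizations (deliberately unrealistic): non-overlapping generations,
    an infinitely large, well-mixed population, and constant fitnesses
    w_a and w_b that ignore each organism's particular circumstances.
    """
    mean_fitness = p * w_a + (1.0 - p) * w_b
    return p * w_a / mean_fitness


def simulate(p0: float, w_a: float, w_b: float, generations: int) -> List[float]:
    """Return the frequency of type A in each generation, starting from p0."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(next_frequency(trajectory[-1], w_a, w_b))
    return trajectory


if __name__ == "__main__":
    # Hypothetical numbers: type A has a 5% fitness advantage over type B.
    for gen, p in enumerate(simulate(0.01, 1.05, 1.0, 200)):
        if gen % 50 == 0:
            print(f"generation {gen:3d}: freq(A) = {p:.3f}")
```

In this model the odds p/(1-p) get multiplied by w_a/w_b every generation, so the fitter type spreads along a logistic curve. That qualitative prediction is useful even though no real population satisfies the assumptions.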

I think that's more the sort of thing I expect for theories of agency! I am curious why you expect electromagnetism-esque levels of mathematical modeling. Even AIXI depends heavily on the choice of programming language. Any theory of bounded rationality which doesn't ignore poly-time differences (ie, anything "closer to the ground" than logical induction) has to be hardware-dependent as well.
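For reference, the standard form of that language dependence (background added here, not taken from the comment): AIXI's prior is a Solomonoff-style mixture defined relative to a reference universal machine U, summing over minimal programs p whose output under U begins with x. For two universal machines U and V, the two priors agree only up to a machine-dependent multiplicative constant, where c_{UV} is roughly the length of a U-program that simulates V.

```latex
M_U(x) = \sum_{p \,:\, U(p) \text{ begins with } x} 2^{-|p|},
\qquad
\exists\, c_{UV}\ \forall x:\quad M_U(x) \;\ge\; 2^{-c_{UV}}\, M_V(x).
```

The constant does not depend on x, but nothing bounds how large it can be, which is why the choice of reference machine can dominate an AIXI-style agent's behavior on any finite amount of data.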

If I didn't believe the above,

What alternative world are you imagining, though?

Realism about rationality

I didn't like this post. At the time, I wrote a mildly critical comment (which is currently the top-voted comment, somewhat to my surprise), but I didn't actually engage with the idea very much. So it seems like a good idea to say something now.

The main argument that this is valuable seems to be: this captures a common crux in AI safety. I don't think it's my crux, and I think other people who think it is their crux are probably mistaken. So from my perspective it's a straw-man of the view it's trying to point at.

The main problem is the word "realism". It isn't clear exactly what it means, but I suspect that being really anti-realist about rationality would not shift my views about the importance of MIRI-style research that much.

I agree that there's something kind of like rationality realism. I just don't think this post successfully points at it.

Ricraz starts out with the list: momentum, evolutionary fitness, intelligence. He says that the question (of rationality realism) is whether rationality is more like momentum or more like fitness. Momentum is highly formalizable. Fitness is a useful abstraction, but no one can write down the fitness function for a given organism. If pressed, we have to admit that it does not exist: every individual organism has what amounts to its own different environment, since it has different starting conditions (nearer to different food sources, etc.), and so is selected on different criteria.

So as I understand it, the claim is that the MIRI cluster believes rationality is more like momentum, but many outside the MIRI cluster believe it's more like fitness.

It seems to me like my position, and the MIRI-cluster position, is (1) closer to "rationality is like fitness" than "rationality is like momentum", and (2) doesn't depend that much on the difference. Realism about rationality is important to the theory of rationality (we should know what kind of theoretical object rationality is), but not so important for the question of whether we need to know about rationality. (This also seems supported by the analogy -- evolutionary biologists still see fitness as a very important subject, and don't seem to care that much about exactly how real the abstraction is.)

To the extent that this post has made a lot of people think that rationality realism is an important crux, it's quite plausible to me that it's made the discussion worse.

To expand more on (1) -- since it seems a lot of people found its negation plausible -- it seems like if there's an analogue for the theory of evolution, which uses relatively unreal concepts like "fitness" to help us understand rational agency, we'd like to know about it. In this view, MIRI-cluster is essentially saying "biologists should want to invent evolution. Look at all the similarities across different animals. Don't you want to explain that?" Whereas the non-MIRI cluster is saying "biologists don't need to know about evolution."

What are we assuming about utility functions?

Yeah, I think something like this is pretty important. Another reason is that humans inherently don't like to be told, top-down, that X is the optimal solution. A utilitarian AI might redistribute property forcefully, where a Pareto-improving AI would seek to compensate people.

An even more stringent requirement which seems potentially sensible: only Pareto improvements which all affected parties understand and endorse. (IE, there should be something like consent.) This seems very sensible with small numbers of people, but unfortunately seems infeasible for large numbers of people (given the way all actions have side-effects for many, many people).
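A toy sketch of the contrast, assuming (hypothetically) that each person's welfare under an outcome can be summarized as a single number; the function names and numbers are illustrative only. A utilitarian criterion accepts any change that raises total welfare, while a Pareto criterion rejects any change that leaves someone worse off, which is why a Pareto-improving plan has to bundle compensation into the change.

```python
from typing import Sequence


def _check_same_people(before: Sequence[float], after: Sequence[float]) -> None:
    # Both outcomes must list welfare for the same people in the same order.
    if len(before) != len(after):
        raise ValueError("outcomes must cover the same people")


def utilitarian_improvement(before: Sequence[float], after: Sequence[float]) -> bool:
    """True if total welfare strictly increases."""
    _check_same_people(before, after)
    return sum(after) > sum(before)


def pareto_improvement(before: Sequence[float], after: Sequence[float]) -> bool:
    """True if nobody is worse off and at least one person is strictly better off."""
    _check_same_people(before, after)
    pairs = list(zip(before, after))
    no_one_worse = all(a >= b for b, a in pairs)
    someone_better = any(a > b for b, a in pairs)
    return no_one_worse and someone_better


if __name__ == "__main__":
    status_quo = [10.0, 2.0, 2.0]
    forced_redistribution = [6.0, 5.0, 5.0]  # person 0 loses 4, the others gain 3 each
    compensated_change = [10.5, 4.0, 4.0]    # everyone at least as well off as before
    for name, outcome in [("forced", forced_redistribution), ("compensated", compensated_change)]:
        print(name, utilitarian_improvement(status_quo, outcome), pareto_improvement(status_quo, outcome))
```

Both changes pass the utilitarian test, but only the compensated one passes the Pareto test. The stricter consent requirement has no counterpart in this sketch: it is a property of what people understand and endorse, not of the welfare numbers.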

What are we assuming about utility functions?

I didn't reply to this originally, probably because I think it's all pretty reasonable.

That's why I distinguished between the hypotheses of "human utility" and CEV. It is my vague understanding (and I could be wrong) that some alignment researchers see it as their task to align AGI with current humans and their values, thinking the "extrapolation" less important or that it will take care of itself, while others consider extrapolation an important part of the alignment problem.

My thinking on this is pretty open. In some sense, everything is extrapolation (you don't exactly "currently" have preferences, because every process is expressed through time...). But OTOH there may be a strong argument for doing as little extrapolation as possible.

My intuitions tend to agree, but I'm also inclined to ask "why not?" e.g. even if my preferences are absurdly cyclical, but we get AGI to imitate me perfectly (or me + faster thinking + more information)

Well, imitating you is not quite right. (EG, the now-classic example introduced with the CIRL framework: you want the AI to help you make coffee, not learn to drink coffee itself.) Of course maybe it is imitating you at some level in its decision-making, like, imitating your way of judging what's good.

under what sense of the word is it "unaligned" with me?

I'm thinking of things like: will it disobey requests which it understands and is capable of carrying out? Will it fight you? I'm not saying those things are universally wrong to do, but avoiding them could be among the types of alignment we're shooting for, and inconsistencies do seem to create trouble there. Presumably if we know that it might fight us, we would want to have some kind of firm statement about what kind of "better" reasoning would make it do so (e.g., it might temporarily fight us if we were severely deluded in some way, but we want pretty high standards for that).
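One classic way inconsistent preferences "create trouble" is the money pump. Here is a sketch of it (an illustration added here; the function name is hypothetical). If strict preferences contain a cycle such as A over B, B over C, and C over A, an agent willing to pay a small fee for every trade it prefers can be walked around the cycle. It ends up holding what it started with, only poorer. The code looks for such a cycle in a list of strict pairwise preferences using a depth-first search.

```python
from typing import Dict, List, Set, Tuple


def find_preference_cycle(prefers: List[Tuple[str, str]]) -> List[str]:
    """Return a cycle in the strict-preference graph, or [] if there is none.

    Each pair (x, y) means "x is strictly preferred to y".
    """
    graph: Dict[str, List[str]] = {}
    for better, worse in prefers:
        graph.setdefault(better, []).append(worse)
        graph.setdefault(worse, [])

    visiting: Set[str] = set()  # nodes on the current DFS path
    done: Set[str] = set()      # nodes whose descendants are fully explored
    path: List[str] = []

    def dfs(node: str) -> List[str]:
        visiting.add(node)
        path.append(node)
        for nxt in graph[node]:
            if nxt in visiting:
                # Back edge: the cycle runs from nxt along the path to node, then back to nxt.
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        visiting.remove(node)
        done.add(node)
        path.pop()
        return []

    for start in graph:
        if start not in done:
            cycle = dfs(start)
            if cycle:
                return cycle
    return []


if __name__ == "__main__":
    print(find_preference_cycle([("A", "B"), ("B", "C"), ("C", "A")]))
    print(find_preference_cycle([("A", "B"), ("B", "C"), ("A", "C")]))
```

With the cyclic input, an agent holding A would accept a trade to C, then to B, then back to A, paying the fee three times. With the transitive input there is no cycle, and no such sequence of strictly preferred trades exists.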
