All of adamShimi's Comments + Replies

The Natural Abstraction Hypothesis: Implications and Evidence

Thanks for the post! Two general points I want to make before going into more general comments:

  • I liked the section on concepts difference across, and hadn't thought much about it before, so thanks!
  • One big aspect of the natural abstraction hypothesis that you missed IMO is "how do you draw the boundaries around abstractions?", or more formally, how do you draw the Markov blanket. This to me is the most important question to answer for settling the NAH, and John's recent work on sequences of Markov blankets is IMO him trying to settle this.

In general, we should

... (read more)
Robustness to Scale

Rereading this post while thinking about the approximations that we make in alignment, two points jump at me:

  • I'm not convinced that robustness to relative scale is as fundamental as the other two, because there is no reason to expect that in general the subcomponents will be significantly different in power, especially in settings like adversarial training where both parts are trained according to the same approach. That being said, I still agree that this is an interesting question to ask, and some proposals might indeed depend on a version of this.
  • Robustn
... (read more)
Reply to Eliezer on Biological Anchors

Thanks for pushing back on my interpretation.

I feel like you're using "strongest" and "weakest" to mean "more concrete" and "more abstract", with maybe the value judgement (implicit in your focus on specific testable claims) that concreteness is better. My interpretation doesn't disagree with your point about Bio Anchors, it simply says that this is a concrete instantiation of a general pattern, and that the whole point of the original post as I understand it is to share this pattern. Hence the title, which talks about all biology-inspired timelines, the th... (read more)

My Overview of the AI Alignment Landscape: Threat Models

Thanks so much for the effort you're putting into this work! It looks particularly relevant to my current interest of understanding the different approximations and questions used in alignment, and what forbids us the Grail of paradigmaticity.

Here is my more concrete feedback:

A common approach when setting research agendas in AI Alignment is to be specific, and focus on a threat model. That is, to extrapolate from current work in AI and our theoretical understanding of what to expect, to come up with specific stories for how AGI could cause an existential catas

... (read more)
Reply to Eliezer on Biological Anchors

Thanks for this post!

That being said, my model of Yudkowsky, which I built by spending time interpreting and reverse engineering the post you're responding to, feels like you're not addressing his points (obviously, I might have missed the real Yudkowsky's point).

My interpretation is that he is saying that Evolution (as the generator of most biological anchors) explores the solution space along a fundamentally different path than human research. So what you have is two paths through a space. The burden of proof for biological anchors thus lies in arguing... (read more)

[7] jacob_cannell (1mo): It displeases me that this is currently the most upvoted response: I believe you are focusing on EY's weakest rather than strongest points. It's hardly surprising there are 'two paths through a space' - if you reran either (biological or cultural/technological) evolution with slightly different initial conditions you'd get a different path. However technological evolution is aware of biological evolution and thus strongly correlated to and influenced by it. I.e., deep learning is in part brain reverse engineering (explicitly in the case of DeepMind, but there are many other examples). The burden of proof is thus arguably more the opposite of what you claim (EY claims). To the extent EY makes specific testable claims about the inefficiency of biology, those claims are in error [https://www.lesswrong.com/posts/ax695frGJEzGxFBK4/?commentId=xpgK8Qnrn8zFM87XD] - or at least easily contestable. EY's strongest point is that the Bio Anchors framework puts far too much weight on scaling of existing models (i.e. transformers) to AGI, rather than modeling improvement in asymptotic scaling itself. GPT-3 and similar model scaling is so obviously inferior to what is probably possible today - let alone what is possible in the near future - that it should be given very little consideration/weight, just as it would be unwise to model AGI based on scaling up 2005 DL tech.
Biology-Inspired AGI Timelines: The Trick That Never Works

First, I want to clarify that I feel we're going into a more interesting place, where there's a better chance that you might find a point that invalidates Yudkowsky's argument, and can thus convince him of the value of the model.

But it's also important to realize that IMO, Yudkowsky is not just saying that biological anchors are bad. The more general problem (which is also developed in this post) is that predicting the Future is really hard. In his own model of AGI timelines, the factor that is basically impossible to predict until you can make AGI is the ... (read more)

Biology-Inspired AGI Timelines: The Trick That Never Works

I do think you are misconstruing Yudkowsky's argument. I'm going to give evidence (all of which is relatively strong IMO) in order of "ease of checkability". So I'll start with something anyone can check in a couple of minutes, and close with the more general interpretation that requires rereading the post in detail.

Evidence 1: Yudkowsky flags Simulated-Eliezer as talking smack in the part you're mentioning

If I follow you correctly, your interpretation mostly comes from this part:

OpenPhil: We did already consider that and try to take it into account:

... (read more)
Biology-Inspired AGI Timelines: The Trick That Never Works

Here I think I share your interpretation of Yudkowsky; I just disagree with Yudkowsky. I agree on the second part; the model overestimates median TAI arrival time. But I disagree on the first part -- I think that having a probability distribution over when to expect TAI / AGI / AI-PONR etc. is pretty important/decision-relevant, e.g. for advising people on whether to go to grad school, or for deciding what sort of research project to undertake. (Perhaps Yudkowsky agrees with this much.) 

Hum, I would say Yudkowsky seems to agree with the value of a pro... (read more)

[4] Daniel Kokotajlo (1mo): I guess I would say: Ajeya's framework/model can incorporate this objection; this isn't a "get rid of the whole framework" objection but rather a "tweak the model in the following way" objection. Like, I agree that it would be bad if everyone who used Ajeya's model had to put 100% of their probability mass into the six bio anchors she chose. That's super misleading/biasing/ignores loads of other possible ways AGI might happen. But I don't think of this as a necessary part of Ajeya's model; when I use it, I throw out the six bio anchors and just directly input my probability distribution over OOMs of compute. My distribution is informed by the bio anchors, of course, but that's not the only thing that informs it.
Biology-Inspired AGI Timelines: The Trick That Never Works

Strongly disagree with this, to the extent that I think this is probably the least cruxy topic discussed in this post, and thus the comment is as wrong as is physically possible.

Remove Platt's law, and none of the actual arguments and meta-discussions changes. It's clearly a case of Yudkowsky going for the snappy "hey, see like even your new-and-smarter report makes exactly the same estimation predicted by a random psychological law" + his own frustration with the law still applying despite expected progress.

But once again, if Platt's law was so wrong that... (read more)

[5] Daniel Kokotajlo (1mo): Hahaha ok, interesting! If you are right I'll take some pride in having achieved that distinction. ;) I interpreted Yudkowsky as claiming that Ajeya's model had enough free parameters that it could be made to predict a wide range of things, and that what was actually driving the 30-year prediction was a bunch of implicit biases rather than reality. Platt's Law is evidence for this claim. If it were false and e.g. the typical timelines forecast was only 10 years out, or 60, then we would have less reason to think that implicit biases were driving Ajeya's choice of parameters. Of course, Yudkowsky also made other arguments besides this one... but this one seemed to be there, and it seemed fairly important to me. It's entirely possible I am misconstruing Yudkowsky's argument... you did recently do a reconstruction, so you probably understand it better than me. Care to elaborate?
Biology-Inspired AGI Timelines: The Trick That Never Works

I do agree that the halving-of-compute-costs-every-2.5-years estimate seems too slow to me; it seems like that's the rate of "normal incremental progress" but that when you account for the sort of really important ideas (or accumulations of ideas, or shifts in research direction towards more fruitful paths) that happen about once a decade, the rate should be faster than that.
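To make the quoted rate concrete, here is a minimal sketch of what "compute costs halve every 2.5 years" implies over longer horizons. The function name `relative_cost` and the sample horizons are illustrative assumptions, not part of the report under discussion; only the 2.5-year halving time comes from the text above.

```python
def relative_cost(years: float, halving_time: float = 2.5) -> float:
    """Fraction of today's cost per unit of compute remaining after `years`,
    assuming a constant halving time (2.5 years is the figure quoted above)."""
    return 0.5 ** (years / halving_time)


if __name__ == "__main__":
    # Hypothetical horizons, chosen only for illustration.
    for years in (2.5, 5.0, 10.0, 30.0):
        print(f"after {years:4.1f} years: cost multiplied by {relative_cost(years):.6f}")
```

Under this assumption a decade gives four halvings, i.e. a 16x drop in cost. A faster or slower halving time can be explored by changing `halving_time`, which is the quantity the disagreement in this thread is about.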

I don't think this is what Yudkowsky is saying at all in the post. Actually, I think he is saying the exact opposite: that the 2.5-year estimate is too fast as an estimate that is suppose... (read more)

[2] Daniel Kokotajlo (1mo): Thanks for this comment (and the other comment below also). I think we don't really disagree that much here. I may have just poorly communicated, slash maybe I'm objecting to the way Yudkowsky said things because I read it as implying things I disagree with. That's what I think too--normal incremental progress is probably slower than 2.5-year doubling, but there's also occasional breakthrough progress which is much faster, and it all balances out to a faster-than-2.5-year-doubling, but in such a way that makes it really hard to predict, because so much hangs on whether and when breakthroughs happen. I think I just miscommunicated. Here I think I share your interpretation of Yudkowsky; I just disagree with Yudkowsky. I agree on the second part; the model overestimates median TAI arrival time. But I disagree on the first part -- I think that having a probability distribution over when to expect TAI / AGI / AI-PONR etc. is pretty important/decision-relevant, e.g. for advising people on whether to go to grad school, or for deciding what sort of research project to undertake. (Perhaps Yudkowsky agrees with this much.) And I think that Ajeya's framework is the best framework I know of for getting that distribution. I think any reasonable distribution should be formed by Ajeya's framework, or some more complicated model that builds off of it (adding more bells and whistles such as e.g. a data-availability constraint or a probability-of-paradigm-shift mechanic.). Insofar as Yudkowsky was arguing against this, and saying that we need to throw out the whole model and start from scratch with a different model, I was not convinced. (Though maybe I need to reread the post and/or your steelman summary)
Biology-Inspired AGI Timelines: The Trick That Never Works

(My comment is quite critical, but I want to make it clear that I think doing this exercise is great and important, despite my disagreement with the result of the exercise ;) )

So, having done the same exercise myself, I feel that you go far too meta here. And that by doing so, you're losing most of the actually valuable meta insights of the post. I'm not necessarily saying that your interpretation doesn't fit what Yudkowsky says, but if the goal is to distill where Yudkowsky is coming from in this specific post, I feel like this comment fails.

The "trick that ne

... (read more)
Biology-Inspired AGI Timelines: The Trick That Never Works

(I'm trying to answer and clarify some of the points in the comments based on my interpretation of Yudkowsky in this post. So take the interpretations with a grain of salt, not as "exactly what Yudkowsky meant")

Progress in AI has largely been a function of increasing compute, human software research efforts, and serial time/steps. Throwing more compute at researchers has improved performance both directly and indirectly (e.g. by enabling more experiments, refining evaluation functions in chess, training neural networks, or making algorithms that work best

... (read more)
The Plan

If I imagine what my work would look like if I started out expecting reflection to be the taut constraint, then it does seem like I'd follow a path a lot more like MIRI's. So yeah, this fits.

One thing I'm still not clear about in this thread is whether you (John) would feel that progress has been made for the theory of agency if all the problems on which MIRI has worked were instantaneously solved. Because there's a difference between saying "this is the obvious first step if you believe reflection is the taut constraint" and "solving this problem would help significantly even if reflection wasn't the taut constraint".

[2] johnswentworth (2mo): I expect that progress on the general theory of agency is a necessary component of solving all the problems on which MIRI has worked. So, conditional on those problems being instantly solved, I'd expect that a lot of general theory of agency came along with it. But if a "solution" to something like e.g. the Tiling Problem didn't come with a bunch of progress on more foundational general theory of agency, then I'd be very suspicious of that supposed solution, and I'd expect lots of problems to crop up when we try to apply the solution in practice. (And this is not symmetric: I would not necessarily expect such problems in practice for some more foundational piece of general agency theory which did not already have a solution to the Tiling Problem built into it. Roughly speaking, I expect we can understand e-coli agency without fully understanding human agency, but not vice-versa.)
On Solving Problems Before They Appear: The Weird Epistemologies of Alignment

Thanks for explaining your point in more detail.

The type of formal methods use I am referring to was popular in the late 1970s to at least the 1990s, not sure how popular it is now. It can be summarized by Dijkstra's slogan of “designing proof and program hand in hand”.

This approach stands in complete opposition to the approach of using a theorem prover to verify existing code after it was written.

In simple cases, the designing hand-in-hand approach works as follows: you start with a mathematical specification of the properties you want the program to be wr

... (read more)
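The hand-in-hand style described in the quote above can be sketched on a toy problem. This is an illustrative example only, not one taken from Dijkstra's notes or from the quoted comment: the specification (quotient and remainder by repeated subtraction), the function name `divide`, and the use of runtime `assert` statements to stand in for proof steps are all assumptions made for the sketch. The point is that the invariant is written down first, and the loop body is then whatever preserves it while making progress.

```python
from typing import Tuple


def divide(a: int, b: int) -> Tuple[int, int]:
    """Return (q, r) such that a == q * b + r and 0 <= r < b.

    Specification first: the postcondition above is split into an invariant
    (a == q * b + r and r >= 0) plus the negation of the loop guard (r < b).
    """
    assert a >= 0 and b > 0  # precondition

    # Initialisation chosen so the invariant holds trivially: a == 0 * b + a.
    q, r = 0, a

    while r >= b:
        # Invariant holds at the start of every iteration.
        assert a == q * b + r and r >= 0
        # The only update that keeps a == q * b + r while shrinking r:
        r -= b
        q += 1

    # Invariant plus exit condition (r < b) gives the postcondition.
    assert a == q * b + r and 0 <= r < b
    return q, r
```

For instance, `divide(17, 5)` yields `(3, 2)`. In a real hand-in-hand derivation the assertions would be proof obligations discharged on paper rather than checked at runtime.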
[1] Koen Holtman (2mo): I disagree that AI code is orders of magnitude more complex than say the code in a web browser or modern compiler: in fact quite the opposite applies. Most modern ML algorithms are very short pieces of code. If you are willing to use somewhat abstract math where you do not write out all the hyperparameter values, you can specify everything that goes on in a deep learning algorithm in only a few lines of pseudocode. Same goes for most modern RL algorithms. I also note that while modern airplanes contain millions of lines of code, most of it is in the on-board entertainment systems. But the safety-critical subsystems in airplanes tend to be significantly smaller in code size, and they also run air-gapped from the on-board entertainment system code. This air-gapping of course plays an important role in making the formal proofs for the safety-critical subsystems possible. But beyond these observations: the main point I am trying to get across is that I do not value formal methods as a post-hoc verification tool that should be applied to millions of lines of code, some of it spaghetti code put together via trial and error. Clearly that approach would not work very well. I value formal methods as a tool for the aligned AI code specification and design phases. On the design phase: formal methods offer me a way to create many alternative and clarifying viewpoints on what the code is doing or intending to do, viewpoints not offered by the text of the code itself. For example, formally expressed loop invariants can express much more clearly what is going on in a loop than the loop code itself. Global invariants that I can formulate for distributed system state can allow me to express more clearly how the protocols I design manage to avoid deadlock than the code itself. So the main value of the hand-in-hand method, as a code design method, is that you can develop these clarifying mathematical viewpoints, even before you start writing the code.
(Not all programmers are
On Solving Problems Before They Appear: The Weird Epistemologies of Alignment

Thanks for the thoughtful comment!

You fail to mention the more important engineering strategy: one which does not rely on tinkering, but instead on logical reasoning and math to chart a straight line to your goal.

To use the obvious example, modern engineering does not design bridges by using the staple strategy of tinkering, it will use applied math and materials science to create and validate the bridge design.

From this point of view, the main difference between 'science' and 'engineering' is that science tries to understand nature: it seeks to understand

... (read more)
[2] Koen Holtman (2mo): I think I will need to write two separate replies to address the points you raise. First, a more meta-level and autobiographical reply. When it comes to formal methods, I too have a good idea of what I am talking about. I did a lot of formal methods stuff in the 1990s at Eindhoven University, which at the time was one of the places with the most formal methods in the Netherlands. When I talk about using formal methods to design stuff, I am very serious. But I guess this is not the role of formal methods you are used to, so I'll elaborate. The type of formal methods use I am referring to was popular in the late 1970s to at least the 1990s, not sure how popular it is now. It can be summarized by Dijkstra's slogan of “designing proof and program hand in hand” [https://www.cs.utexas.edu/users/EWD/transcriptions/EWD11xx/EWD1157.html]. This approach stands in complete opposition to the approach of using a theorem prover to verify existing code after it was written. In simple cases, the designing hand-in-hand approach works as follows: you start with a mathematical specification of the properties you want the program to be written to have. Then you use this specification to guide both the writing of the code and the writing of the correctness proof for the code at the same time. The writing of the next line in the proof will often tell you exactly what next lines of code you need to add. This often leads to code which much more clearly expresses what is going on. The whole code and proof writing process can often be done without even leveraging a theorem prover. In more complex cases, you first have to develop the mathematical language to write the specification in. These complex cases are of course the more fun and interesting cases. AGI safety is one such more complex case. You mention distributed computing.
This is one area where the hand-in-hand style of formal methods use is particularly useful, because the method of intuitive trial and error sucks so much at wr
Yudkowsky and Christiano discuss "Takeoff Speeds"

I grimly predict that the effect of this dialogue on the community will be polarization: People who didn't like Yudkowsky and/or his views will like him / his views less, and the gap between them and Yud-fans will grow (more than it shrinks due to the effect of increased dialogue). I say this because IMO Yudkowsky comes across as angry and uncharitable in various parts of this dialogue, and also I think it was kinda a slog to get through & it doesn't seem like much intellectual progress was made here.

Strongly agree with that.

Since you agree with Yudkowsky, do you think you could strongman his position?

Yes, though I'm much more comfortable explaining and arguing for my own position than EY's. It's just that my position turns out to be pretty similar. (Partly this is independent convergence, but of course partly this is causal influence since I've read a lot of his stuff.)

There's a lot to talk about, I'm not sure where to begin, and also a proper response would be a whole research project in itself. Fortunately I've already written a bunch of it; see these two sequences.

Here are some quick high-level thoughts:

1. Begin with timelines. The best way to forec... (read more)

LCDT, A Myopic Decision Theory

Yeah, that's a subtle point.

Here we're stressing the difference between the simulator's action and the simulation's (HCH or Evan in your example) action. Obviously, if the simulation is non-myopic, then the simulation's action will depend on the long-term consequences of this action (for the goals of the simulation). But the simulator itself only cares about answering the question "what would the simulation do next?". Once again, that might mean that the simulator will think about the long-term consequences of the simulation's action on the simulation's go... (read more)

Ngo and Yudkowsky on AI capability gains

Thanks for giving more details about your perspective.

Your comment is phrased as if the object-level refutations have been tried, while conveying the meta-level intuitions hasn't been tried. If anything, it's the opposite: the sequences (and to some extent HPMOR) are practically all content about how to think, whereas Yudkowsky hasn't written anywhere near as extensively on object-level AI safety.

It's not clear to me that the sequences and HPMOR are good pointers for this particular approach to theory building. I mean, I'm sure there are posts in the seque... (read more)

Ngo and Yudkowsky on AI capability gains

I'm honestly confused by this answer.

Do you actually think that Yudkowsky having to correct everyone's object-level mistakes all the time is strictly more productive and will lead faster to the meat of the deconfusion than trying to state the underlying form of the argument and theory, and then adapting it to the object-level arguments and comments?

I have trouble understanding this, because for me the outcome of the first one is that no one gets it, he has to repeat himself all the time without making the debate progress, and this is one more giant hurdle ... (read more)

Your comment is phrased as if the object-level refutations have been tried, while conveying the meta-level intuitions hasn't been tried. If anything, it's the opposite: the sequences (and to some extent HPMOR) are practically all content about how to think, whereas Yudkowsky hasn't written anywhere near as extensively on object-level AI safety.

This has been valuable for community-building, but less so for making intellectual progress - because in almost all domains, the most important way to make progress is to grapple with many object-level problems, unti... (read more)

Ngo and Yudkowsky on AI capability gains

Good point, I hadn't thought about that one.

Still, I have to admit that my first reaction is that this particular sequence seems quite uniquely in a position to increase the quality of the debate and of alignment research singlehandedly. Of course, maybe I only feel that way because it's the only one of the long list that I know of. ^^

(Another possibility I just thought of is that maybe this subsequence requires a lot of new preliminary subsequences, such that the work is far larger than you could expect from reading the words "a subsequence". Still sounds like it would be really valuable though.)

[3] Richard Ngo (2mo): I don't expect such a sequence to be particularly useful, compared with focusing on more object-level arguments. Eliezer says that [https://www.lesswrong.com/posts/uXn3LyA8eNqpvdoZw/preface] the largest mistake he made in writing his original sequences was that he "didn’t realize that the big problem in learning this valuable way of thinking was figuring out how to practice it, not knowing the theory". Better, I expect, to correct the specific mistakes alignment researchers are currently making, until people have enough data points to generalise better.
Ngo and Yudkowsky on AI capability gains

That's a really helpful comment (at least for me)!

But at least step one could be saying, "Wait, do these two kinds of ideas actually go into the same bucket at all?"

I'm guessing that a lot of the hidden work here and in the next steps would come from asking stuff like:

  • Do I need to alter the bucket for each new idea, or does it instead fit in its current form each time?
  • Does the mental act of finding that an idea fits into the bucket remove some confusion and clarify, or is it just a mysterious answer?
  • Does the bucket become more simple and more elegant wit
... (read more)
[5] johnswentworth (2mo): Sounds like you should try writing it.
[3] Rob Bensinger (2mo): I'ma guess that Eliezer thinks there's a long list of sequences he could write meeting these conditions, each on a different topic.
Ngo and Yudkowsky on AI capability gains

Damn. I actually think you might have provided the first clear pointer I've seen about this form of knowledge production, why and how it works, and what could break it. There's a lot to chew on in this reply, but thanks a lot for the amazing food for thought!

(I especially like that you explained the physical points and put links that actually explain the specific implication)

And I agree (tentatively) that a lot of the epistemology of science stuff doesn't have the same object-level impact. I was not claiming that normal philosophy of science was required, just that if that was not how we should evaluate and try to break the deep theory, I wanted to understand how I was supposed to do that.

Ngo and Yudkowsky on AI capability gains

That's when I understood that spatial structure is a Deep Fundamental Theory.

And it doesn't stop there. The same thing explains the structure of our roadways, blood vessels, telecomm networks, and even why the first order differential equations for electric currents, masses on springs, and water in pipes are the same.

(The exact deep structure of physical space which explains all of these is differential topology, which I think is what Vaniver was gesturing towards with "geometry except for the parallel postulate".)

Can you go into more detail here? I have d... (read more)

There's more than just differential topology going on, but it's the thing that unifies it all. You can think of differential topology as being about spaces you can divide into cells, and the boundaries of those cells. Conservation laws are naturally expressed here as constraints that the net flow across the boundary must be zero. This makes conserved quantities into resources, for which the use of is convergently minimized. Minimal structures with certain constraints are thus led to forming the same network-like shapes, obeying the same sorts of laws. (See... (read more)

Ngo and Yudkowsky on AI capability gains

This particular type of fallback-prediction is a common one in general: we have some theory which makes predictions, but "there's a phenomenon which breaks one of the modelling assumptions in a way noncentral to the main theory" is a major way the predictions can fail.

That's a great way of framing it! And a great way of thinking about why these are not failures that are "worrisome" at first/in most cases.

Ngo and Yudkowsky on AI capability gains

Thanks for the thoughtful answer!

So, thermodynamics also feels like a deep fundamental theory to me, and one of the predictions it makes is "you can't make an engine more efficient than a Carnot engine." Suppose someone exhibits an engine that appears to be more efficient than a Carnot engine; my response is not going to be "oh, thermodynamics is wrong", and instead it's going to be "oh, this engine is making use of some unseen source."
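For reference, the Carnot bound invoked in the quote can be stated as a short formula. This is the standard textbook statement, with $T_h$ and $T_c$ denoting the absolute temperatures (in kelvin) of the hot and cold reservoirs; the symbols are introduced here for illustration and do not appear in the original comment.

```latex
\eta_{\text{Carnot}} = 1 - \frac{T_c}{T_h},
\qquad
\eta \le \eta_{\text{Carnot}}
\quad \text{for any heat engine operating between } T_h \text{ and } T_c .
```

As a worked instance, reservoirs at $T_h = 500\,\mathrm{K}$ and $T_c = 300\,\mathrm{K}$ give $\eta_{\text{Carnot}} = 1 - 300/500 = 0.4$, so an engine that appears to reach 50% efficiency between those reservoirs is the kind of anomaly the quote says should prompt a search for an unseen source.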

My gut reaction here is that "you can't make an engine more efficient than a Carnot engine" is not the right kind of predi... (read more)

[2] Matthew "Vaniver" Graves (2mo): Yeah, this seems reasonable to me. I think "how could you tell that theory is relevant to this domain?" seems like a reasonable question in a way that "what predictions does that theory make?" seems like it's somehow coming at things from the wrong angle.

From my (dxu's) perspective, it's allowable for there to be "deep fundamental theories" such that, once you understand those theories well enough, you lose the ability to imagine coherent counterfactual worlds where the theories in question are false.

To use thermodynamics as an example: the first law of thermodynamics (conservation of energy) is actually a consequence of Noether's theorem, which ties conserved quantities in physics to symmetries in physical laws. Before someone becomes aware of this, it's perhaps possible for them to imagine a universe exa... (read more)

I think "deep fundamental theory" is deeper than just "powerful abstraction that is useful in a lot of domains".

Part of what makes a Deep Fundamental Theory deeper is that it is inevitably relevant for anything existing in a certain way. For example, Ramón y Cajal (discoverer of the neuronal structure of brains) wrote:

Before the correction of the law of polarization, we have thought in vain about the usefulness of the referred facts. Thus, the early emergence of the axon, or the displacement of the soma, appeared to us as unfavorable arrangements acting

... (read more)
Ngo and Yudkowsky on AI capability gains

Thanks John for this whole thread!

(Note that I only read the whole Epistemology section of this post and skimmed the rest, so I might be saying stuff that is repeated/resolved elsewhere. Please point me to the relevant parts/quotes if that's the case. ;) )

Einstein's arrogance sounds to me like an early pointer in the Sequences for that kind of thing, with a specific claim about General Relativity being that kind of theory.

That being said, I still understand Richard's position and difficulty with this whole part (or at least what I read of Richard's diffic... (read more)

To be clear, this part:

It's one of those predictions where, if it's false, then we've probably discovered something interesting - most likely some place where an organism is spending resources to do something useful which we haven't understood yet.

... is also intended as a falsifiable prediction. Like, if we go look at the anomaly and there's no new thing going on there, then that's a very big strike against expected utility theory.

This particular type of fallback-prediction is a common one in general: we have some theory which makes predictions, but "ther... (read more)

And even if I feel what you're gesturing at, this sounds/looks like you're saying "even if my prediction is false, that doesn't mean that my theory would be invalidated". 

So, thermodynamics also feels like a deep fundamental theory to me, and one of the predictions it makes is "you can't make an engine more efficient than a Carnot engine." Suppose someone exhibits an engine that appears to be more efficient than a Carnot engine; my response is not going to be "oh, thermodynamics is wrong", and instead it's going to be "oh, this engine is making use of... (read more)

Discussion with Eliezer Yudkowsky on AGI interventions

I'm interested at all in Redwood Research's latest project because it seems to offer a prospect of wandering around with our eyes open asking questions like "Well, what if we try to apply this nonviolence predicate OOD, can we figure out what really went into the 'nonviolence' predicate instead of just nonviolence?" or if it works maybe we can try training on corrigibility and see if we can start to manifest the tiniest bit of the predictable breakdowns, which might manifest in some different way.

Trying to rephrase it in my own words (which will necessaril... (read more)

are you interested in Redwood's research because it might plausibly generate alignment issues and problems that are analogous to the real problem within the safer regime and technology we have now?

It potentially sheds light on small subpieces of things that are particular aspects that contribute to the Real Problem, like "What actually went into the nonviolence predicate instead of just nonviolence?"  Much of the Real Meta-Problem is that you do not get things analogous to the full Real Problem until you are just about ready to die.

Discussion with Eliezer Yudkowsky on AGI interventions

This is an apology for the tone and the framing of the above comment (and my following answers), which have both been needlessly aggressive, status-focused and uncharitable. Underneath are still issues that matter a lot to me, but others have discussed them better (I'll provide a list of linked comments at the end of this one).

Thanks to Richard Ngo for convincing me that I actually needed to write such an apology, which was probably the needed push for me to stop weaseling around it.

So what did I do wrong? The list is pretty damning:

  • I took something about
... (read more)

Thank you for this follow-up comment Adam, I appreciate it.

Discussion with Eliezer Yudkowsky on AGI interventions

Not planning to answer more on this thread, but given how my last messages seem to have confused you, here is my last attempt at sharing my mental model (so you can flag in an answer where I'm wrong in your opinion for readers of this thread).

Also, I just checked on the publication list, and I've read or skimmed most things MIRI published since 2014 (including most newsletters and blog posts on MIRI website).

My model of MIRI is that initially, there was a bunch of people including EY who were working mostly on decision theory stuff, tiling, model theory, th... (read more)

Discussion with Eliezer Yudkowsky on AGI interventions

I would say that with slight caveats (make "decision theory and logic" a bit larger to include some more mathy stuff and make "all experimental work" a bit smaller to not include Redwood's work), this was indeed my model.

What made me update from our discussion is the realization that I interpreted the dismissal of basically all alignment research as "this has no value whatsoever and people doing it are just pretending to care about alignment", where it should have been interpreted as something like "this is potentially interesting/new/exciting, but it doesn't look like it brings us closer to solving alignment in a significant way, hence we're still failing".

1 Rob Bensinger 3mo: 'Experimental work is categorically bad, but Redwood's work doesn't count' does not sound like a "slight caveat" to me! What does this generalization mean at all if Redwood's stuff doesn't count? (Neither, for that matter, does the difference between 'decision theory and logic' and 'all mathy stuff MIRI has ever focused on' seem like a 'slight caveat' to me -- but in that case maybe it's because I have a lot more non-logic, non-decision-theory examples in my mind that you might not be familiar with, since it sounds like you haven't read much MIRI stuff?).
Discussion with Eliezer Yudkowsky on AGI interventions

Thanks for the examples, that helps a lot.

I'm glad that I posted my inflammatory comment, if only because exchanging with you and Rob made me actually consider the question of "what is our story to success", instead of just "are we making progress/creating valuable knowledge". And the way you two have been casting it is way less aversive to me than the way EY tends to frame it. This is definitely something I want to think more about. :)

I want to leave this paragraph as social acknowledgment that you mentioned upthread that you're tired and taking a break,

... (read more)
3 Ben Pace 3mo: Glad to hear. And yeah, that’s the crux of the issue for me.
2 Rob Bensinger 3mo: ! Yay! That's really great to hear. :)
Discussion with Eliezer Yudkowsky on AGI interventions

I'm a bit worried that what instead happened is that you made a bunch of clearly-false claims about other people and gave a bunch of invalid arguments, mixed in with the feelings-stuff; and you used the content warning at the top of the message to avoid having to distinguish which parts of your long, detailed comment are endorsed or not (rather than also flagging this within the comment); and then you also ran with this in a bunch of follow-up comments that were similarly not-endorsed but didn't even have the top-of-comment disclaimer. So that I could imag

... (read more)

Thanks for adding the note! :)

I'm confused. When I say 'that's just my impression', I mean something like 'that's an inside-view belief that I endorse but haven't carefully vetted'. (See, e.g., Impression Track Records, referring to Naming Beliefs.)

Example: you said that MIRI has "contempt with experimental work and not doing only decision theory and logic".

My prior guess would have been that you don't actually, for-real believe that -- that it's not your 'impression' in the above sense, more like 'unendorsed venting/hyperbole that has a more complicated r... (read more)

Discussion with Eliezer Yudkowsky on AGI interventions

Thanks for your great comments!

One thing I want to make clear is that I'm quite aware that my comments have not been as high-quality as they should have been. As I wrote in the disclaimer, I was writing from a place of frustration and annoyance, which also implies a focus on more status-y things. That sounded necessary to me to air out this frustration, and I think this was a good idea given the upvotes of my original post and the couple of people who messaged me to tell me that they were also annoyed.

That being said, much of what I was railing against is a... (read more)

Enjoy your rest! :)

That sounded necessary to me to air out this frustration, and I think this was a good idea given the upvotes of my original post and the couple of people who messaged me to tell me that they were also annoyed.

If you'd just aired out your frustration, framing claims about others in NVC-like 'I feel like...' terms (insofar as you suspect you wouldn't reflectively endorse them), and then a bunch of people messaged you in private to say "thank you! you captured my feelings really well", then that would seem clearly great to me.

I'm a bit worr... (read more)

Discussion with Eliezer Yudkowsky on AGI interventions

Thanks for the detailed comment!

I think one core issue here is that there are actually two debates going on. One is "how hard is the alignment problem?"; another is "how powerful are prosaic alignment techniques?" Broadly speaking, I'd characterise most of the disagreement as being on the first question. But you're treating it like it's mostly on the second question - like EY and everyone else are studying the same thing (cancer, in your metaphor) and just disagree about how to treat it.

That's an interesting separation of the problem, because I really feel... (read more)

I really feel there is more disagreement on the second question than on the first

What is this feeling based on? One way we could measure this is by asking people about how much AI xrisk there is conditional on there being no more research explicitly aimed at aligning AGIs. I expect that different people would give very different predictions.

People like Paul and Evan and more are actually going for the core problems IMO, just anchoring a lot of their thinking in current ML technologies.

Everyone agrees that Paul is trying to solve foundational problems. And ... (read more)

Discussion with Eliezer Yudkowsky on AGI interventions

Thanks for the kind answer, even if we're probably disagreeing about most points in this thread. I think messages like yours really help in making everyone aware that such topics can actually be discussed publicly without big backlash.

I like the 'give more concrete feedback on specific research directions' idea, especially if it helps clarify generators for Eliezer's pessimism. If Eliezer is pessimistic about a bunch of different research approaches simultaneously, and you're simultaneously optimistic about all those approaches, then there must be some more

... (read more)
Discussion with Eliezer Yudkowsky on AGI interventions

Thanks for the pushback!

You've had a few comments along these lines in this thread, and I think this is where you're most severely failing to see the situation from Yudkowsky's point of view.

From Yudkowsky's view, explaining and justifying MIRI's work (and the processes he uses to reach such judgements more generally) was the main point of the sequences. He has written more on the topic than anyone else in the world, by a wide margin. He basically spent several years full-time just trying to get everyone up to speed, because the inductive gap was very very

... (read more)

I don't mean to say that there's critique of prosaic alignment specifically in the sequences. Rather, a lot of the generators of the Yudkowsky-esque worldview are in there. (That is how the sequences work: it's not about arguing specific ideas around alignment, it's about explaining enough of the background frames and generators that the argument becomes unnecessary. "Raise the sanity waterline" and all that.)

For instance, just the other day I ran across this:

Of this I learn the lesson:  You cannot manipulate confusion.  You cannot make clever pl

... (read more)
Discussion with Eliezer Yudkowsky on AGI interventions

That's an awesome comment, thanks!

But that point doesn't feel sufficient to argue that Eliezer's pessimism about the current state of alignment research is just a face-saving strategy his brain tricked him into adopting. (I'm not saying you claimed that it is sufficient; probably a lot of other data points are factoring into your judgment.)

I get why you take that from my rant, but that's not really what I meant. I'm more criticizing the "everything is doomed but let's not give concrete feedback to people" stance, and I think part of it comes from believing... (read more)

My impression from talking with people (but not having direct confirmation from the people who left) was far more that OpenAI was focusing the conceptual safety team on ML work and the other safety team on making sure GPT-3 was not racist, which was not the type of work they were really excited about. But I might also be totally wrong about this.

Interesting! This is quite different from the second-hand accounts I heard. (I assume we're touching different parts of the elephant.)

Discussion with Eliezer Yudkowsky on AGI interventions

(Later added disclaimer: it's a good idea to add "I feel like..." before the judgment in this comment, so that you keep in mind that I'm talking about my impressions and frustrations, rarely stating obvious facts (despite the language making it look so))

Thanks for trying to understand my point and asking me for more details. I appreciate it.

Yet I feel weird when trying to answer, because my gut reaction to your comment is that you're asking the wrong question? Also, the compression of my view to "EY's stances seem to you to be mostly distracting people fro... (read more)

Thank you for the links Adam. To clarify, the kind of argument I'm really looking for is something like the following three (hypothetical) examples.

  • Mesa-optimization is the primary threat model of unaligned AGI systems. Over the next few decades there will be a lot of companies building ML systems that create mesa-optimizers. I think it is within 5 years of current progress that we will understand how ML systems create mesa-optimizers and how to stop it. Therefore I think the current field is adequate for the problem (80%).
  • When I look at the research we're
... (read more)

Thanks for naming specific work you think is really good! I think it's pretty important here to focus on the object-level. Even if you think the goodness of these particular research directions isn't cruxy (because there's a huge list of other things you find promising, and your view is mainly about the list as a whole rather than about any particular items on it), I still think it's super important for us to focus on object-level examples, since this will probably help draw out what the generators for the disagreement are.

John Wentworth’s Natural Abstract

... (read more)

From testimonials by a bunch of more ML people and how any discussion of alignment needs to clarify that you don’t share MIRI’s contempt with experimental work and not doing only decision theory and logic

If you were in the situation described by The Rocket Alignment Problem, you could think "working with rockets right now isn't useful, we need to focus on our conceptual confusions about more basic things" without feeling inherently contemptuous of experimentalism -- it's a tool in the toolbox (which may or may not be appropriate to the task at hand), not a... (read more)

I'm sympathetic to most of your points.

highly veiled contempt for anyone not doing that

I have sympathy for the "this feels somewhat contemptuous" reading, but I want to push back a bit on the "EY contemptuously calling nearly everyone fakers" angle, because I think "[thinly] veiled contempt" is an uncharitable reading. He could be simply exasperated about the state of affairs, or wishing people would change their research directions but respect them as altruists for Trying At All, or who knows what? I'd rather not overwrite his intentions with our reaction... (read more)

... I find that most people working on alignment are trying far harder to justify why they expect their work to matter than EY and the old-school MIRI team ever did.

You've had a few comments along these lines in this thread, and I think this is where you're most severely failing to see the situation from Yudkowsky's point of view.

From Yudkowsky's view, explaining and justifying MIRI's work (and the processes he uses to reach such judgements more generally) was the main point of the sequences. He has written more on the topic than anyone else in the ... (read more)

Discussion with Eliezer Yudkowsky on AGI interventions

Thanks for taking the time to ask a question about the discussion even if you lack expertise on the topic. ;)

+1 for this whole conversation, including Adam pushing back re prosaic alignment / trying to articulate disagreements! I agree that this is an important thing to talk about more.

I like the 'give more concrete feedback on specific research directions' idea, especially if it helps clarify generators for Eliezer's pessimism. If Eliezer is pessimistic about a bunch of different research approaches simultaneously, and you're simultaneously optimistic about all those approaches, then there must be some more basic disagreement(s) behind that.

From my perspective, ... (read more)

Discussion with Eliezer Yudkowsky on AGI interventions

Agreed on the track record, which is part of why it's so frustrating that he doesn't give more details and feedback on why all these approaches are doomed in his view.

That being said, I disagree with the second part, probably because we don't mean the same thing by "moving the ball"?

In your bridge example, "moving the ball" looks to me like trying to see what problems the current proposal could have, how you could check them, what would be your unknown unknowns. And I definitely expect such an approach to find the problems you mention.

Maybe you could give me a better model of what you mean by "moving the ball"?

Oh, I was imagining something like "well, our current metals aren't strong enough, what if we developed stronger ones?", and then focusing on metallurgy. And this is making forward progress--you can build a taller tower out of steel than out of iron--but it's missing more fundamental issues like "you're not going to be able to drive on a bridge that's perpendicular to gravity, and the direction of gravity will change over the course of the trip" or "the moon moves relative to the earth, such that your bridge won't be able to be one object", which will sink... (read more)

Discussion with Eliezer Yudkowsky on AGI interventions

(Later added disclaimer: it's a good idea to add "I feel like..." before the judgment in this comment, so that you keep in mind that I'm talking about my impressions and frustrations, rarely stating obvious facts (despite the language making it look so))

Okay, so you're completely right that a lot of my points are logically downstream of the debate on whether Prosaic Alignment is Impossible or not. But I feel like you don't get how one-sided this debate is, and how misrepresented it is here (and generally on the AF).

Like nobody except EY and a bunch of core ... (read more)

I worry that "Prosaic Alignment Is Doomed" seems a bit... off as the most appropriate crux. At least for me. It seems hard for someone to justifiably know that this is true with enough confidence to not even try anymore. To have essayed or otherwise precluded all promising paths of inquiry, to not even engage with the rest of the field, to not even try to argue other researchers out of their mistaken beliefs, because it's all Hopeless. 

Consider the following analogy: Someone who wants to gain muscle, but has thought a lot about nutrition and their gen... (read more)

I think one core issue here is that there are actually two debates going on. One is "how hard is the alignment problem?"; another is "how powerful are prosaic alignment techniques?" Broadly speaking, I'd characterise most of the disagreement as being on the first question. But you're treating it like it's mostly on the second question - like EY and everyone else are studying the same thing (cancer, in your metaphor) and just disagree about how to treat it.

My attempt to portray EY's perspective is more like: he's concerned with the problem of ageing, and a ... (read more)

3 David Xu 3mo: Thanks for elaborating. I don't think I have the necessary familiarity with the alignment research community to assess your characterization of the situation, but I appreciate your willingness to raise potentially unpopular hypotheses to attention. +1
Discussion with Eliezer Yudkowsky on AGI interventions

Thanks, I sometimes forget not everyone knows the term. :)

Discussion with Eliezer Yudkowsky on AGI interventions

EDIT: This comment fails on a lot of points, as discussed in this apology subcomment. I encourage people interested in the thread to mostly read the apology subcomment and the list of comments linked there, which provide maximum value with minimum drama IMO.

Disclaimer: this is a rant. In the best possible world, I could write from a calmer place, but I’m pretty sure that the taboo on criticizing MIRI and EY too hard on the AF can only be pushed through when I’m annoyed enough. That being said, I’m writing down thoughts that I had for quite some time, so do... (read more)

This is an apology for the tone and the framing of the above comment (and my following answers), which have both been needlessly aggressive, status-focused and uncharitable. Underneath are still issues that matter a lot to me, but others have discussed them better (I'll provide a list of linked comments at the end of this one).

Thanks to Richard Ngo for convincing me that I actually needed to write such an apology, which was probably the needed push for me to stop weaseling around it.

So what did I do wrong? The list is pretty damning:

  • I took something about
... (read more)

This is already reflected in the upvotes, but just to say it explicitly: I think the replies to this comment from Rob and dxu in particular have been exceptionally charitable and productive; kudos to them. This seems like a very good case study in responding to a provocative framing with a concentration of positive discussion norms that leads to productive engagement.

if EY and other MIRI people who are very dubious of most alignment research could give more feedback on that and enter the dialogue, maybe by commenting more on the AF. My problem is not so much with them disagreeing with most of the work, it’s about the disagreement stopping at “that’s not going to work” and not having dialogue and back and forth.

Just in case anyone hasn't already seen these, EY wrote Challenges to Christiano’s capability amplification proposal and this comment (that I already linked to in a different comment on this page) (also has a rep... (read more)

Couple things:

First, there is a lot of work in the "alignment community" that involves (for example) decision theory or open-source-game-theory or acausal trade, and I haven't found any of it helpful for what I personally think about (which I'd like to think is "directly attacking the heart of the problem", but others may judge for themselves when my upcoming post series comes out!).

I guess I see this subset of work as consistent with the hypothesis "some people have been nerd-sniped!". But it's also consistent with "some people have reasonable beliefs and... (read more)

I share the impression that the agent foundations research agenda seemed not that important. But that point doesn't feel sufficient to argue that Eliezer's pessimism about the current state of alignment research is just a face-saving strategy his brain tricked him into adopting. (I'm not saying you claimed that it is sufficient; probably a lot of other data points are factoring into your judgment.) MIRI have deprioritized agent foundations research for quite a while now. I also just think it's extremely common for people to have periods where they work on ... (read more)

Adam, can you make a positive case here for how the work being done on prosaic alignment leads to success? You didn't make one, and without it I don't understand where you're coming from. I'm not asking you to tell me a story that you have 100% probability on, just what is the success story you're acting under, such that EY's stances seem to you to be mostly distracting people from the real work.

I'm annoyed by EY (and maybe MIRI's?) dismissal of every other alignment work, and how seriously it seems to be taken here, given their track record of choosing research agendas with very indirect impact on alignment

For what it's worth, my sense is that EY's track record is best in 1) identifying problems and 2) understanding the structure of the alignment problem.

And, like, I think it is possible that you end up in situations where the people who understand the situation best end up the most pessimistic about it. If you're trying to build a bridge to the ... (read more)

Similarly, the fact that they kept at it over and over with all the big improvements of DL instead of trying to adapt to prosaic Alignment sounds like evidence that they might be over-attached to a specific framing, which they had trouble discarding.

I'm... confused by this framing? Specifically, this bit (as well as other bits like these)

I have to explain again and again to stressed-out newcomers that you definitely don’t need to master model theory or decision theory to do alignment, and try to steer them towards problems and questions that look like

... (read more)

Context for anyone who's not aware:

Nerd sniping is a slang term that describes a particularly interesting problem that is presented to a nerd, often a physicist, tech geek or mathematician. The nerd stops all activity to devote attention to solving the problem, often at his or her own peril

Here's the xkcd comic which coined the term.

Knowledge is not just mutual information

So, I'm trying to interpret your proposal from an epistemic strategy perspective, asking how you are trying to produce knowledge.

It sounds to me like you're proposing to start with a very general formalization with simple mathematical objects (like objectivity being a sort of function, and participating in a goal increasing the measure on the states satisfying the predicate). Then, when you reach situations where the definitions are not constraining enough, like what Alex describes, you add further constraints on these objects?

I have trouble understanding h... (read more)

What exactly is GPT-3's base objective?

I feel like I expect a failure mode where people exploit ambiguity and norm-laden concepts to convince themselves of happy fairy tales. I should think more about this and write a comment.

Just wanted to point out that this is already something we need to worry about all the time in alignment. Calling them training stories doesn't create such a failure mode, it makes them obvious to people like you and me who are wary of narrative explanations in science.

4 Daniel Kokotajlo 3mo: Yes. I have the intuition that training stories will make this problem worse. But I don't think my intuition on this matter is trustworthy (what experience do I have to base it on?) so don't worry about it. We'll try it and see what happens. (to explain the intuition a little bit: With inner/outer alignment, any would-be AGI creator will have to face up to the fact that they haven't solved outer alignment, because it'll be easy for a philosopher to find differences between the base objective they've programmed and True Human Values. With training stories, I expect lots of people to be saying more sophisticated versions of "It just does what I meant it to do, no funny business.")
How do we become confident in the safety of a machine learning system?

This is probably one of the most important posts on alignment on this forum. Seriously. I want everyone thinking about conceptual alignment, and everyone trying conceptual alignment, to read this and think about it deeply.

What this gives us is a way of combining the output of many disparate epistemic strategies to get well structured and directly relevant knowledge about alignment and how our proposals would fare. This is great, because now, we can combine many different methods of investigation (theory arguments, philosophical approaches, empirical studies... (read more)

3 Evan Hubinger 3mo: Glad you think so! I definitely agree and am planning on using this framework in my own research going forward. Yep, this is definitely intentional. I think in many ways just thinking about inner alignment as avoiding proxy-aligned mesa-optimizers can give you false confidence in your training story because you reason “of course I won't get that specific failure mode”—but the problem is that you need to couple some reason that you won't get the wrong thing with some strong reason that you actually will get the right thing to really be confident in your training process's safety.
[Event] Weekly Alignment Research Coffee Time (01/24)

Really sorry, I have to recreate a link every week, and I was at EAG this weekend so I completely forgot. It should work now.

P₂B: Plan to P₂B Better

Exciting! Waiting for the next posts even more then.

2 Daniel Kokotajlo 3mo: Don't get your expectations too high, haha. We haven't written the other parts yet, maybe they won't turn out to be that good.
P₂B: Plan to P₂B Better

Your proposed reformulation of convergent subgoals sounds interesting, but I see a big flaw in your post: you don't even state the applications you're doing the deconfusion for. And in my book, the applications are THE way of judging whether deconfusion is creating valuable knowledge. So I don't know yet if your framing will help with the sort of problems related to agency and goal-directedness that I think matter.

Reserving judgment until the follow-up posts then.

4 Daniel Kokotajlo 3mo: Fair enough; apologies. We are building to an answer to the question "What is agency and why is it powerful/competitive/incentivised/selected-for." We have a lot more to say on the subject but we decided to break it into pieces; this post is the first piece.
P₂B: Plan to P₂B Better

In theory, optimal policies could be tabularly implemented. In this case, it is impossible for them to further improve their "planning."

That sounds wrong. Planning as defined in this post is sufficiently broad that acting like a planner makes you a planner. So if you unwrap a structural planner into a tabular policy, the latter would improve its planning (for example by taking actions that instrumentally help it accomplish the goal we can best ascribe it using the intentional stance).

Another way of framing the point IMO is that the OPs define planning in t... (read more)
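
To make the "unwrap a structural planner into a tabular policy" point concrete, here is a minimal sketch in a toy setting. Everything in it (the five-state line world, the names `plan_first_action`, `TABLE`, and `tabular_policy`) is a hypothetical illustration, not something taken from the post. It only shows that a lookup table built by running a search-based planner on every state behaves identically to that planner, which is why a purely behavioral definition of planning cannot tell the two apart.

```python
from collections import deque

# Hypothetical toy world (an assumption for illustration): states 0..4 on a
# line, with the goal at state 4.
STATES = range(5)
ACTIONS = {"left": -1, "right": +1}
GOAL = 4


def step(state, action):
    """Deterministic transition, clamped to the two ends of the line."""
    return min(max(state + ACTIONS[action], 0), GOAL)


def plan_first_action(state):
    """Structural planner: breadth-first search for a shortest path to GOAL.

    Returns the first action of that path, or None if already at the goal.
    """
    if state == GOAL:
        return None
    queue = deque([(state, None)])  # (current state, first action taken)
    seen = {state}
    while queue:
        current, first = queue.popleft()
        for action in ACTIONS:
            nxt = step(current, action)
            if nxt in seen:
                continue
            chosen = first if first is not None else action
            if nxt == GOAL:
                return chosen
            seen.add(nxt)
            queue.append((nxt, chosen))
    return None


# "Unwrap" the planner: run it once per state and store the answers.
TABLE = {s: plan_first_action(s) for s in STATES}


def tabular_policy(state):
    """Tabular policy: a pure lookup, with no search at decision time."""
    return TABLE[state]


if __name__ == "__main__":
    for s in STATES:
        print(s, plan_first_action(s), tabular_policy(s))
```

The two policies agree on every state by construction, so any behavioral test that classifies one as a planner classifies the other the same way. Whether that is the right notion of planning is exactly what the comment above is questioning.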
