All of Vaniver's Comments + Replies

“Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments

This mostly seems to be an argument for: "It'd be nice if no pivotal act is necessary", but I don't think anyone disagrees with that.

It's arguing that, given that your organization has scary (near-)AGI capabilities, it is not so much harder (to get a legitimate authority to impose an off-switch on the world's compute) than (to 'manufacture your own authority' to impose that off-switch) that it's worth avoiding the cost of (developing those capabilities while planning to manufacture authority). Obviously there can be civilizations where that's true, and civilizations where that's not true.

Job Offering: Help Communicate Infrabayesianism

Consider applying to this now anyway; applications can often be pretty quick, and there's not all that much value in delaying.

Job Offering: Help Communicate Infrabayesianism

It seems worth remembering the AXRP podcast episode on InfraBayesianism, which I think was the first time I didn't bounce off something related to this?

Raymond Arnold (2mo)
I've had on my TODO to try reading the LW post transcript of that [https://www.lesswrong.com/posts/FkMPXiomjGBjMfosg/axrp-episode-5-infra-bayesianism-with-vanessa-kosoy] and seeing if it could be distilled further.
ELK prize results

Several submissions contained perspectives, tricks, or counterexamples that were new to us. We were quite happy to see so many people engaging with ELK, and we were surprised by the number and quality of submissions. 

A thing I'm curious about: what's your 'current overall view' on ELK? Is this:

  • A problem we don't know how to solve, and which we're moderately confident can't be solved (because of our success at generating counterexamples)
  • A problem we don't know how to solve, where we think rapid progress is being made (as we're still building out the ex
... (read more)

From my perspective, ELK is currently very much "A problem we don't know how to solve, where we think rapid progress is being made (as we're still building out the example-counterexample graph, and are optimistic that we'll find an example without counterexamples)" There's some question of what "rapid" means, but I think we're on track for what we wrote in the ELK doc: "we're optimistic that within a year we will have made significant progress either towards a solution or towards a clear sense of why the problem is hard."

We've spent ~9 months on the proble... (read more)

Late 2021 MIRI Conversations: AMA / Discussion

I am confused what you think I was trying to do with that intuition pump.

I think I'm confused about the intuition pump too! Like, here's some options I thought up:

  • The 'alignment problem' is really the 'not enough oversight' problem. [But then if we solve the 'enough oversight' problem, we still have to solve the 'what we want' problem, the 'coordination' problem, the 'construct competitively' problem, etc.]
  • Bits of the alignment problem can be traded off against each other, most obviously coordination and 'alignment tax' (i.e. the additional amount of work
... (read more)
Rohin Shah (2mo)
I mean, maybe we should just drop this point about the intuition pump, it was a throwaway reference in the original comment. I normally use it to argue against a specific mentality I sometimes see in people, and I guess it doesn't make sense outside of that context. (The mentality is "it doesn't matter what oversight process you use, there's always a malicious superintelligence that can game it, therefore everyone dies".)
Late 2021 MIRI Conversations: AMA / Discussion

I'd say "notice that we underestimate the probability that x is even and divisible by 4 by saying it's 12.5%".

Cool, I like this example.

I agree that if you estimate a probability, and then "perform search" / "optimize" / "run n copies of the estimate" (so that you estimate the probability as 1 - (1 - P(event))^n), then you're going to have systematic errors.
...
I suspect this is not the sort of mistake you imagine me doing but I don't think I know what you do imagine me doing.

I think the thing I'm interested in is "what are our estimates of the output of se... (read more)

Rohin Shah (2mo)
Re: cultured meat example: If you give me examples in which you know the features are actually inconsistent, my method is going to look optimistic when it doesn't know about that inconsistency. So yeah, assuming your description of the cultured meat example is correct, my toy model would reproduce that problem.

To give a different example, consider OpenAI Five. One would think that to beat Dota, you need to have an algorithm that allows you to do hierarchical planning, state estimation from partial observability, coordination with team members, understanding of causality, compression of the giant action space, etc. Everyone looked at this giant list of necessary features and thought "it's highly improbable for an algorithm to demonstrate all of these features". My understanding is that even OpenAI, the most optimistic of everyone, thought they would need to do some sort of hierarchical RL to get this to work. In the end, it turned out that vanilla PPO with reward shaping and domain randomization was enough. It turns out that all of these many different capabilities / features were very consistent with each other and easier to achieve simultaneously than we thought.

Tbc, I don't want to claim "unbiased estimator" in the mathematical sense of the phrase. To even make such a claim you need to choose some underlying probability distribution which gives rise to our features, which we don't have. I'm more saying that the direction of the bias depends on whether your features are positively vs. negatively correlated with each other and so a priori I don't expect the bias to be in a predictable direction.

They definitely have that problem. I'm not sure how you don't have that problem; you're always going to have some amount of abstraction and some amount of inconsistency; the future is hard to predict for bounded humans, and you can't "fully populate the details" as an embedded agent. If you're asking how you notice any inconsistencies at all (rather than all of the inc
Late 2021 MIRI Conversations: AMA / Discussion

they download an existing chip schematic, and scale it down

Uh, how big do you think contemporary chips are?

Donald Hobson (2mo)
Like 10s of atoms across. So you aren't scaling down that much. (Most of your performance gains are in being able to stack your chips or whatever.)
Alex Ray's Shortform

I'm pretty sure you mean functions that perform tasks, like you would put in /utils, but I note that on LW "utility function" often refers to the decision theory concept, and "what decision theoretical utility functions are present in the neural network prior" also seems like an interesting (tho less useful) question.

Late 2021 MIRI Conversations: AMA / Discussion

I'm starting with the intuition pump, noticing I can no longer tell a good story of doom, and concluding "infinite oversight quality --> alignment solved".

I think some of my more alignment-flavored counterexamples look like:

  • The 'reengineer it to be safe' step breaks down / isn't implemented thru oversight. Like, if we're positing we spin up a whole Great Reflection to evaluate every action the AI takes, this seems like it's probably not going to be competitive!
  • The oversight gives us as much info as we ask for, but the world is a siren world (like what S
... (read more)
Rohin Shah (2mo)
I obviously do not think this is at all competitive, and I also wanted to ignore the "other people steal your code" case. I am confused what you think I was trying to do with that intuition pump. I guess I said "powerful oversight would solve alignment" which could be construed to mean that powerful oversight => great future, in which case I'd change it to "powerful oversight would deal with the particular technical problems that we call outer and inner alignment", but was it really so non-obvious that I was talking about the technical problems?

Maybe your point is that there are lots of things required for a good future, just as a car needs both steering and an engine, and so the intuition pump is not interesting because it doesn't talk about all the things needed for a good future? If so, I totally agree that it does not in fact include all the things needed for a good future, and it was not meant to be saying that.

This just doesn't seem plausible to me. Where did the information come from? Did the AI system optimize the information to be convincing? If yes, why didn't we notice that the AI system was doing that? Can we solve this by ensuring that we do due diligence, even if it doesn't seem necessary?
Late 2021 MIRI Conversations: AMA / Discussion

Huh, why doesn't that procedure have that systematic error?

Like, when I try to naively run your steps 1-4 on "probability of there existing a number that's both even and odd", I get that about 25% of numbers should be both even and odd, so it seems pretty likely that it'll work out given that there are at least 4 numbers. But I can't easily construct an argument at a similar level of sophistication that gives me an underestimate. [Like, "probability of there existing a number that's both odd and prime" gives the wrong conclusion if you buy that the probabi... (read more)

Rohin Shah (2mo)
It's the first guess. I think if you have a particular number then I'm like "yup, it's fair to notice that we overestimate the probability that x is even and odd by saying it's 25%", and then I'd say "notice that we underestimate the probability that x is even and divisible by 4 by saying it's 12.5%".

I agree that if you estimate a probability, and then "perform search" / "optimize" / "run n copies of the estimate" (so that you estimate the probability as 1 - (1 - P(event))^n), then you're going to have systematic errors. I don't think I'm doing anything that's analogous to that. I definitely don't go around thinking "well, it seems 10% likely that such and such feature of the world holds, and so each alignment scheme I think of that depends on this feature has a 10% chance of working, therefore if I think of 10 alignment schemes I've solved the problem". (I suspect this is not the sort of mistake you imagine me doing but I don't think I know what you do imagine me doing.)
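The direction-of-bias point here can be checked directly. A minimal Python sketch (my own illustration, not from the thread): multiply marginal probabilities as if the two features were independent, and compare against the true joint frequency over a range of integers.

```python
# Estimate joint probabilities of two features by multiplying marginals
# (the bounded-agent shortcut), and compare against the true joint
# frequency over the integers 1..10000.
xs = range(1, 10_001)

def frac(pred):
    """Fraction of xs satisfying pred."""
    return sum(1 for x in xs if pred(x)) / len(xs)

p_even = frac(lambda x: x % 2 == 0)   # 0.5
p_odd  = frac(lambda x: x % 2 == 1)   # 0.5
p_div4 = frac(lambda x: x % 4 == 0)   # 0.25

# Negatively correlated (mutually exclusive) features: independence overestimates.
print(p_even * p_odd)                             # 0.25
print(frac(lambda x: x % 2 == 0 and x % 2 == 1))  # 0.0

# Positively correlated features: independence underestimates.
print(p_even * p_div4)                            # 0.125
print(frac(lambda x: x % 2 == 0 and x % 4 == 0))  # 0.25
```

The bias flips sign with the correlation, with no a-priori direction, which is exactly the claim above.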
Late 2021 MIRI Conversations: AMA / Discussion

Also, to me it seems like a similar thing happens, but with the positions reversed, when Paul and Eliezer try to forecast concrete progress in ML over the next decade. Does that seem right to you?

It feels similar but clearly distinct? Like, in that situation Eliezer often seems to say things that I parse as "I don't have any special knowledge here", which seems like a different thing than "I can't easily sample from my distribution over how things go right", and I also have the sense of Paul being willing to 'go specific' and Eliezer not being willing to '... (read more)

Late 2021 MIRI Conversations: AMA / Discussion

Yeah, sorry about not owning that more, and for the frame being muddled. I don't endorse the "asking Eliezer" or "agreeing with Eliezer" bits, but I do basically think he's right about many object-level problems he identifies (and thus people disagreeing with him about that is not a feature) and think 'security mindset' is the right orientation to have towards AGI alignment. That hypothesis is a 'worry' primarily because asymmetric costs means it's more worth investigating than the raw probability would suggest. [Tho the raw probability of components of it... (read more)

Late 2021 MIRI Conversations: AMA / Discussion

I think my way of thinking about things is often a lot like "draw random samples," more like drawing N random samples rather than particle filtering (I guess since we aren't making observations as we go---if I notice an inconsistency the thing I do is more like backtrack and start over with N fresh samples having updated on the logical fact).

Oh whoa, you don't remember your samples from before? [I guess I might not either, unless I'm concentrating on keeping them around or verbalized them or something; probably I do something more expert-iteration-like whe... (read more)

Late 2021 MIRI Conversations: AMA / Discussion

Man, I would not call the technique you described "mainline prediction". It also seems kinda inconsistent with Vaniver's usage; his writing suggests that a person only has one mainline at a time which seems odd for this technique.

Vaniver, is this what you meant?

Uh, I inherited "mainline" from Eliezer's usage in the dialogue, and am guessing that his reasoning is following a process sort of like mine and John's. My natural word for it is a 'particle', from particle filtering, as linked in various places, which I think is consistent with John's description. ... (read more)

Rohin Shah (3mo)
I don't know what "this" refers to. If the referent is "have a concrete example in mind", then I do that frequently but not always. I do it a ton when I'm not very knowledgeable and learning about a thing; I do it less as my mastery of a subject increases. (Examples: when I was initially learning addition, I used the concrete example of holding up three fingers and then counting up two more to compute 3 + 2 = 5, which I do not do any more. When I first learned recursion, I used to explicitly run through an execution trace to ensure my program would work; now I do not.)

If the referent is "make statements that reflect my beliefs", then it depends on context, but in the context of these dialogues, I'm always doing that. (Whereas when I'm writing for the newsletter, I'm more often trying to represent the whole discourse, though the "opinion" sections are still entirely my beliefs.)
Late 2021 MIRI Conversations: AMA / Discussion

whatever else you might imagine would give you a "mainline".

As I understand it, when you "talk about the mainline", you're supposed to have some low-entropy (i.e. confident) view on how the future goes, such that you can answer very different questions X, Y and Z about that particular future, that are all correlated with each other, and all get (say) > 50% probability. (Idk, as I write this down, it seems so obviously a bad way to reason that I feel like I must not be understanding it correctly.)

I think this is roughly how I'm thinking about things some... (read more)

Rohin Shah (3mo)
If you define "mainline" as "particle with plurality weight", then I think I was in fact "talking on my mainline" at some points during the conversation, and basically everywhere that I was talking about worlds (instead of specific technical points or intuition pumps) I was talking about "one of my top 10 particles". I think I responded to every request for concreteness with a fairly concrete answer. Feel free to ask me for more concreteness in any particular story I told during the conversation.
Late 2021 MIRI Conversations: AMA / Discussion

I'm just using this as an intuition pump for the listener to establish that a sufficiently powerful oversight process would solve AI alignment.

Huh, I guess I don't believe the intuition pump? Like, as the first counterexample that comes to mind, when I imagine having an AGI where I can tell everything about how it's thinking, and yet I remain a black box to myself, I can't really tell whether or not it's aligned to me. (Is me-now the one that I want it to be aligned to, or me-across-time? Which side of my internal conflicts about A vs. B / which principle ... (read more)

Rohin Shah (3mo)
That is in fact my response. (Though one of the ways in which the intuition pump isn't fully compelling to me is that, even after understanding the exact program that the AGI implements and its causal history, maybe the overseers can't correctly predict the consequences of running that program for a long time. Still feels like they'd do fine.) I do agree that if you go as far as "logical omniscience" then there are "cheating" ways of solving the problem that don't really tell us much about how hard alignment is in practice.

The car analogy just doesn't seem sensible. I can tell stories of car doom even if you have infinitely good engines (e.g. the steering breaks). My point is that we struggle to tell stories of doom when imagining a very powerful oversight process that knows everything the model knows. I'm not thinking "more oversight quality --> more alignment" and then concluding "infinite oversight quality --> alignment solved". I'm starting with the intuition pump, noticing I can no longer tell a good story of doom, and concluding "infinite oversight quality --> alignment solved". So I don't think this has much to do with extrapolating tangents vs. production functions, except inasmuch as production functions encourage you to think about complements to your inputs that you can then posit don't exist in order to tell a story of doom.
Late 2021 MIRI Conversations: AMA / Discussion

When Alice uses a model with more free parameters, you need to posit a bias before you can predict a systematic direction in which Alice will make mistakes. So this only bites you if you have a bias towards optimism.

That is, when I give Optimistic Alice fewer constraints, she can more easily imagine a solution, and when I give Pessimistic Bob fewer constraints, he can more easily imagine that no solution is possible? I think... this feels true as a matter of human psychology of problem-solving, or something, and not as a matter of math. Like, the way Bob f... (read more)

Rohin Shah (3mo)
I think we're imagining different toy mathematical models.

Your model, according to me:

  1. There is a space of possible approaches, that we are searching over to find a solution. (E.g. the space of all possible programs.)
  2. We put a layer of abstraction on top of this space, characterizing approaches by N different "features" (e.g. "is it goal-directed", "is it an oracle", "is it capable of destroying the world").
  3. Because we're bounded agents, we then treat the features as independent, and search for some combination of features that would comprise a solution.

I agree that this procedure has a systematic error in claiming that there is a solution when none exists (and doesn't have the opposite error), and that if this were an accurate model of how I was reasoning I should be way more worried about correcting for that problem.

My model:

  1. There is a probability distribution over "ways the world could be".
  2. We put a layer of abstraction on top of this space, characterizing "ways the world could be" by N different "features" (e.g. "can you get human-level intelligence out of a pile of heuristics", "what are the returns to specialization", "how different will AI ontologies be from human ontologies"). We estimate the marginal probability of each of those features.
  3. Because we're bounded agents, when we need the joint probability of two or more features, we treat them as independent and just multiply.
  4. Given a proposed solution, we estimate its probability of working by identifying which features need to be true of the world for the solution to work, and then estimate the probability of those features (using the method above).

I claim that this procedure doesn't have a systematic error in the direction of optimism (at least until you add some additional details), and that this procedure more accurately reflects the sort of reasoning that I am doing.
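One way to see why the first procedure's error is one-sided: if every candidate approach secretly depends on the same background feature, treating their success chances as independent inflates P(at least one works). A toy simulation (my own sketch with made-up numbers, not anything from the thread):

```python
import random

random.seed(0)

# n candidate schemes; each works only if a single shared background
# feature holds (probability q). Marginally each scheme has success
# probability q, but the schemes stand or fall together.
n, q, trials = 10, 0.1, 100_000

# Treating the schemes' failures as independent:
independent_estimate = 1 - (1 - q) ** n   # ~0.651

# Simulating the true (perfectly correlated) process:
hits = sum(random.random() < q for _ in range(trials))
true_prob = hits / trials                 # ~0.1

print(independent_estimate, true_prob)
```

The search-over-feature-combinations model keeps generating "solutions" whose estimated success is inflated in exactly this way, whereas estimating the marginal probability of each world-feature and multiplying has no built-in sign to its error.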

[I think there's a thing Eliezer does a lot, which I have mixed feelings about, which is matching people's statements to patterns and then responding to the generator of the pattern in Eliezer's head, which only sometimes corresponds to the generator in the other person's head.]

I want to add an additional meta-pattern – there was once a person who thought I had a particular bias. They'd go around telling me "Ray, you're exhibiting that bias right now. Whatever rationalization you're coming up with right now, it's not the real reason you're arguing X." An... (read more)

Late 2021 MIRI Conversations: AMA / Discussion

I feel like I have a broad distribution over worlds and usually answer questions with probability distributions, that I have a complete mental universe (which feels to me like it outputs answers to a much broader set of questions than Eliezer's, albeit probabilistic ones, rather than bailing with "the future is hard to predict").

Sometimes I'll be tracking a finite number of "concrete hypotheses", where every hypothesis is 'fully fleshed out', and be doing a particle-filtering style updating process, where sometimes hypotheses gain or lose weight, sometimes... (read more)

I think my way of thinking about things is often a lot like "draw random samples," more like drawing N random samples rather than particle filtering (I guess since we aren't making observations as we go---if I notice an inconsistency the thing I do is more like backtrack and start over with N fresh samples having updated on the logical fact).

The main complexity feels like the thing you point out where it's impossible to make them fully fleshed out, so you build a bunch of intuitions about what is consistent (and could be fleshed out given enough time) and ... (read more)
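Paul's "backtrack and start over with N fresh samples having updated on the logical fact" resembles rejection sampling. A minimal sketch (my own illustration, with hypothetical features and made-up probabilities):

```python
import random

random.seed(1)

def draw_sample():
    # Two hypothetical binary features of the future, drawn independently.
    return {"fast_takeoff": random.random() < 0.3,
            "single_lab_leads": random.random() < 0.5}

def consistent(s):
    # A discovered logical fact, e.g. "fast takeoff implies a single lab leads".
    return (not s["fast_takeoff"]) or s["single_lab_leads"]

def fresh_consistent_samples(n):
    samples = []
    while len(samples) < n:
        s = draw_sample()
        if consistent(s):      # on inconsistency, backtrack and redraw
            samples.append(s)
    return samples

worlds = fresh_consistent_samples(1000)
# Conditioning on the logical fact shifts P(fast_takeoff) down from 0.30
# (to 0.15 / 0.85 ≈ 0.176 in the limit of many samples).
print(sum(s["fast_takeoff"] for s in worlds) / len(worlds))
```

This differs from particle filtering in exactly the way described: nothing is reweighted or carried over; the old samples are discarded and fresh ones drawn from the updated distribution.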

Late 2021 MIRI Conversations: AMA / Discussion

Sorry, I probably should have been more clear about the "this is a quote from a longer dialogue, the missing context is important." I do think that the disagreement about "how relevant is this to 'actual disagreement'?" is basically the live thing, not whether or not you agree with the basic abstract point.

My current sense is that you're right that the thing you're doing is more specific than the general case (and one of the ways you can tell is the line of argumentation you give about chance of doom), and also Eliezer can still be correctly observing that... (read more)

Rohin Shah (3mo)
I agree that if you have a choice about whether to have more or fewer free parameters, all else equal you should prefer the model with fewer free parameters. (Obviously, all else is not equal; in particular I do not think that Eliezer's model is tracking reality as well as mine.) When Alice uses a model with more free parameters, you need to posit a bias before you can predict a systematic direction in which Alice will make mistakes. So this only bites you if you have a bias towards optimism. I know Eliezer thinks I have such a bias. I disagree with him.

I agree that this is true in some platonic sense. Either the argument gives me a correct answer, in which case I have true statements that could be cashed out in terms of mechanistic algorithms, or the argument gives me a wrong answer, in which case it wouldn't be derivable from mechanistic algorithms, because the mechanistic algorithms are the "ground truth". Quoting myself from the dialogue:
Late 2021 MIRI Conversations: AMA / Discussion

Yeah, I'm also interested in the question of "how do we distinguish 'sentences-on-mainline' from 'shoring-up-edge-cases'?", or which conversational moves most develop shared knowledge, or something similar. 

Like I think it's often good to point out edge cases, especially when you're trying to formalize an argument or look for designs that get us out of this trap. In another comment in this thread, I note that there's a thing Eliezer said that I think is very important and accurate, and also think there's an edge case that's not obviously handled corre... (read more)

(For object-level responses, see comments on parallel threads.)

I want to push back on an implicit framing in lines like:

there's some value to more people thinking thru / shooting down their own edge cases [...], instead of pushing the work to Eliezer.

people aren't updating on the meta-level point and continue to attempt 'rolling their own crypto', asking if Eliezer can poke the hole in this new procedure

This makes it sound like the rest of us don't try to break our proposals, push the work to Eliezer, agree with Eliezer when he finds a problem, and then no... (read more)

But also my sense is that there's some deep benefit from "having mainlines" and conversations that are mostly 'sentences-on-mainline'?

I agree with this. Or, if you feel ~evenly split between two options, have two mainlines and focus a bunch on those (including picking at cruxes and revising your mainline view over time).

But:

Like, it feels to me like Eliezer was generating sentences on his mainline, and Richard was responding with 'since you're being overly pessimistic, I will be overly optimistic to balance', with no attempt to have his response match his

... (read more)
Late 2021 MIRI Conversations: AMA / Discussion

The most recent post has a related exchange between Eliezer and Rohin:

Eliezer: I think the critical insight - though it has a format that basically nobody except me ever visibly invokes in those terms, and I worry maybe it can only be taught by a kind of life experience that's very hard to obtain - is the realization that any consistent reasonable story about underlying mechanisms will give you less optimistic forecasts than the ones you get by freely combining surface desiderata

Rohin: Yeah, I think I do not in fact understand why that is true for any cons

... (read more)
Rohin Shah (3mo)
Note that my first response was: and my immediately preceding message was I think I was responding to the version of the argument where "freely combining surface desiderata" was swapped out with "arguments about what you're selecting for". I probably should have noted that I agreed with the basic abstract point as Eliezer stated it; I just don't think it's very relevant to the actual disagreement. I think my complaints in the context of the discussion are:

  • It's a very weak statement. If you freely combine the most optimistic surface desiderata, you get ~0% chance of doom. My estimate is way higher (in odds-space) than ~0%, and the statement "p(doom) >= ~0%" is not that interesting and not a justification of "doom is near-inevitable".
  • Relatedly, I am not just "freely combining surface desiderata". I am doing something like "predicting what properties AI systems would have by reasoning about what properties we selected for during training". I think you could reasonably ask how that compares against "predicting what properties AI systems would have by reasoning about what mechanistic algorithms could produce the behavior we observed during training". I was under the impression that this was what Eliezer was pointing at (because that's how I framed it in the message immediately prior to the one you quoted) but I'm less confident of that now.
Late 2021 MIRI Conversations: AMA / Discussion

This is mostly in response to stuff written by Richard, but I'm interested in everyone's read of the situation.

While I don't find Eliezer's core intuitions about intelligence too implausible, they don't seem compelling enough to do as much work as Eliezer argues they do. As in the Foom debate, I think that our object-level discussions were constrained by our different underlying attitudes towards high-level abstractions, which are hard to pin down (let alone resolve).

Given this, I think that the most productive mode of intellectual engagement with Eliezer'

... (read more)
Richard Ngo (3mo)
To me it seems like this is what you should expect other people to look like both when other people know less about a domain than you do, and also when you're overconfident about your understanding of that domain. So I don't think it helps distinguish those two cases. (Also, to me it seems like a similar thing happens, but with the positions reversed, when Paul and Eliezer try to forecast concrete progress in ML over the next decade. Does that seem right to you?)

I believe this was discussed further at some point - I argued that Eliezer-style political history books also exclude statements like "and then we survived the cold war" or "most countries still don't have nuclear energy".

EDIT: I wrote this before seeing Paul's response; hence a significant amount of repetition.

They often seem to emit sentences that are 'not absurd', instead of 'on their mainline', because they're mostly trying to generate sentences that pass some shallow checks instead of 'coming from their complete mental universe.'

Why is this?

Well, there are many boring cases that are explained by pedagogy / argument structure. When I say things like "in the limit of infinite oversight capacity, we could just understand everything about the AI system and reengineer it to... (read more)

I feel like I have a broad distribution over worlds and usually answer questions with probability distributions, that I have a complete mental universe (which feels to me like it outputs answers to a much broader set of questions than Eliezer's, albeit probabilistic ones, rather than bailing with "the future is hard to predict").  At a high level I don't think "mainline" is a great concept for describing probability distributions over the future except in certain exceptional cases (though I may not understand what "mainline" means), and that neat stor... (read more)

The most recent post has a related exchange between Eliezer and Rohin:

Eliezer: I think the critical insight - though it has a format that basically nobody except me ever visibly invokes in those terms, and I worry maybe it can only be taught by a kind of life experience that's very hard to obtain - is the realization that any consistent reasonable story about underlying mechanisms will give you less optimistic forecasts than the ones you get by freely combining surface desiderata

Rohin: Yeah, I think I do not in fact understand why that is true for any cons

... (read more)
David Xu (3mo)
This is a very interesting point! I will chip in by pointing out a very similar remark from Rohin just earlier today [https://www.lesswrong.com/posts/tcCxPLBrEXdxN5HCQ/shah-and-yudkowsky-on-alignment-failures?commentId=hGvqQ3gAs5AFDg9o2#CesbvRAsvp4BNWq2H] : That is all.

(Obviously there's a kinda superficial resemblance here to the phenomenon of "calling out" somebody else; I want to state outright that this is not the intention, it's just that I saw your comment right after seeing Rohin's comment, in such a way that my memory of his remark was still salient enough that the connection jumped out at me. Since salient observations tend to fade over time, I wanted to put this down before that happened.)
Christiano and Yudkowsky on AI predictions and human intelligence

I'm guessing that a proponent of Christiano's theory would say: sure, such-and-such startup succeeded but it was because they were the only ones working on problem P, so problem P was an uncrowded field at the time. Okay, but why do we draw the boundary around P rather than around "software" or around something in between which was crowded?

I'd make a different reply: you need to not just look at the winning startup, but all startups. If it's the case that the 'startup ecosystem' is earning 100% returns and the rest of the economy is earning 5% returns, the... (read more)

Vanessa Kosoy (3mo)
I don't see what it has to do with risk-return. Sure, many startups fail. And, plausibly many people tried to build an airplane and failed before the Wright brothers. And, many people keep trying to build AGI and failing. This doesn't mean there won't be kinks in AI progress or even a TAI created by a small group.

Saying that "the subjective expected value of AI progress over time is a smooth curve" is a very different proposition from "the actual AI progress over time will be a smooth curve". My line of argument here is not trying to prove a particular story about AI progress (e.g. "TAI will be similar to a startup") but push back against (/ voice my confusions about) the confidence level of predictions made by Christiano's model.
Christiano and Yudkowsky on AI predictions and human intelligence

That said, since I can't resist responding to random comments: are horses really being bred for sprinting as fast as they can for 20-30 seconds? (Isn't that what cheetahs are so good at?) What is the military/agricultural/trade context in which that is relevant? Who cares other than horse racers? Over any of the distances where people are using horses I would expect them to be considerably faster than cheetahs even if both are unburdened. I don't know much about horses though.

My understanding is that the primary military use of horses in Europe for elites ... (read more)

Christiano and Yudkowsky on AI predictions and human intelligence

Humans invested exorbitant amounts of money and effort into making better cheetahs, in the sense of 'trying to be able to run much faster and become the fastest creatures on earth'; we call those manufactured cheetahs, "horses".

I don't think Paul is talking about that. Consider the previous lines (which seem like they could describe animal breeding to me):

and you think that G doesn't help you improve on muscles and tendons?

until you have a big pile of it?

and Eliezer's response in the following lines:

the natural selection of cheetahs is investing in it

it's

... (read more)

I agree with your framing, and I think it shows Paul is wrong, leaving aside the specifics of the cheetah thing. Looking back, humans pursued both paths, the path of selecting cheetahs (horses) and of using G to look for completely different paradigms that blow away cheetahs. (Since we aren't evolution, we aren't restricted to picking just one approach.) And we can see the results today: when was the last time you rode a horse?

If you had invested in 'the horse economy' a century ago and bought the stock of bluechip buggywhip manufacturers instead of aerosp... (read more)

Like, fundamentally the question is something like "how efficient and accurate is the AI research market?"

I would distinguish two factors:

  • How powerful and well-directed is the field's optimization?
  • How much does the technology inherently lend itself to information asymmetries?

You could turn the "powerful and well-directed" dial up to the maximum allowed by physics, and still not thereby guarantee that information asymmetries are rare, because the way that a society applies maximum optimization pressure to reaching AGI ASAP might route through a lot of indiv... (read more)

Worst-case thinking in AI alignment

When you’re considering between a project that gives us a boost in worlds where P(doom) was 50% and projects that help out in worlds where P(doom) was 1% or 99%, you should probably pick the first project, because the derivative of P(doom) with respect to alignment progress is maximized at 50%.

Many prominent alignment researchers estimate P(doom) as substantially less than 50%. Those people often focus on scenarios which are surprisingly bad from their perspective basically for this reason.

And conversely, people who think P(doom) > 50% should aim their

... (read more)
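To put a toy model behind the "maximized at 50%" claim (the logistic form is my assumption, not something from the quoted post): if P(doom) is a logistic function of alignment progress x, then dP/dx = P(1 − P), which peaks exactly where P = 0.5.

```python
import numpy as np

# Toy model (an assumption for illustration): P(doom) is a logistic
# function of "alignment progress" x. Its derivative is P * (1 - P),
# which is largest where P = 0.5 -- so marginal progress buys the most
# in worlds where doom is a coin flip.
x = np.linspace(-6, 6, 1201)
p = 1.0 / (1.0 + np.exp(-x))
dp_dx = p * (1.0 - p)

i = int(np.argmax(dp_dx))
print(round(float(p[i]), 2))  # P(doom) where progress helps most -> 0.5
```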
There is essentially one best-validated theory of cognition.

Why do they separate out the auditory world and the environment?

Christiano, Cotra, and Yudkowsky on AI progress

So it looks like the R-7 (which launched Sputnik) was the first ICBM, and the range is way longer than the V-2s of ~15 years earlier, but I'm not easily finding a graph of range over those intervening years. (And the R-7 range is only about double the range of a WW2-era bomber, which further smooths the overall graph.)

[And, implicitly, the reason we care about ICBMs is because the US and the USSR were on different continents; if the distance between their major centers was comparable to England and France's distance instead, then the same strategic considerations would have been hit much sooner.]

Christiano, Cotra, and Yudkowsky on AI progress

presumably we saw a discontinuous jump in flight range when Sputnik entered orbit.

While I think orbit is the right sort of discontinuity for this, I think you need to specify 'flight range' in a way that clearly favors orbits for this to be correct, mostly because, about a month before, there was the manhole cover launched/vaporized with a nuke.

[But in terms of something like "altitude achieved", I think Sputnik is probably part of a continuous graph, and probably not the most extreme member of the graph?]

My understanding is that Sputnik was a big discontinuous jump in "distance which a payload (i.e. nuclear bomb) can be delivered" (or at least it was a conclusive proof-of-concept of a discontinuous jump in that metric). That metric was presumably under heavy optimization pressure at the time, and was the main reason for strategic interest in Sputnik, so it lines up very well with the preconditions for the continuous view.

Yudkowsky and Christiano discuss "Takeoff Speeds"

your point is simply that it's hard to predict when that will happen when you just look at the Penn Treebank trend.

This is a big part of my point; a smaller elaboration is that it can be easy to trick yourself into thinking that, because you understand what will happen with PTB, you'll understand what will happen with economics/security/etc., when in fact you don't have much understanding of the connection between those, and there might be significant discontinuities. [To be clear, I don't have much understanding of this either; I wish I did!]

For example, ... (read more)

Yudkowsky and Christiano discuss "Takeoff Speeds"

it seems like extrapolating from the past still gives you a lot better of a model than most available alternatives.

My impression is that some people are impressed by GPT-3's capabilities, whereas your response is "ok, but it's part of the straight-line trend on Penn Treebank; maybe it's a little ahead of schedule, but nothing to write home about." But clearly you and they are focused on different metrics! 

That is, suppose it's the case that GPT-3 is the first successfully commercialized language model. (I think in order to make this literally true you... (read more)

4Matthew Barnett6mo
I think it's the nature of every product that comes on the market that it will experience a discontinuity from having zero revenue to having some revenue at some point. It's an interesting question of when that will happen, and maybe your point is simply that it's hard to predict when that will happen when you just look at the Penn Treebank trend. However, I suspect that the revenue curve will look pretty continuous, now that it's gone from zero to one. Do you disagree? In a world with continuous, gradual progress across a ton of metrics, you're going to get discontinuities from zero to one. I don't think anyone from the Paul camp disagrees with that (in fact, Katja Grace talked about this [https://aiimpacts.org/likelihood-of-discontinuous-progress-around-the-development-of-agi/#Starting_high] in her article). From the continuous takeoff perspective, these discontinuities don't seem very relevant unless going from zero to one is very important in a qualitative sense. But I would contend that going from "no revenue" to "some revenue" is not actually that meaningful in the sense of distinguishing AI from the large class of other economic products that have gradual development curves.
Ngo and Yudkowsky on AI capability gains

The mental move I'm doing for each of these examples is not imagining universes where addition/evolution/other deep theory is wrong, but imagining phenomena/problems where addition/evolution/other deep theory is not adapted. If you're describing something that doesn't commute, addition might be a deep theory, but it's not useful for what you want. 

Yeah, this seems reasonable to me. I think "how could you tell that theory is relevant to this domain?" seems like a reasonable question in a way that "what predictions does that theory make?" seems like it's somehow coming at things from the wrong angle.

Ngo and Yudkowsky on AI capability gains

And even if I feel what you're gesturing at, this sounds/looks like you're saying "even if my prediction is false, that doesn't mean that my theory would be invalidated". 

So, thermodynamics also feels like a deep fundamental theory to me, and one of the predictions it makes is "you can't make an engine more efficient than a Carnot engine." Suppose someone exhibits an engine that appears to be more efficient than a Carnot engine; my response is not going to be "oh, thermodynamics is wrong", and instead it's going to be "oh, this engine is making use of... (read more)
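For reference, the bound being invoked here is the standard Carnot limit for a heat engine running between a hot reservoir at temperature $T_h$ and a cold one at $T_c$:

```latex
\eta \;=\; \frac{W}{Q_h} \;\le\; \eta_{\text{Carnot}} \;=\; 1 - \frac{T_c}{T_h}
```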

6Adele Lopez6mo
That's not what it predicts. It predicts you can't make a heat engine more efficient than a Carnot engine.
2Adam Shimi6mo
Thanks for the thoughtful answer!

My gut reaction here is that "you can't make an engine more efficient than a Carnot engine" is not the right kind of prediction to try to break thermodynamics, because even if you could break it in principle, staying at that level without going into the detailed mechanisms of thermodynamics will only make you try the same thing as everyone else does. Do you think that's an adequate response to your point, or am I missing what you're trying to say?

The mental move I'm doing for each of these examples is not imagining universes where addition/evolution/other deep theory is wrong, but imagining phenomena/problems where addition/evolution/other deep theory is not adapted. If you're describing something that doesn't commute, addition might be a deep theory, but it's not useful for what you want. Similarly, you could argue that given how we're building AIs and trying to build AGI, evolution is not the deep theory that you want to use.

It sounds to me like you (and your internal-Yudkowsky) are using "deep fundamental theory" to mean "powerful abstraction that is useful in a lot of domains". Which addition and evolution fundamentally are. But claiming that the abstraction is useful in some new domain requires some justification IMO. And even if you think the burden of proof is on the critics, the difficulty of formulating the generators makes that really hard.

Once again, do you think that answers your point adequately?
Ngo and Yudkowsky on AI capability gains

It's taking a massive massive failure and trying to find exactly the right abstract gloss to put on it that makes it sound like exactly the right perfect thing will be done next time.

I feel like Ngo didn't really respond to this?

Like, later he says: 

Right, I'm not endorsing this as my mainline prediction about what happens. Mainly what I'm doing here is highlighting that your view seems like one which cherrypicks pessimistic interpretations.

But... Richard, are you endorsing it as 'at all in line with the evidence?' Like, when I imagine living in that ... (read more)

5Richard Ngo6mo
I think we live in a world where there are very strong forces opposed to technological progress, which actively impede a lot of impactful work, including technologies which have the potential to be very economically and strategically important (e.g. nuclear power, vaccines, genetic engineering, geoengineering).

This observation doesn't lead me to a strong prediction that all such technologies will be banned; nor even that the most costly technologies will be banned - if the forces opposed to technological progress were even approximately rational, then banning gain of function research would be one of their main priorities (although I note that they did manage to ban it, the ban just didn't stick). But when Eliezer points to covid as an example of generalised government failure, and I point to covid as also being an example of the specific phenomenon of people being very wary of new technology, I don't think that my gloss is clearly absurd.

I'm open to arguments that say that serious opposition to AI progress won't be an important factor in how the future plays out; and I'm also open to arguments that covid doesn't provide much evidence that there will be serious opposition to AI progress. But I do think that those arguments need to be made.
Adele Lopez's Shortform

So we would need to figure out how to robustly get an honest signal from such an experiment, which still seems quite hard. But perhaps it's easier than solving the full alignment problem before the first shot.

IMO this is a 'additional line of defense' boxing strategy instead of simplification. 

Note that in the traditional version, the 'dud' bit of the bomb can only be the trigger; a bomb that absorbs the photon but then explodes isn't distinguishable from a bomb that absorbs the photon and then doesn't explode (because of an error deeper in the bomb).... (read more)

1Adele Lopez6mo
Thanks!
Discussion with Eliezer Yudkowsky on AGI interventions

However, I think it's not at all obvious to me that corrigibility doesn't have a "small central core". It does seem to me like the "you are incomplete, you will never be complete" angle captures a lot of what we mean by corrigibility. 

I think all three of Eliezer, you, and I share the sense that corrigibility is perhaps philosophically simple. The problem is that for it to actually have a small central core / be a natural stance, you need the 'import philosophy' bit to also have a small central core / be natural, and I think those bits aren't true.

Lik... (read more)

Discussion with Eliezer Yudkowsky on AGI interventions

Oh, I was imagining something like "well, our current metals aren't strong enough, what if we developed stronger ones?", and then focusing on metallurgy. And this is making forward progress--you can build a taller tower out of steel than out of iron--but it's missing more fundamental issues like "you're not going to be able to drive on a bridge that's perpendicular to gravity, and the direction of gravity will change over the course of the trip" or "the moon moves relative to the earth, such that your bridge won't be able to be one object", which will sink... (read more)

Discussion with Eliezer Yudkowsky on AGI interventions

Certainly, if you're working on a substantial breakthrough in AI capability, there are reasons to keep it secret. But why would you work on that in the first place?

Most of the mentions of secrecy in this post are in that context. I think a lot of people who say they care about the alignment problem think that the 'two progress bars' model, where you can work on alignment and capability independent of each other, is not correct, and so they don't see all that much of a difference between capability work and alignment work. (If you're trying to predict human... (read more)

3Vanessa Kosoy6mo
If there's no difference between capability work and alignment work, then how is it possible to influence anything at all? If capability and alignment go hand in hand, then either transformative capability corresponds to sufficient alignment (in which case there is no technical problem) or it doesn't (in which case we're doomed). The only world in which secrecy makes sense, AFAICT, is if you're going to solve alignment and capability all by yourself. I am extremely skeptical of this approach.
Discussion with Eliezer Yudkowsky on AGI interventions

I'm annoyed by EY (and maybe MIRI's?) dismissal of every other alignment work, and how seriously it seems to be taken here, given their track record of choosing research agendas with very indirect impact on alignment

For what it's worth, my sense is that EY's track record is best in 1) identifying problems and 2) understanding the structure of the alignment problem.

And, like, I think it is possible that you end up in situations where the people who understand the situation best end up the most pessimistic about it. If you're trying to build a bridge to the ... (read more)

4Adam Shimi6mo
Agreed on the track record, which is part of why it's so frustrating that he doesn't give more details and feedback on why all these approaches are doomed in his view. That being said, I disagree for the second part, probably because we don't mean the same thing by "moving the ball"? In your bridge example, "moving the ball" looks to me like trying to see what problems the current proposal could have, how you could check them, what would be your unknown unknowns. And I definitely expect such an approach to find the problems you mention. Maybe you could give me a better model of what you mean by "moving the ball"?
The Codex Skeptic FAQ

[Note: I use Copilot and like it. The 'aha' moment for me was when I needed to calculate the intersection of two lines, a thing that I would normally just copy/paste from Stack Overflow, and instead Copilot wrote the function for me. Of course I then wrote tests and it passed the tests, which seemed like an altogether better workflow.]

Language models are good enough at generating code to make the very engineers building such models slightly more productive

How much of this is 'quality of code' vs. 'quality of data'? I would naively expect that the sort of a... (read more)

4Michaël Trazzi9mo
I buy that "generated code" will not add anything to the training set, and that Copilot doesn't help for having good data or (directly) better algorithms. However, the feedback loop I am pointing at is when you accept suggestions on Copilot. I think it is learning from human feedback on what solutions people select. If the model is "finetuned" to the specific dev's coding style, I would expect Codex to suggest even better code (because of the high quality of the finetuning data) to someone at OAI than to me or you.

I'm pointing at overall gains in devs' productivity. This could be used for collecting more data, which, AFAIK, happens by automatically collecting data from the internet using code (although possibly the business collaboration between OAI and GitHub helped). Most of the dev work would then be iteratively cleaning that data, running trainings, changing the architecture, etc. before getting to the performance they'd want, and those cycles would be a tiny bit faster using such tools.

To be clear, I'm not saying that talented engineers are coding much faster today. They're probably doing creative work at the edge of what Codex has seen. However, we're using the first version of something that, down the line, might end up giving us decent speed increases (I've been increasingly more productive the more I've learned how to use it). A company owning such a model would certainly have private access to better versions to use internally, and there are some strategic considerations in not sharing the next version of its code-generating model to win a race, while collecting feedback from millions of developers.
Extraction of human preferences 👨→🤖

Thanks for sharing negative results!

If I'm understanding you correctly, the structure looks something like this:

  • We have a toy environment where human preferences are both exactly specified and consequential.
  • We want to learn how hard it is to discover the human preference function, and whether it is 'learned by default' in an RL agent that's operating in the world and just paying attention to consequences.
  • One possible way to check whether it's 'learned by default' is to compare the performance of a predictor trained just on environmental data, a predictor t
... (read more)
3Mislav Jurić9mo
Hello Matthew, I'm Mislav, one of the team members that worked on this project. Thank you for your thoughtful comment.

Yes, you understood what we did correctly. We wanted to check whether human preferences are "learned by default" by comparing the performance of a human preference predictor trained just on the environment data and a human preference predictor trained on the RL agent's internal state.

As for your question related to environments, I agree with you. There are probably some environments (like the gridworld environment we used) where the human preference is too easy to learn. In other environments, the human preference is too hard to learn, and then there's the golden middle. One of our team members (I think it was Riccardo) had the idea of investigating the research question which could be posed as follows: "What kinds of environments are suitable for the agent to learn human preferences by default?". As you stated, in that case it would be useful to investigate the properties (features) of the environment and draw some conclusions about what characterizes the environments where the RL agent can learn human preferences by default. This is a research direction that could build on our work here.

As for your question on why and how we chose what the human preference would be in a particular environment: to be honest, I think we were mostly guided by our intuition. Nevan and Riccardo experimented with a lot of different environment setups in the VizDoom environment. Arun and me worked on setting up the PySC2 environment, but since training the agent on the PySC2 environment demanded a lot of resources, was pretty unstable, and the VizDoom environment results turned out to be negative, we decided not to experiment on other environments further.

So to recap, I think that we were mostly guided by our intuition on what would be too easy, too hard, or just right of a human preference to predict, and we course-corrected based on the experimental results. Bes
Big picture of phasic dopamine

I guess my sense is that most biological systems are going to be 'package deals' instead of 'cleanly separable' as much as possible--if you already have a system that's doing learning, and you can tweak that system in order to get something that gets you some of the benefits of a VoI framework (without actually calculating VoI), I expect biology to do that.

2Steve Byrnes10mo
I agree about the general principle, even if I don't think this particular thing is an example because of the "not maximizing sum of future rewards" thing.
Big picture of phasic dopamine

But in experiments, they’re not synchronized; the former happens faster than the latter.

This has the effect of incentivizing learning, right? (A system that you don't yet understand is, in total, more rewarding than an equally yummy system that you do understand.) So it reminds me of exploration in bandit algorithms, which makes sense given the connection to motivation.
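The analogy can be made concrete with a toy UCB1 bandit (an illustrative sketch, not anything from the post): arms the agent understands less get an explicit uncertainty bonus, so an unfamiliar arm is valued above an equally-yummy familiar one.

```python
import math
import random

# Minimal UCB1 sketch (names and parameters are mine): each arm's value
# estimate is its empirical mean plus an uncertainty bonus that shrinks
# as the arm becomes better understood -- an exploration incentive.
def ucb1(true_means, steps, seed=0):
    rng = random.Random(seed)
    n = [0] * len(true_means)        # pull counts per arm
    total = [0.0] * len(true_means)  # summed rewards per arm
    for t in range(1, steps + 1):
        if 0 in n:                   # pull each arm once first
            arm = n.index(0)
        else:
            arm = max(range(len(true_means)),
                      key=lambda a: total[a] / n[a]
                      + math.sqrt(2 * math.log(t) / n[a]))
        total[arm] += rng.gauss(true_means[arm], 0.1)
        n[arm] += 1
    return n

counts = ucb1([0.2, 0.8], steps=500)
print(counts[1] > counts[0])  # the genuinely better arm gets pulled more
```

The bonus term plays the role of the "not yet understood" premium: it decays with experience, so the incentive to probe an arm disappears once the system is well modeled.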

1Steve Byrnes10mo
Hmm, I guess I mostly disagree because:

  • I see this as sorta an unavoidable aspect of how the system works, so it doesn't really need an explanation;
  • You're jumping to "the system will maximize sum of future rewards" but I think RL in the brain is based on "maximize rewards for this step right now" (…and by the way "rewards for this step right now" implicitly involves an approximate assessment of future prospects.) See my comment "Humans are absolute rubbish at calculating a time-integral of reward".
  • I'm all for exploration, value-of-information, curiosity, etc., just not involving this particular mechanism.
AMA: Paul Christiano, alignment researcher

Is "movies" a standin for "easily duplicated cultural products", or do you think movies in particular are underproduced?

4Paul Christiano1y
Mostly a stand-in, but I do wish people were making more excellent movies :)
Can you get AGI from a Transformer?

Ah, I now suspect that I misunderstood you as well earlier: you wanted your list to be an example of "what you mean by DNN-style calculations" but I maybe interpreted as "a list of things that are hard to do with DNNs". And under that reading, it seemed unfair because the difficulty that even high-quality DNNs have in doing simple arithmetic is mirrored by the difficulty that humans have in doing simple arithmetic.

Similarly, I agree with you that there are lots of things that seem very inefficient to implement via DNNs rather than directly (like MCTS, or s... (read more)

1Steve Byrnes1y
I slightly edited that section header to make it clearer what the parenthetical "(matrix multiplications, ReLUs, etc.)" is referring to. Thanks! I agree that it's hard to make highly-confident categorical statements about all current and future DNN-ish architectures.

I don't think the human planning algorithm is very much like MCTS, although you can learn to do MCTS (just like you can learn to mentally run any other algorithm—people can learn strategies about what thoughts to think, just like they can learn strategies about what actions to execute). I think the built-in capability is that compositional-generative-model-based processing I was talking about in this post.

Like, if I tell you "I have a banana blanket", you have a constraint (namely, I just said that I have a banana blanket) and you spend a couple seconds searching through generative models until you find one that is maximally consistent with both that constraint and also all your prior beliefs about the world. You're probably imagining me with a blanket that has pictures of bananas on it, or less likely with a blanket made of banana peels, or maybe you figure I'm just being silly.

So by the same token, imagine you want to squeeze a book into a mostly-full bag. You have a constraint (the book winds up in the bag), and you spend a couple seconds searching through generative models until you find one that's maximally consistent with both that constraint and also all your prior beliefs and demands about the world. You imagine a plausible way to slide the book in without ripping the bag or squishing the other contents, and flesh that out into a very specific action plan, and then you pick the book up and do it.

When we need a multi-step plan, too much to search for in one go, we start needing to also rely on other built-in capabilities like chunking stuff together into single units, analogical reasoning (which is really just a special case of compositional-generative-model-based processing), and RL (as mention
Can you get AGI from a Transformer?

Do you think DNNs and human brains are doing essentially the same type of information processing? If not, how did you conclude "humans can't do those either"? Thanks!

Sorry for the late reply, but I was talking from personal experience. Multiplying matrices is hard! Even for extremely tiny ones, I was sped up tremendously by pencil and paper. It was much harder than driving a car, or recognizing whether a image depicts a dog or not. Given the underlying computational complexity of the various tasks, I can only conclude that I'm paying an exorbitant performa... (read more)

1Steve Byrnes1y
Oh OK I think I misunderstood you. So the context was: I think there's an open question about the extent to which the algorithms underlying human intelligence in particular, and/or AGI more generally, can be built from operations similar to matrix multiplication (and a couple other operations). I'm kinda saying "no, it probably can't" while the scaling-is-all-you-need DNN enthusiasts are kinda saying "yes, it probably can". Then your response is that humans can't multiply matrices in their heads. Correct? But I don't think that's relevant to this question. Like, we don't have low-level access to our own brains. If you ask GPT-3 (through its API) to simulate a self-attention layer, it wouldn't do particularly well, right? So I don't think it's any evidence either way. I dunno, certainly that's possible, but also sometimes new algorithms outright replace old algorithms. Like GPT-3 doesn't have any LSTM [https://en.wikipedia.org/wiki/Long_short-term_memory] modules in it, let alone HHMM [https://en.wikipedia.org/wiki/Hierarchical_hidden_Markov_model] modules, or syntax tree modules, or GOFAI production rule modules. :-P
Updating the Lottery Ticket Hypothesis

That seems right, but also reminds me of the point that you need to randomly initialize your neural nets for gradient descent to work (because otherwise the gradients everywhere are the same). Like, in the randomly initialized net, each edge is going to be part of many subcircuits, both good and bad, and the gradient is basically "what's your relative contribution to good subcircuits vs. bad subcircuits?"
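The symmetry point is easy to see in a toy two-layer net (a sketch with my own made-up dimensions): with every weight initialized to the same constant, every hidden unit computes the same thing, so backprop assigns every hidden unit an identical gradient and descent can never break the tie.

```python
import numpy as np

# Illustrative sketch: constant (symmetric) initialization makes all
# hidden units identical, so their gradients are identical too.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))   # batch of inputs
y = rng.normal(size=(5, 1))   # regression targets

W1 = np.full((3, 4), 0.1)     # constant init: all hidden units the same
W2 = np.full((4, 1), 0.1)

h = np.tanh(x @ W1)           # hidden activations (all columns equal)
err = h @ W2 - y              # prediction error
# gradient of mean-squared-error loss w.r.t. W1
grad_W1 = x.T @ ((err @ W2.T) * (1 - h ** 2)) / len(x)

# every hidden unit's gradient column is identical
print(np.allclose(grad_W1, grad_W1[:, :1]))  # -> True
```

With a random init instead, the columns differ, and each edge's gradient reflects its own mix of sub-circuits.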

Updating the Lottery Ticket Hypothesis

But this is what would be necessary for the "lottery ticket" intuition (i.e. training just picks out some pre-existing useful functionality) to work.

I don't think I agree, because of the many-to-many relationship between neurons and subcircuits.  Or, like, I think the standard of 'reliability' for this is very low. I don't have a great explanation / picture for this intuition, and so probably I should refine the picture to make sure it's real before leaning on it too much?

To be clear, I think I agree with your refinement as a more detailed picture of what's going on; I guess I just think you're overselling how wrong the naive version is?

2johnswentworth1y
Plausible. Here's an intuition pump to consider: suppose our net is a complete multigraph: not only is there an edge between every pair of nodes, there's multiple edges with base-2-exponentially-spaced weights, so we can always pick out a subset of them to get any total weight we please between the two nodes. Clearly, masking can turn this into any circuit we please (with the same number of nodes). But it seems wrong to say that this initial circuit has anything useful in it at all.
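The intuition pump above can be sketched in a few lines (edge counts and helper names are mine): with parallel edges weighted 1, 2, 4, …, masking realizes any integer total weight by reading off the target's binary representation.

```python
# Sketch of the multigraph intuition pump: parallel edges between a pair
# of nodes carry weights 2^0, 2^1, ..., so choosing which edges to keep
# (a mask) realizes any integer total weight via binary representation.
def mask_for_weight(target, n_edges=8):
    """Which of the edges with weights 2^0 .. 2^(n_edges-1) to keep."""
    assert 0 <= target < 2 ** n_edges
    return [bool((target >> k) & 1) for k in range(n_edges)]

def total_weight(mask):
    return sum(2 ** k for k, keep in enumerate(mask) if keep)

m = mask_for_weight(37)   # 37 = 32 + 4 + 1
print(total_weight(m))    # -> 37
```

Which illustrates the point: the mask carries all the information, so the "pre-existing circuit" deserves none of the credit.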
Updating the Lottery Ticket Hypothesis

Unfortunately, the strongest forms of the hypothesis do not seem plausible - e.g. I doubt that today’s neural networks already contain dog-recognizing subcircuits at initialization.

I think there are papers showing exactly this, like Deconstructing Lottery Tickets and What is the Best Multi-Stage Architecture for Object Recognition?. Another paper, describing the second paper:

We also compare to random, untrained weights because Jarrett et al. (2009) showed — quite strikingly — that the combination of random convolutional filters, rectification, pooling, and

... (read more)

In hindsight, I probably should have explained this more carefully. "Today’s neural networks already contain dog-recognizing subcircuits at initialization" was not an accurate summary of exactly what I think is implausible.

Here's a more careful version of the claim:

  • I do not find it plausible that a random network contains a neuron which acts as a reliable dog-detector. This is the sense in which it's not plausible that networks contain dog-recognizing subcircuits at initialization. But this is what would be necessary for the "lottery ticket" intuition (i.e
... (read more)
Against GDP as a metric for timelines and takeoff speeds

none capable of accelerating world GWP growth.

Or, at least, accelerating world GWP growth faster than they're already doing. (It's not like the various powers with nukes and bioweapons programs are not also trying to make the future richer than the present.)

1Daniel Kokotajlo1y
Yeah, I should clarify, when I'm talking about accelerating world GWP growth I'm talking about bringing annual growth rates to a noticeably higher level than they currently are -- say, to 9%+ per year. [https://www.lesswrong.com/posts/2rQ9vv9HY6i2Z2vQ4/what-technologies-could-cause-world-gdp-doubling-times-to-be]