This mostly seems to be an argument for: "It'd be nice if no pivotal act is necessary", but I don't think anyone disagrees with that.
It's arguing that, given that your organization has scary (near) AGI capabilities, it is not so much harder (to get a legitimate authority to impose an off-switch on the world's compute) than (to 'manufacture your own authority' to impose that off-switch) such that it's worth avoiding the cost of (developing those capabilities while planning to manufacture authority). Obviously there can be civilizations where that's true, and civilizations where that's not true.
Consider applying to this now anyway; applications often can be pretty quick and there's not all that much value in delaying.
It seems worth remembering the AXRP podcast episode on InfraBayesianism, which I think was the first time I didn't bounce off something related to this?
Several submissions contained perspectives, tricks, or counterexamples that were new to us. We were quite happy to see so many people engaging with ELK, and we were surprised by the number and quality of submissions.
A thing I'm curious about: what's your 'current overall view' on ELK? Is this:
From my perspective, ELK is currently very much "A problem we don't know how to solve, where we think rapid progress is being made (as we're still building out the example-counterexample graph, and are optimistic that we'll find an example without counterexamples)". There's some question of what "rapid" means, but I think we're on track for what we wrote in the ELK doc: "we're optimistic that within a year we will have made significant progress either towards a solution or towards a clear sense of why the problem is hard."
We've spent ~9 months on the proble...
I am confused what you think I was trying to do with that intuition pump.
I think I'm confused about the intuition pump too! Like, here's some options I thought up:
I'd say "notice that we underestimate the probability that x is even and divisible by 4 by saying it's 12.5%".
Cool, I like this example.
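To make the arithmetic in that example concrete, here's a minimal sketch. The sample range (integers 1 to 1000) and the helper name `fraction_satisfying` are my own assumptions for illustration, not anything from the thread. It compares the "multiply the two probabilities as if independent" estimate against the actual fraction of numbers that are both even and divisible by 4.

```python
from fractions import Fraction


def fraction_satisfying(predicate, numbers):
    """Exact fraction of `numbers` for which `predicate` holds."""
    numbers = list(numbers)
    hits = sum(1 for x in numbers if predicate(x))
    return Fraction(hits, len(numbers))


# Hypothetical sample: the integers 1..1000.
numbers = range(1, 1001)

p_even = fraction_satisfying(lambda x: x % 2 == 0, numbers)
p_div4 = fraction_satisfying(lambda x: x % 4 == 0, numbers)

# Naive estimate: treat the two properties as independent and multiply.
naive_estimate = p_even * p_div4

# Actual value: every multiple of 4 is already even, so the conjunction
# is exactly as likely as "divisible by 4" on its own.
true_value = fraction_satisfying(lambda x: x % 2 == 0 and x % 4 == 0, numbers)

print("naive:", naive_estimate, "actual:", true_value)
```

The naive product undershoots because the two properties are positively correlated; with anti-correlated properties (like "even and odd") the same procedure overshoots instead.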
I agree that if you estimate a probability, and then "perform search" / "optimize" / "run n copies of the estimate" (so that you estimate the probability as 1 - (1 - P(event))^n), then you're going to have systematic errors.... I suspect this is not the sort of mistake you imagine me doing but I don't think I know what you do imagine me doing.
I think the thing I'm interested in is "what are our estimates of the output of se...
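For a rough feel of how the 1 - (1 - P(event))^n formula from the quote amplifies errors, here's a small sketch. The specific probabilities (0.01 as the "true" per-try chance and 0.02 as a mistaken estimate) are made-up numbers for illustration, and it assumes the n tries are independent, which is exactly the assumption that might fail.

```python
def prob_at_least_one(p, n):
    """Chance that at least one of n independent tries succeeds,
    when each try succeeds with probability p."""
    return 1 - (1 - p) ** n


# Hypothetical numbers: the per-try estimate is off by one percentage point.
p_true = 0.01
p_estimated = 0.02

for n in (1, 10, 100):
    gap = prob_at_least_one(p_estimated, n) - prob_at_least_one(p_true, n)
    print(f"n={n:3d}  gap between estimated and true: {gap:.3f}")
```

A small per-try error turns into a large error in the searched-over estimate as n grows, and the sign of the final error matches the sign of the per-try error.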
they download an existing chip schematic, and scale it down
Uh, how big do you think contemporary chips are?
I'm pretty sure you mean functions that perform tasks, like you would put in /utils, but I note that on LW "utility function" often refers to the decision theory concept, and "what decision theoretical utility functions are present in the neural network prior" also seems like an interesting (tho less useful) question.
I'm starting with the intuition pump, noticing I can no longer tell a good story of doom, and concluding "infinite oversight quality --> alignment solved".
I think some of my more alignment-flavored counterexamples look like:
Huh, why doesn't that procedure have that systematic error?
Like, when I try to naively run your steps 1-4 on "probability of there existing a number that's both even and odd", I get that about 25% of numbers should be both even and odd, so it seems pretty likely that it'll work out given that there are at least 4 numbers. But I can't easily construct an argument at a similar level of sophistication that gives me an underestimate. [Like, "probability of there existing a number that's both odd and prime" gives the wrong conclusion if you buy that the probabi... (read more)
Also, to me it seems like a similar thing happens, but with the positions reversed, when Paul and Eliezer try to forecast concrete progress in ML over the next decade. Does that seem right to you?
It feels similar but clearly distinct? Like, in that situation Eliezer often seems to say things that I parse as "I don't have any special knowledge here", which seems like a different thing than "I can't easily sample from my distribution over how things go right", and I also have the sense of Paul being willing to 'go specific' and Eliezer not being willing to '... (read more)
Yeah, sorry about not owning that more, and for the frame being muddled. I don't endorse the "asking Eliezer" or "agreeing with Eliezer" bits, but I do basically think he's right about many object-level problems he identifies (and thus people disagreeing with him about that is not a feature) and think 'security mindset' is the right orientation to have towards AGI alignment. That hypothesis is a 'worry' primarily because asymmetric costs means it's more worth investigating than the raw probability would suggest. [Tho the raw probability of components of it... (read more)
I think my way of thinking about things is often a lot like "draw random samples," more like drawing N random samples rather than particle filtering (I guess since we aren't making observations as we go---if I notice an inconsistency the thing I do is more like backtrack and start over with N fresh samples having updated on the logical fact).
Oh whoa, you don't remember your samples from before? [I guess I might not either, unless I'm concentrating on keeping them around or verbalized them or something; probably I do something more expert-iteration-like whe... (read more)
Man, I would not call the technique you described "mainline prediction". It also seems kinda inconsistent with Vaniver's usage; his writing suggests that a person only has one mainline at a time which seems odd for this technique.
Vaniver, is this what you meant?
Uh, I inherited "mainline" from Eliezer's usage in the dialogue, and am guessing that his reasoning is following a process sort of like mine and John's. My natural word for it is a 'particle', from particle filtering, as linked in various places, which I think is consistent with John's description. ... (read more)
whatever else you might imagine would give you a "mainline".
As I understand it, when you "talk about the mainline", you're supposed to have some low-entropy (i.e. confident) view on how the future goes, such that you can answer very different questions X, Y and Z about that particular future, that are all correlated with each other, and all get (say) > 50% probability. (Idk, as I write this down, it seems so obviously a bad way to reason that I feel like I must not be understanding it correctly.)
I think this is roughly how I'm thinking about things some... (read more)
I'm just using this as an intuition pump for the listener to establish that a sufficiently powerful oversight process would solve AI alignment.
Huh, I guess I don't believe the intuition pump? Like, as the first counterexample that comes to mind, when I imagine having an AGI where I can tell everything about how it's thinking, and yet I remain a black box to myself, I can't really tell whether or not it's aligned to me. (Is me-now the one that I want it to be aligned to, or me-across-time? Which side of my internal conflicts about A vs. B / which principle ... (read more)
When Alice uses a model with more free parameters, you need to posit a bias before you can predict a systematic direction in which Alice will make mistakes. So this only bites you if you have a bias towards optimism.
That is, when I give Optimistic Alice fewer constraints, she can more easily imagine a solution, and when I give Pessimistic Bob fewer constraints, he can more easily imagine that no solution is possible? I think... this feels true as a matter of human psychology of problem-solving, or something, and not as a matter of math. Like, the way Bob f... (read more)
[I think there's a thing Eliezer does a lot, which I have mixed feelings about, which is matching people's statements to patterns and then responding to the generator of the pattern in Eliezer's head, which only sometimes corresponds to the generator in the other person's head.]
I want to add an additional meta-pattern – there was once a person who thought I had a particular bias. They'd go around telling me "Ray, you're exhibiting that bias right now. Whatever rationalization you're coming up with right now, it's not the real reason you're arguing X." An...
I feel like I have a broad distribution over worlds and usually answer questions with probability distributions, that I have a complete mental universe (which feels to me like it outputs answers to a much broader set of questions than Eliezer's, albeit probabilistic ones, rather than bailing with "the future is hard to predict").
Sometimes I'll be tracking a finite number of "concrete hypotheses", where every hypothesis is 'fully fleshed out', and be doing a particle-filtering style updating process, where sometimes hypotheses gain or lose weight, sometimes... (read more)
The main complexity feels like the thing you point out where it's impossible to make them fully fleshed out, so you build a bunch of intuitions about what is consistent (and could be fleshed out given enough time) and ... (read more)
Sorry, I probably should have been more clear about the "this is a quote from a longer dialogue, the missing context is important." I do think that the disagreement about "how relevant is this to 'actual disagreement'?" is basically the live thing, not whether or not you agree with the basic abstract point.
My current sense is that you're right that the thing you're doing is more specific than the general case (and one of the ways you can tell is the line of argumentation you give about chance of doom), and also Eliezer can still be correctly observing that... (read more)
Yeah, I'm also interested in the question of "how do we distinguish 'sentences-on-mainline' from 'shoring-up-edge-cases'?", or which conversational moves most develop shared knowledge, or something similar.
Like I think it's often good to point out edge cases, especially when you're trying to formalize an argument or look for designs that get us out of this trap. In another comment in this thread, I note that there's a thing Eliezer said that I think is very important and accurate, and also think there's an edge case that's not obviously handled corre... (read more)
(For object-level responses, see comments on parallel threads.)
I want to push back on an implicit framing in lines like:
there's some value to more people thinking thru / shooting down their own edge cases [...], instead of pushing the work to Eliezer.
people aren't updating on the meta-level point and continue to attempt 'rolling their own crypto', asking if Eliezer can poke the hole in this new procedure
This makes it sound like the rest of us don't try to break our proposals, push the work to Eliezer, agree with Eliezer when he finds a problem, and then no... (read more)
But also my sense is that there's some deep benefit from "having mainlines" and conversations that are mostly 'sentences-on-mainline'?
I agree with this. Or, if you feel ~evenly split between two options, have two mainlines and focus a bunch on those (including picking at cruxes and revising your mainline view over time).
Like, it feels to me like Eliezer was generating sentences on his mainline, and Richard was responding with 'since you're being overly pessimistic, I will be overly optimistic to balance', with no attempt to have his response match his
The most recent post has a related exchange between Eliezer and Rohin:
Eliezer: I think the critical insight - though it has a format that basically nobody except me ever visibly invokes in those terms, and I worry maybe it can only be taught by a kind of life experience that's very hard to obtain - is the realization that any consistent reasonable story about underlying mechanisms will give you less optimistic forecasts than the ones you get by freely combining surface desiderata
Rohin: Yeah, I think I do not in fact understand why that is true for any cons
This is mostly in response to stuff written by Richard, but I'm interested in everyone's read of the situation.
While I don't find Eliezer's core intuitions about intelligence too implausible, they don't seem compelling enough to do as much work as Eliezer argues they do. As in the Foom debate, I think that our object-level discussions were constrained by our different underlying attitudes towards high-level abstractions, which are hard to pin down (let alone resolve).
Given this, I think that the most productive mode of intellectual engagement with Eliezer'
EDIT: I wrote this before seeing Paul's response; hence a significant amount of repetition.
They often seem to emit sentences that are 'not absurd', instead of 'on their mainline', because they're mostly trying to generate sentences that pass some shallow checks instead of 'coming from their complete mental universe.'
Why is this?
Well, there are many boring cases that are explained by pedagogy / argument structure. When I say things like "in the limit of infinite oversight capacity, we could just understand everything about the AI system and reengineer it to... (read more)
I feel like I have a broad distribution over worlds and usually answer questions with probability distributions, that I have a complete mental universe (which feels to me like it outputs answers to a much broader set of questions than Eliezer's, albeit probabilistic ones, rather than bailing with "the future is hard to predict"). At a high level I don't think "mainline" is a great concept for describing probability distributions over the future except in certain exceptional cases (though I may not understand what "mainline" means), and that neat stor... (read more)
I'm guessing that a proponent of Christiano's theory would say: sure, such-and-such startup succeeded but it was because they were the only ones working on problem P, so problem P was an uncrowded field at the time. Okay, but why do we draw the boundary around P rather than around "software" or around something in between which was crowded?
I'd make a different reply: you need to not just look at the winning startup, but all startups. If it's the case that the 'startup ecosystem' is earning 100% returns and the rest of the economy is earning 5% returns, the... (read more)
That said, since I can't resist responding to random comments: are horses really being bred for sprinting as fast as they can for 20-30 seconds? (Isn't that what cheetahs are so good at?) What is the military/agricultural/trade context in which that is relevant? Who cares other than horse racers? Over any of the distances where people are using horses I would expect them to be considerably faster than cheetahs even if both are unburdened. I don't know much about horses though.
My understanding is that the primary military use of horses in Europe for elites ... (read more)
Humans invested exorbitant amounts of money and effort into making better cheetahs, in the sense of 'trying to be able to run much faster and become the fastest creatures on earth'; we call those manufactured cheetahs, "horses".
I don't think Paul is talking about that. Consider the previous lines (which seem like they could describe animal breeding to me):
and you think that G doesn't help you improve on muscles and tendons?
until you have a big pile of it?
and Eliezer's response in the following lines:
the natural selection of cheetahs is investing in it
I agree with your framing, and I think it shows Paul is wrong, leaving aside the specifics of the cheetah thing. Looking back, humans pursued both paths, the path of selecting cheetahs (horses) and of using G to look for completely different paradigms that blow away cheetahs. (Since we aren't evolution, we aren't restricted to picking just one approach.) And we can see the results today: when was the last time you rode a horse?
If you had invested in 'the horse economy' a century ago and bought the stock of bluechip buggywhip manufacturers instead of aerosp... (read more)
Like, fundamentally the question is something like "how efficient and accurate is the AI research market?"
I would distinguish two factors:
You could turn the "powerful and well-directed" dial up to the maximum allowed by physics, and still not thereby guarantee that information asymmetries are rare, because the way that a society applies maximum optimization pressure to reaching AGI ASAP might route through a lot of indiv... (read more)
When you’re considering between a project that gives us a boost in worlds where P(doom) was 50% and projects that help out in worlds where P(doom) was 1% or 99%, you should probably pick the first project, because the derivative of P(doom) with respect to alignment progress is maximized at 50%.
Many prominent alignment researchers estimate P(doom) as substantially less than 50%. Those people often focus on scenarios which are surprisingly bad from their perspective basically for this reason.
And conversely, people who think P(doom) > 50% should aim their
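One way to see the "maximized at 50%" claim in the quote above: it holds, for instance, if P(doom) is a logistic function of alignment progress. That functional form is my assumption for this sketch (the quote doesn't specify one), and `p_doom` / `marginal_reduction` are hypothetical names. Under that assumption the marginal reduction in P(doom) per unit of progress is p(1 - p), which peaks at p = 0.5.

```python
import math


def p_doom(progress):
    """Assumed logistic model: more alignment progress means lower P(doom).
    progress = 0 corresponds to P(doom) = 0.5."""
    return 1.0 / (1.0 + math.exp(progress))


def marginal_reduction(progress, eps=1e-6):
    """Numerical estimate of how much P(doom) drops per unit of extra
    progress (central difference)."""
    return (p_doom(progress - eps) - p_doom(progress + eps)) / (2 * eps)


# Progress levels at which the assumed model gives P(doom) = 99%, 50%, 1%.
for label, progress in (("99%", -math.log(99)), ("50%", 0.0), ("1%", math.log(99))):
    print(f"P(doom)={label}: marginal reduction = {marginal_reduction(progress):.4f}")
```

With a different assumed shape for the curve, the peak could sit somewhere other than 50%, so this is an illustration of the argument rather than a proof of it.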
Why do they separate out the auditory world and the environment?
So it looks like the R-7 (which launched Sputnik) was the first ICBM, and the range is way longer than the V-2s of ~15 years earlier, but I'm not easily finding a graph of range over those intervening years. (And the R-7 range is only about double the range of a WW2-era bomber, which further smooths the overall graph.)
[And, implicitly, the reason we care about ICBMs is because the US and the USSR were on different continents; if the distance between their major centers was comparable to England and France's distance instead, then the same strategic considerations would have been hit much sooner.]
presumably we saw a discontinuous jump in flight range when Sputnik entered orbit.
While I think orbit is the right sort of discontinuity for this, I think you need to specify 'flight range' in a way that clearly favors orbits for this to be correct, mostly because about a month earlier a manhole cover was launched/vaporized with a nuke.
[But in terms of something like "altitude achieved", I think Sputnik is probably part of a continuous graph, and probably not the most extreme member of the graph?]
My understanding is that Sputnik was a big discontinuous jump in "distance which a payload (i.e. nuclear bomb) can be delivered" (or at least it was a conclusive proof-of-concept of a discontinuous jump in that metric). That metric was presumably under heavy optimization pressure at the time, and was the main reason for strategic interest in Sputnik, so it lines up very well with the preconditions for the continuous view.
your point is simply that it's hard to predict when that will happen when you just look at the Penn Treebank trend.
This is a big part of my point; a smaller elaboration is that it can be easy to trick yourself into thinking that, because you understand what will happen with PTB, you'll understand what will happen with economics/security/etc., when in fact you don't have much understanding of the connection between those, and there might be significant discontinuities. [To be clear, I don't have much understanding of this either; I wish I did!]
For example, ...
it seems like extrapolating from the past still gives you a lot better of a model than most available alternatives.
My impression is that some people are impressed by GPT-3's capabilities, whereas your response is "ok, but it's part of the straight-line trend on Penn Treebank; maybe it's a little ahead of schedule, but nothing to write home about." But clearly you and they are focused on different metrics!
That is, suppose it's the case that GPT-3 is the first successfully commercialized language model. (I think in order to make this literally true you... (read more)
The mental move I'm doing for each of these examples is not imagining universes where addition/evolution/other deep theory is wrong, but imagining phenomena/problems where addition/evolution/other deep theory is not adapted. If you're describing something that doesn't commute, addition might be a deep theory, but it's not useful for what you want.
Yeah, this seems reasonable to me. I think "how could you tell that theory is relevant to this domain?" seems like a reasonable question in a way that "what predictions does that theory make?" seems like it's somehow coming at things from the wrong angle.
And even if I feel what you're gesturing at, this sounds/looks like you're saying "even if my prediction is false, that doesn't mean that my theory would be invalidated".
So, thermodynamics also feels like a deep fundamental theory to me, and one of the predictions it makes is "you can't make an engine more efficient than a Carnot engine." Suppose someone exhibits an engine that appears to be more efficient than a Carnot engine; my response is not going to be "oh, thermodynamics is wrong", and instead it's going to be "oh, this engine is making use of... (read more)
It's taking a massive massive failure and trying to find exactly the right abstract gloss to put on it that makes it sound like exactly the right perfect thing will be done next time.
I feel like Ngo didn't really respond to this?
Like, later he says:
Right, I'm not endorsing this as my mainline prediction about what happens. Mainly what I'm doing here is highlighting that your view seems like one which cherrypicks pessimistic interpretations.
But... Richard, are you endorsing it as 'at all in line with the evidence?' Like, when I imagine living in that ... (read more)
So we would need to figure out how to robustly get an honest signal from such an experiment, which still seems quite hard. But perhaps it's easier than solving the full alignment problem before the first shot.
IMO this is an 'additional line of defense' boxing strategy instead of a simplification.
Note that in the traditional version, the 'dud' bit of the bomb can only be the trigger; a bomb that absorbs the photon but then explodes isn't distinguishable from a bomb that absorbs the photon and then doesn't explode (because of an error deeper in the bomb).... (read more)
However, I think it's not at all obvious to me that corrigibility doesn't have a "small central core". It does seem to me like the "you are incomplete, you will never be complete" angle captures a lot of what we mean by corrigibility.
I think all three of Eliezer, you, and I share the sense that corrigibility is perhaps philosophically simple. The problem is that for it to actually have a small central core / be a natural stance, you need the 'import philosophy' bit to also have a small central core / be natural, and I think those bits aren't true.
Lik...
Oh, I was imagining something like "well, our current metals aren't strong enough, what if we developed stronger ones?", and then focusing on metallurgy. And this is making forward progress--you can build a taller tower out of steel than out of iron--but it's missing more fundamental issues like "you're not going to be able to drive on a bridge that's perpendicular to gravity, and the direction of gravity will change over the course of the trip" or "the moon moves relative to the earth, such that your bridge won't be able to be one object", which will sink... (read more)
Certainly, if you're working on a substantial breakthrough in AI capability, there are reasons to keep it secret. But why would you work on that in the first place?
Most of the mentions of secrecy in this post are in that context. I think a lot of people who say they care about the alignment problem think that the 'two progress bars' model, where you can work on alignment and capability independent of each other, is not correct, and so they don't see all that much of a difference between capability work and alignment work. (If you're trying to predict human... (read more)
I'm annoyed by EY's (and maybe MIRI's?) dismissal of all other alignment work, and how seriously it seems to be taken here, given their track record of choosing research agendas with very indirect impact on alignment
For what it's worth, my sense is that EY's track record is best in 1) identifying problems and 2) understanding the structure of the alignment problem.
And, like, I think it is possible that you end up in situations where the people who understand the situation best end up the most pessimistic about it. If you're trying to build a bridge to the ... (read more)
[Note: I use Copilot and like it. The 'aha' moment for me was when I needed to calculate the intersection of two lines, a thing that I would normally just copy/paste from Stack Overflow, and instead Copilot wrote the function for me. Of course I then wrote tests and it passed the tests, which seemed like an altogether better workflow.]
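For readers who haven't had to write one, here's a sketch of the kind of function involved, plus the sort of quick checks one might write afterwards. This is my own illustration using the standard determinant formula, not the code Copilot actually produced; the name `line_intersection` and the point-pair representation are assumptions.

```python
def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite line through p1 and p2 with the
    infinite line through p3 and p4. Points are (x, y) tuples.
    Returns None when the lines are parallel or coincident."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4

    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None  # No unique intersection point.

    det12 = x1 * y2 - y1 * x2
    det34 = x3 * y4 - y3 * x4
    x = (det12 * (x3 - x4) - (x1 - x2) * det34) / denom
    y = (det12 * (y3 - y4) - (y1 - y2) * det34) / denom
    return (x, y)


# Two diagonals of the unit square cross at its centre.
print(line_intersection((0, 0), (1, 1), (0, 1), (1, 0)))
# Parallel lines have no unique intersection.
print(line_intersection((0, 0), (1, 1), (0, 1), (1, 2)))
```

Note this treats the inputs as infinite lines, not segments; a segment version would also need to check that the point lies within both segments.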
Language models are good enough at generating code to make the very engineers building such models slightly more productive
How much of this is 'quality of code' vs. 'quality of data'? I would naively expect that the sort of a... (read more)
Thanks for sharing negative results!
If I'm understanding you correctly, the structure looks something like this:
I guess my sense is that most biological systems are going to be 'package deals' instead of 'cleanly separable' as much as possible--if you already have a system that's doing learning, and you can tweak that system in order to get something that gets you some of the benefits of a VoI framework (without actually calculating VoI), I expect biology to do that.
But in experiments, they’re not synchronized; the former happens faster than the latter.
This has the effect of incentivizing learning, right? (A system that you don't yet understand is, in total, more rewarding than an equally yummy system that you do understand.) So it reminds me of exploration in bandit algorithms, which makes sense given the connection to motivation.
Is "movies" a standin for "easily duplicated cultural products", or do you think movies in particular are underproduced?
Ah, I now suspect that I misunderstood you as well earlier: you wanted your list to be an example of "what you mean by DNN-style calculations" but I maybe interpreted it as "a list of things that are hard to do with DNNs". And under that reading, it seemed unfair because the difficulty that even high-quality DNNs have in doing simple arithmetic is mirrored by the difficulty that humans have in doing simple arithmetic.
Similarly, I agree with you that there are lots of things that seem very inefficient to implement via DNNs rather than directly (like MCTS, or s... (read more)
Do you think DNNs and human brains are doing essentially the same type of information processing? If not, how did you conclude "humans can't do those either"? Thanks!
Sorry for the late reply, but I was talking from personal experience. Multiplying matrices is hard! Even for extremely tiny ones, I was sped up tremendously by pencil and paper. It was much harder than driving a car, or recognizing whether an image depicts a dog or not. Given the underlying computational complexity of the various tasks, I can only conclude that I'm paying an exorbitant performa...
That seems right, but also reminds me of the point that you need to randomly initialize your neural nets for gradient descent to work (because otherwise the gradients everywhere are the same). Like, in the randomly initialized net, each edge is going to be part of many subcircuits, both good and bad, and the gradient is basically "what's your relative contribution to good subcircuits vs. bad subcircuits?"
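A tiny sketch of the symmetry point, under assumptions of my own choosing: one scalar input, two tanh hidden units, a linear output, squared-error loss, and made-up numbers for the data and weights. More precisely than "the gradients everywhere are the same", what it shows is that hidden units initialized identically receive identical gradients, so gradient descent can never make them differ.

```python
import math
import random


def hidden_weight_grads(w_in, w_out, x, y):
    """Gradient of 0.5 * (y_hat - y)**2 with respect to each input-to-hidden
    weight, for y_hat = sum_i w_out[i] * tanh(w_in[i] * x)."""
    hidden = [math.tanh(w * x) for w in w_in]
    y_hat = sum(v * h for v, h in zip(w_out, hidden))
    err = y_hat - y
    # Chain rule: d(loss)/d(w_in[i]) = err * w_out[i] * (1 - h_i^2) * x
    return [err * v * (1 - h * h) * x for v, h in zip(w_out, hidden)]


x, y = 1.5, 0.7  # hypothetical single training example

# Identical initialization: both hidden units get the same gradient.
symmetric = hidden_weight_grads([0.3, 0.3], [0.5, 0.5], x, y)

# Random initialization (seeded for repeatability): gradients differ.
rng = random.Random(0)
w_in = [rng.uniform(-1, 1) for _ in range(2)]
w_out = [rng.uniform(-1, 1) for _ in range(2)]
randomized = hidden_weight_grads(w_in, w_out, x, y)

print("symmetric init: ", symmetric)
print("random init:    ", randomized)
```

In the symmetric case the two units stay copies of each other forever, which is the degenerate opposite of each edge participating in many different subcircuits.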
But this is what would be necessary for the "lottery ticket" intuition (i.e. training just picks out some pre-existing useful functionality) to work.
I don't think I agree, because of the many-to-many relationship between neurons and subcircuits. Or, like, I think the standard of 'reliability' for this is very low. I don't have a great explanation / picture for this intuition, and so probably I should refine the picture to make sure it's real before leaning on it too much?
To be clear, I think I agree with your refinement as a more detailed picture of what's going on; I guess I just think you're overselling how wrong the naive version is?
Unfortunately, the strongest forms of the hypothesis do not seem plausible - e.g. I doubt that today’s neural networks already contain dog-recognizing subcircuits at initialization.
I think there are papers showing exactly this, like Deconstructing Lottery Tickets and What is the Best Multi-Stage Architecture for Object Recognition?. Another paper, describing the second paper:
We also compare to random, untrained weights because Jarrett et al. (2009) showed — quite strikingly — that the combination of random convolutional filters, rectification, pooling, and
In hindsight, I probably should have explained this more carefully. "Today’s neural networks already contain dog-recognizing subcircuits at initialization" was not an accurate summary of exactly what I think is implausible.
Here's a more careful version of the claim:
none capable of accelerating world GWP growth.
Or, at least, accelerating world GWP growth faster than they're already doing. (It's not like the various powers with nukes and bioweapons programs are not also trying to make the future richer than the present.)