- I think that corrigibility is more likely to be a crisp property amongst systems that perform well-as-evaluated-by-you. I think corrigibility is only likely to be useful in cases like this where it is crisp and natural.
Can someone explain to me what this crispness is?
As I'm reading Paul's comment, there's an amount of optimization for human reward that breaks our rating ability. This is a general problem for AI for the fundamental reason that as we increase an AI's optimization power, it gets better at the task, but it also gets better at breaking m... (read more)
If you have a space with two disconnected components, then I'm calling the distinction between them "crisp." For example, it doesn't depend on exactly how you draw the line.
It feels to me like this kind of non-convexity is fundamentally what crispness is about (the cluster structure of thingspace is a central example). So if you want to draw a crisp line, you should be looking for this kind of disconnectedness/non-convexity.
ETA: a very concrete consequence of this kind of crispness, that I should have spelled out in the OP, is that there are many functions... (read more)
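To make sure I follow, here's a toy picture of that disconnectedness (my own illustration with made-up clusters, not Paul's example): when the data falls into two well-separated components, many different lines induce exactly the same split, so the distinction really doesn't depend on exactly where you draw the line.

```python
# Toy illustration (my own, not Paul's): two well-separated clusters in 1-D.
# Any threshold drawn in the gap between them classifies the points identically,
# so the distinction doesn't depend on exactly where the line is drawn.
import numpy as np

rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=0.5, size=100)   # one connected component
cluster_b = rng.normal(loc=10.0, scale=0.5, size=100)  # a second, far-away component
points = np.concatenate([cluster_a, cluster_b])

# Many different "lines" (thresholds) inside the gap...
thresholds = np.linspace(3.0, 7.0, 50)
labelings = [tuple(points > t) for t in thresholds]

# ...all induce exactly the same labeling of the data.
print(len(set(labelings)), "distinct labeling(s) across", len(thresholds), "thresholds")
```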
Minor clarification: This doesn't refer to re-writing the LW corrigibility tag. I believe a tag is a reply in glowfic, where each author responds with the next tag, i.e. the next bit of the story, with an implied "tag – now you're it!" to the other author.
Solid contribution, thank you.
Agreed explicitly for the record.
Just as a related idea: in my mind, I often do a kind of thinking that HPMOR!Harry would call “Hufflepuff Bones”, where I look for ways a problem is solvable in physical reality at all, before considering ethical, coordination, or even much in the way of practical concerns.
Thanks, this story is pretty helpful (to my understanding).
I think it would be less "off-putting" if we had common knowledge of it being such a post. I think the authors don't think of it as that from reading Sidney's comment.
In Part 2, I analyzed a common argument in favor of that kind of “pivotal act”, and found a pretty simple flaw stemming from fallaciously assuming that the AGI company has to do everything itself (rather than enlisting help from neutral outsiders, using evidence).
For the record this does seem like the cruxy part of the whole discussion, and I think more concrete descriptions of alternatives would help assuage my concerns here.
In fact, before you get to AGI, your company will probably develop other surprising capabilities, and you can demonstrate those capabilities to neutral-but-influential outsiders who previously did not believe those capabilities were possible or concerning. In other words, outsiders can start to help you implement helpful regulatory ideas, rather than you planning to do it all on your own by force at the last minute using a super-powerful AI system.
This all seems like it would be good news. For the record I think that the necessary evidence to start act... (read more)
Sweet! I could also perform a replication I guess.
Does anyone in-thread (or reading along) have any experiments they'd be interested in me running with this air conditioner? It doesn't seem at all hard for me to do some science and get empirical data, with a different setup to Wirecutter, so let me know.
Added: From a skim of the thread, it seems to me the experiment that would resolve matters is testing in a large room, with temperature sensors more like 15 feet away, somewhere that's very hot outside, and comparing this with (say) Wirecutter's top two-hose pick. Confirm?
... I actually already started a post titled "Preregistration: Air Conditioner Test (for AI Alignment!)". My plan was to use the one-hose AC I bought a few years ago during that heat wave, rig up a cardboard "second hose" for it, and try it out in my apartment both with and without the second hose next time we have a decently-hot day. Maybe we can have an air conditioner test party.
Predictions: the claim which I most do not believe right now is that going from one hose to two hose with the same air conditioner makes only a 20%-30% difference. The main metr... (read more)
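Before running anything, here's the back-of-envelope model I'd use to sanity-check the 20%-30% claim. Every number in it is an assumption I picked for illustration, not a measurement:

```python
# Back-of-envelope model of one-hose vs two-hose net cooling.
# All numbers below are illustrative assumptions, not measurements.

RATED_COOLING_BTU_HR = 8000      # nameplate cooling capacity (assumed)
EXHAUST_CFM = 180                # indoor air a one-hose unit blows outside (assumed)
OUTDOOR_F, INDOOR_F = 95, 75     # assumed outdoor and indoor temperatures

# Standard HVAC rule of thumb: sensible heat (BTU/hr) ~= 1.08 * CFM * deltaT(F).
# A one-hose unit exhausts conditioned indoor air, so an equal volume of hot
# outdoor air leaks back in and becomes an extra heat load.
infiltration_load = 1.08 * EXHAUST_CFM * (OUTDOOR_F - INDOOR_F)

one_hose_net = RATED_COOLING_BTU_HR - infiltration_load
two_hose_net = RATED_COOLING_BTU_HR  # second hose supplies outdoor air for exhaust,
                                     # so conditioned indoor air isn't pushed out

print(f"infiltration load: {infiltration_load:.0f} BTU/hr")
print(f"one-hose net cooling: {one_hose_net:.0f} BTU/hr")
print(f"two-hose advantage: {two_hose_net / one_hose_net - 1:.0%}")
```

With these made-up numbers the one-hose penalty comes out far bigger than 20%-30%, but the answer is very sensitive to the assumed exhaust airflow and outdoor temperature, which is exactly why the empirical test seems worth running.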
Curated. Laying out a full story for why the work you're doing is solving AI alignment is very helpful, and this framing captures different things from other framings (e.g. Rocket Alignment, Embedded Curiosities, etc). Also it's simply written and mercifully short, relative to other such things. Thanks for this step in the conversation.
Ah, very good point. How interesting…
(If I’d concretely thought of transferring knowledge between a bird and a dog this would have been obvious.)
Solomonoff's theory of induction and the AIXI theory of intelligence operationalize knowledge as the ability to predict observations.
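As a very stripped-down toy version of that operationalization (a tiny hand-picked hypothesis class and made-up prior weights standing in for the space of all programs):

```python
# Toy Bayesian-mixture predictor: "knowledge" cashed out as prediction.
# A tiny, hand-picked hypothesis class stands in for Solomonoff's mixture
# over all programs; probabilities are softened to avoid zero likelihoods.

hypotheses = {
    "all zeros":   lambda history: 0.01,                      # P(next bit = 1)
    "all ones":    lambda history: 0.99,
    "alternating": lambda history: 0.99 if (not history or history[-1] == 0) else 0.01,
    "fair coin":   lambda history: 0.5,
}
# Simpler hypotheses get more prior weight (a stand-in for 2^-length).
priors = {"all zeros": 0.3, "all ones": 0.3, "alternating": 0.2, "fair coin": 0.2}

def predict(history):
    """Posterior-weighted probability that the next bit is 1."""
    weights = dict(priors)
    seen = []
    for bit in history:                        # Bayesian update on each observed bit
        for name, h in hypotheses.items():
            p1 = h(seen)
            weights[name] *= p1 if bit == 1 else (1 - p1)
        seen.append(bit)
    total = sum(weights.values())
    return sum(w * hypotheses[name](history) for name, w in weights.items()) / total

print(predict([0, 1, 0, 1, 0, 1, 0, 1, 0]))  # ~0.9: the mixture has mostly latched onto the alternating pattern
```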
Maybe this is what knowledge is. But I’d like to try coming up with at least one alternative. So here goes!
I want to define knowledge as part of an agent.
Eliezer, when you told Richard that your probability of a successful miracle is very low, you added the following note:
Though a lot of that is dominated, not by the probability of a positive miracle, but by the extent to which we seem unprepared to take advantage of it, and so would not be saved by one.
I don't mean to ask for positive fairy tales when I ask: could you list some things you could see in the world that would cause you to feel that we were well-prepared to take advantage of one if we got one?
My obvious quick guess would be "I know of an ML pro... (read more)
Eliezer and Nate, my guess is that most of your perspective on the alignment problem for the past several years has come from the thinking and explorations you've personally done, rather than reading work done by others.
But, if you have read interesting work by others that's changed your mind or given you helpful insights, what has it been? Some old CS textbook? Random Gwern articles? An economics textbook? Playing around yourself with ML systems?
Questions about the standard-university-textbook from the future that tells us how to build an AGI. I'll take answers on any of these!
I'm going to try and write a table of contents for the textbook, just because it seems like a fun exercise.
Epistemic status: unbridled speculation
Volume I: Foundation
Part I: Statistical Learning Theory
I don't think there is an "AGI textbook" any more than there is an "industrialization textbook." There are lots of books about general principles and useful kinds of machines. That said, if I had to make wild guesses about roughly what that future understanding would look like:
I am very excited by the way the post takes a relatively simple problem and shows, in trying to solve it, a great deal of the depth of the alignment problem.
FWIW I wouldn’t write this line today, I am now much more confused about what ELK says or means.
(News: OpenAI has built a theorem-prover that solved many AMC12 and AIME competition problems, and 2 IMO problems, and they say they hope this leads to work that wins the IMO Grand Challenge.)
My quick two-line review is something like: this post (and its sequel) is an artifact from someone with an interesting perspective on the world looking at the whole problem and trying to communicate their practical perspective. I don't really share this perspective, but it is looking at enough of the real things, and differently enough to the other perspectives I hear, that I am personally glad to have engaged with it. +4.
"Search versus design" explores the basic way we build and trust systems in the world. A few notes:
(I did not write a curation notice in time, but that doesn’t mean I don’t get to share why I wanted to curate this post! So I will do that here.)
Typically when I read a post by Paul, it feels like a single ingredient in a recipe, but one where I don’t know what meal the recipe is for. This report felt like one of the first times I was served a full meal, and I got to see how all the prior ingredients come together.
Alternative framing: Normally Paul’s posts feel like the argument step “J -> K” and I’m left wondering how we got to J, and where we’ll go fr... (read more)
Radical Probabilism is an extension of the Embedded Agency philosophical position. I remember reading it and feeling a strong sense that I really got to see a well pinned-down argument using that philosophy. Radical Probabilism might be a +9, will have to re-read, but for now I give it +4.
(This review is taken from my post Ben Pace's Controversial Picks for the 2020 Review.)
Introduction to Cartesian Frames is a piece that also gave me a new philosophical perspective on my life.
I don't know how to simply describe it. I don't know what even to say here.
One thing I can say is that the post formalized the idea of having "more agency" or "less agency", in terms of "what facts about the world can I force to be true?". The more I approach the world by stating things that are going to happen, that I can't change, the more I'm boxing-in my agency over the world. The more I treat constraints as things I could fight to chang... (read more)
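One toy way to gesture at that "what facts about the world can I force to be true?" framing (my own illustration; the post's actual formalism, Cartesian frames, is much richer than this):

```python
# Toy sketch of "agency as what I can force to be true" (my own illustration;
# the post's actual formalism is Cartesian frames / Chu spaces).

# A frame: agent options x environment options -> world outcomes.
agent_options = ["carry umbrella", "no umbrella"]
env_options = ["rain", "sun"]
outcome = {
    ("carry umbrella", "rain"): "dry",
    ("carry umbrella", "sun"): "dry",
    ("no umbrella", "rain"): "wet",
    ("no umbrella", "sun"): "dry",
}

def can_ensure(fact: set) -> bool:
    """The agent can force `fact` if some option lands in it whatever the environment does."""
    return any(all(outcome[(a, e)] in fact for e in env_options) for a in agent_options)

print(can_ensure({"dry"}))   # True: carrying the umbrella forces "dry"
print(can_ensure({"wet"}))   # False: the agent can't force "wet" if the environment picks "sun"
```

In this picture, "more agency" loosely corresponds to there being more sets of outcomes the agent can ensure no matter what the environment does.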
An Orthodox Case Against Utility Functions was a shocking piece to me. Abram spends the first half of the post laying out a view he suspects people hold, but he thinks is clearly wrong, which is a perspective that approaches things "from the starting-point of the universe". I felt dread reading it, because it was a view I held at the time, and I used as a key background perspective when I discussed bayesian reasoning. The rest of the post lays out an alternative perspective that "starts from the standpoint of the agent". Instead of my beliefs being about t... (read more)
This is an interesting tack; this step and the next ("Strategy: have humans adopt the optimal Bayes net") feel new to me.
From the section "Strategy: have humans adopt the optimal Bayes net":
Roughly speaking, imitative generalization:
- Considers the space of changes the humans could make to their Bayes net;
- Learns a function which maps (proposed change to Bayes net) to (how a human — with AI assistants — would make predictions after making that change);
- Searches over this space to find the change that allows the humans to make the best predictions.
Regarding the second step, what is the meat of this function? My superficial understanding is that a Bayes net is deterministic and fu... (read more)
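To check my own reading of those three steps, here is the shape of the loop as I currently understand it. Every function name below is a placeholder I invented, not something from the report, and step 2 is exactly the part whose "meat" I'm asking about:

```python
# Shape of the imitative-generalization loop as I understand it.
# Every function below is a placeholder standing in for something hard,
# not anything from the report.

def human_predictions(bayes_net_change, assistants, questions):
    """Step 2: how a human (with AI assistants) would answer the questions
    after adopting this proposed change to their Bayes net."""
    raise NotImplementedError  # the "meat" I'm asking about above

def prediction_score(predictions, held_out_data):
    """How good those predictions are on data we can check."""
    raise NotImplementedError

def imitative_generalization(candidate_changes, assistants, questions, held_out_data):
    best_change, best_score = None, float("-inf")
    for change in candidate_changes:                              # step 1: space of changes
        preds = human_predictions(change, assistants, questions)  # step 2
        score = prediction_score(preds, held_out_data)
        if score > best_score:                                    # step 3: search for the best
            best_change, best_score = change, score
    return best_change
```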
Question: what's the relative amount of compute you are imagining SmartVault and the helper AI having? Both the same, or one having a lot more?
I'm reading along, and I don't follow the section "Strategy: have AI help humans improve our understanding". The problem so far is that the AI need only identify bad outcomes that the human labelers can identify, rather than bad outcomes regardless of human-labeler identification.
The solution posed here is to have AIs help the human labeler understand more bad (and good) outcomes, using powerful AI. The section mostly provides justification for making the assumption that we can align these helper AIs (reason: the authors believe there is a counterexa... (read more)
For reference, here is a 2004 post by Moravec, that’s helpfully short, containing his account of his own predictions: https://www.frc.ri.cmu.edu/~hpm/project.archive/robot.papers/2004/Predictions.html
Hmm, alas, stopped reading too soon.
Is Humbali right that generic uncertainty about maybe being wrong, without other extra premises, should increase the entropy of one's probability distribution over AGI, thereby moving out its median further away in time?
I'll add a quick answer: my gut says technically true, but that mostly I should just look at the arguments because they provide more weight than the prior. Strong evidence is common. It seems plausible to me that the prior over 'number of years away' should make me predict it's more like 10 trillion years.
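A quick made-up-numbers sketch of why I say "technically true" but still want to lean on the arguments: mixing a concentrated forecast with a flat reference prior does raise entropy, and it drags the median toward the middle of whatever range that prior covers, so the size (and absurdity) of the shift is almost entirely a fact about the chosen prior rather than about the "maybe I'm wrong" uncertainty itself.

```python
# Made-up numbers: what does "add entropy for model uncertainty" do to the median?
# Answer: it pulls the median toward the middle of whatever reference prior you
# flatten toward -- so the choice of prior is doing most of the work.
import numpy as np

years = np.arange(2025, 2126)                         # arbitrary 100-year horizon
inside_view = np.exp(-0.5 * ((years - 2045) / 8.0) ** 2)
inside_view /= inside_view.sum()                      # concentrated forecast, median ~2045
flat_prior = np.full(len(years), 1.0 / len(years))    # uniform over this horizon

def median(p):
    return int(years[np.searchsorted(np.cumsum(p), 0.5)])

for w in (0.0, 0.3, 0.7):                             # weight given to "maybe I'm wrong"
    mixed = (1 - w) * inside_view + w * flat_prior
    print(f"w={w}: median {median(mixed)}")
# With a uniform-over-100-years reference prior the median drifts toward 2075;
# with a reference prior over "up to 10 trillion years" it would drift absurdly
# far out -- the conclusion is mostly the prior, not the uncertainty.
```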
For indeed in a case like this, one first backs up and asks oneself "Is Humbali right or not?" and not "How can I prove Humbali wrong?"
Gonna write up some of my thoughts here without reading on, and post them (also without reading on).
I don’t get why Humbali’s objection has not already been ‘priced in’. Eliezer has a bunch of models and info, and his gut puts the timeline at before 2050. I don’t think “what if you’re mistaken about everything” is an argument Eliezer hasn’t already considered, so I think it’s already priced into the prediction. You’re not allowe... (read more)
Wow thanks for pulling that up. I've gotta say, having records of people's predictions is pretty sweet. Similarly, solid find on the Bostrom quote.
Do you think that might be the 20% number that Eliezer is remembering? Eliezer, interested in whether you have a recollection of this or not. [Added: It seems from a comment upthread that EY was talking about superforecasters in Feb 2016, which is after Fan Hui.]
Adding my recollection of that period: some people made the relevant updates when DeepMind's system beat the European Champion Fan Hui (in October 2015). My hazy recollection is that beating Fan Hui started some people going "Oh huh, I think this is going to happen" and then when AlphaGo beat Lee Sedol (in March 2016) everyone said "Now it is happening".
It seems from this Metaculus question that people indeed were surprised by the announcement of the match between Fan Hui and AlphaGo (which was disclosed in January, despite the match happening months earlier, according to Wikipedia).
It seems hard to interpret this as AlphaGo being inherently surprising though, because the relevant fact is that the question was referring only to 2016. It seems somewhat reasonable to think that even if a breakthrough is on the horizon, it won't happen imminently with high probability.
Perhaps a better source of evidence of A... (read more)
How interesting; I am the median.
Thank you for this follow-up comment Adam, I appreciate it.
Glad to hear. And yeah, that’s the crux of the issue for me.
Follow-up
One of Eliezer's claims here is
It is very, very clear that at present rates of progress, adding that level of alignment capability as grown over the next N years, to the AGI capability that arrives after N years, results in everybody dying very quickly.
This is a claim I basically agree with.
I don't think the situation is entirely hopeless, but I don't think any of the current plans (or the current alignment field) are on track to save us.
Thank you for the links Adam. To clarify, the kind of argument I'm really looking for is something like the following three (hypothetical) examples.
Thanks for the examples, that helps a lot.
I'm glad that I posted my inflammatory comment, if only because exchanging with you and Rob made me actually consider the question of "what is our story to success", instead of just "are we making progress/creating valuable knowledge". And the way you two have been casting it is way less aversive to me than the way EY tends to frame it. This is definitely something I want to think more about. :)
I want to leave this paragraph as social acknowledgment that you mentioned upthread that you're tired and taking a break... (read more)
Adam, can you make a positive case here for how the work being done on prosaic alignment leads to success? You didn't make one, and without it I don't understand where you're coming from. I'm not asking you to tell me a story that you have 100% probability on, just what is the success story you're acting under, such that EY's stances seem to you to be mostly distracting people from the real work.
(Later added disclaimer: it's a good idea to add "I feel like..." before the judgment in this comment, so that you keep in mind that I'm talking about my impressions and frustrations, rarely stating obvious facts (despite the language making it look so))
Thanks for trying to understand my point and asking me for more details. I appreciate it.
Yet I feel weird when trying to answer, because my gut reaction to your comment is that you're asking the wrong question? Also, the compression of my view to "EY's stances seem to you to be mostly distracting people fro... (read more)
If superintelligence is approximately multimodal GPT-17 plus reinforcement learning, then understanding how GPT-3-scale algorithms function is exceptionally important to understanding super-intelligence.
Also, if superintelligence doesn’t happen then prosaic alignment is the only kind of alignment.
Aaaaaaaaaaaaahhhhhhhhhhhhhhhhh!!!!!!!!!!!!
(...I'll be at the office, thinking about how to make enough progress fast enough.)
Would you prefer questions here or on the EA Forum?
Fixed the LaTeX.
There’s a related dynamic that came up in a convo I just had.
Alice: My current work is exploring if we can solve value loading using reward learning.
Bob: Woah, isn’t that obviously doomed? Didn’t Rohin write a whole sequence on this?
Alice: Well, I don’t want to solve the whole problem for arbitrary difficulty. I just want to know whether we can build something that gets the basics right in distributions that a present-day human can understand. For example, I reckon we may be able to teach an AI what murder is today, even if we can’t teach it what murder is... (read more)
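For concreteness, the kind of minimal "basics, in-distribution" check I imagine Alice pointing at (a toy sketch I made up, not anything from Rohin's sequence or Alice's actual work): learn a reward from human comparisons, then only score it on the sort of cases the human labeler could have judged directly.

```python
# Toy sketch (my invention): learn a reward function from human comparisons,
# then only trust it on the kind of in-distribution cases the human labeler
# could have judged themselves.
import numpy as np

rng = np.random.default_rng(0)

def true_human_reward(states):
    # Hidden "ground truth" the human labeler is implicitly using (3 made-up features).
    return states @ np.array([2.0, -1.0, 0.5])

# The human compares pairs of in-distribution situations and picks the better one.
states = rng.normal(size=(500, 3))
pairs = rng.integers(0, len(states), size=(1000, 2))
prefers_first = true_human_reward(states[pairs[:, 0]]) > true_human_reward(states[pairs[:, 1]])

# Fit a linear reward by gradient ascent on the Bradley-Terry log-likelihood.
diff = states[pairs[:, 0]] - states[pairs[:, 1]]   # feature differences per comparison
w = np.zeros(3)
for _ in range(2000):
    p_first = 1.0 / (1.0 + np.exp(-diff @ w))      # model's P(human prefers the first)
    w += 0.5 * diff.T @ (prefers_first - p_first) / len(pairs)

# Check agreement only on held-out but still in-distribution situations.
test = rng.normal(size=(200, 3))
agreement = np.mean((test @ w > 0) == (true_human_reward(test) > 0))
print(f"in-distribution sign agreement: {agreement:.0%}")
# The open question is what this learned reward does far outside this distribution.
```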
I was in the chat and don't have anything especially to "disclose". Joe and Nick are both academic philosophers who've studied at Oxford and been at FHI, with a wide range of interests. And Abram and Scott are naturally great people to chat about decision theory with when they're available.
Not modeling vs modeling. Thx.
What’s the second half of the versus in this section? It’s probably straightforward but I’d appreciate someone spelling it out.
Scott: And I'm basically distinguishing between a system that's learning how to do reasoning while being overseen and kept out of the convex hull of human modeling versus… And there are definitely trade-offs here, because you have more of a daemon problem or something if you're like, "I'm going to learn how to do reasoning," as opposed to, "I'm going to be told how to do reasoning from the humans." And so then you have to search over this richer space or something of how to do reasoning, which makes it harder.
This was a great summary, thx.
What is this, "A Series of Unfortunate Logical Events"? I laughed quite a bit, and enjoyed walking through the issues in self-knowledge that the löbstacle poses.
Curated, in part for this episode, and also as a celebration of the whole series. I've listened to 6 out of the 9, and I've learned a great deal about people's work and their motivations for it. This episode in particular was excellent because I finally learned what a finite factored set was – your example of the Cartesian plane was really helpful! Which is a credit to your communication skills.
Basically every episode has been worthwhile and valuable for me, it's been easy to sit down with a researcher and hear them explain their research, and Daniel alway... (read more)
I'm glad to hear that the podcast is useful for people :)
Thanks!