Eliezer Yudkowsky - AI Alignment Forum

What the main post is responding to is the argument: "We're just training AIs to imitate human text, right, so that process can't make them get any smarter than the text they're imitating, right? So AIs shouldn't learn abilities that humans don't have; because why would you need those abilities to learn to imitate humans?" And to this the main post says, "Nope."

The main post is not arguing: "If you abstract away the tasks humans evolved to solve, from human levels of performance at those tasks, the tasks AIs are being trained to solve are harder than those tasks in principle even if they were being solved perfectly." I agree this is just false, and did not think my post said otherwise.

Updatelessness doesn't solve most problems

Eliezer Yudkowsky2mo917

This deserves a longer answer than I have time to allocate to it, but I quickly remark that I don't recognize the philosophy or paradigm of updatelessness as refusing to learn things or being terrified of information; a rational agent should never end up in that circumstance, unless some perverse other agent is specifically punishing them for having learned the information (and will lose some of their own value thereby; it shouldn't be possible for them to gain value by behaving "perversely" in that way, for then of course it's not "perverse"). Updatelessness is, indeed, exactly that sort of thinking which prevents you from being harmed by information, because your updateless exposure to information doesn't cause you to lose coordination with your counterfactual other selves or exhibit dynamic inconsistency with your past self.

From an updateless standpoint, "learning" is just the process of reacting to new information the way your past self would want you to do in that branch of possibility-space; you should never need to remain ignorant of anything. Maybe that involves not doing the thing that would then be optimal when considering only the branch of reality you turned out to be inside, but the updateless mind denies that this was ever the principle of rational choice, and so feels no need to stay ignorant in order to maintain dynamic consistency.

Critical review of Christiano's disagreements with Yudkowsky

Eliezer Yudkowsky4mo612

They can solve it however they like, once they're past the point of expecting things to work that sometimes don't work. I have guesses but any group that still needs my hints should wait and augment harder.

Critical review of Christiano's disagreements with Yudkowsky

Eliezer Yudkowsky4mo2822

I disagree with my characterization as thinking problems can be solved on paper, and with the name "Poet". I think the problems can't be solved by twiddling systems weak enough to be passively safe, and hoping their behavior generalizes up to dangerous levels. I don't think paper solutions will work either, and humanity needs to back off and augment intelligence before proceeding. I do not take the position that we need a global shutdown of this research field because I think that guessing stuff without trying it is easy, but because guessing it even with some safe weak lesser tries is still impossibly hard. My message to humanity is "back off and augment" not "back off and solve it with a clever theory".

Evaluating the historical value misspecification argument

Eliezer Yudkowsky7mo4828

I have never since 1996 thought that it would be hard to get superintelligences to accurately model reality with respect to problems as simple as "predict what a human will thumbs-up or thumbs-down". The theoretical distinction between producing epistemic rationality (theoretically straightforward) and shaping preference (theoretically hard) is present in my mind at every moment that I am talking about these issues; it is to me a central divide of my ontology.

If you think you've demonstrated by clever textual close reading that Eliezer-2018 or Eliezer-2008 thought that it would be hard to get a superintelligence to understand humans, you have arrived at a contradiction and need to back up and start over.

The argument we are trying to explain has an additional step that you're missing. You think that we are pointing to the hidden complexity of wishes in order to establish in one step that it would therefore be hard to get an AI to output a correct wish shape, because the wishes are complex, so it would be difficult to get an AI to predict them. This is not what we are trying to say. We are trying to say that because wishes have a lot of hidden complexity, the thing you are trying to get into the AI's preferences has a lot of hidden complexity. This makes the nonstraightforward and shaky problem of getting a thing into the AI's preferences harder and more dangerous than if we were just trying to get a single information-theoretic bit in there. Getting a shape into the AI's preferences is different from getting it into the AI's predictive model. MIRI is always in every instance talking about the first thing and not the second.

You obviously need to get a thing into the AI at all, in order to get it into the preferences, but getting it into the AI's predictive model is not sufficient. It helps, but only in the same sense that having low-friction smooth ball-bearings would help in building a perpetual motion machine; the low-friction ball-bearings are not the main problem, they are a kind of thing it is much easier to make progress on compared to the main problem. Even if, in fact, the ball-bearings would legitimately be part of the mechanism if you could build one! Making lots of progress on smoother, lower-friction ball-bearings is even so not the sort of thing that should cause you to become much more hopeful about the perpetual motion machine. It is on the wrong side of a theoretical divide between what is straightforward and what is not.

You will probably protest that we phrased our argument badly relative to the sort of thing that you could only possibly be expected to hear, from your perspective. If so this is not surprising, because explaining things is very hard. Especially when everyone in the audience comes in with a different set of preconceptions and a different internal language about this nonstandardized topic. But mostly, explaining this thing is hard and I tried taking lots of different angles on trying to get the idea across.

In modern times, and earlier, it is of course very hard for ML folk to get their AI to make completely accurate predictions about human behavior. They have to work very hard and put a lot of sweat into getting more accurate predictions out! When we try to say that this is on the shallow end of a shallow-deep theoretical divide (corresponding to Hume's Razor) it often sounds to them like their hard work is being devalued and we could not possibly understand how hard it is to get an AI to make good predictions.

Now that GPT-4 is making surprisingly good predictions, they feel they have learned something very surprising and shocking! They cannot possibly hear our words when we say that this is still on the shallow end of a shallow-deep theoretical divide! They think we are refusing to come to grips with this surprising shocking thing and that it surely ought to overturn all of our old theories; which were, yes, phrased and taught in a time before GPT-4 was around, and therefore do not in fact carefully emphasize at every point of teaching how in principle a superintelligence would of course have no trouble predicting human text outputs. We did not expect GPT-4 to happen; in fact, intermediate trajectories are harder to predict than endpoints, so we did not carefully phrase all our explanations in a way that would make them hard to misinterpret after GPT-4 came around.

But if you had asked us back then if a superintelligence would automatically be very good at predicting human text outputs, I guarantee we would have said yes. You could then have asked us in a shocked tone how this could possibly square up with the notion of "the hidden complexity of wishes" and we could have explained that part in advance. Alas, nobody actually predicted GPT-4 so we do not have that advance disclaimer down in that format. But it is not a case where we are just failing to process the collision between two parts of our belief system; it actually remains quite straightforward theoretically. I wish that all of these past conversations were archived to a common place, so that I could search and show you many pieces of text which would talk about this critical divide between prediction and preference (as I would now term it) and how I did in fact expect superintelligences to be able to predict things!

The Commitment Races problem

Eliezer Yudkowsky9mo60

TBC, I definitely agree that there's some basic structural issue here which I don't know how to resolve. I was trying to describe properties I thought the solution needed to have, which ruled out some structural proposals I saw as naive; not saying that I had a good first-principles way to arrive at that solution.

Making Nanobots isn't a one-shot process, even for an artificial superintelligance

Eliezer Yudkowsky11mo21

At the superintelligent level there's not a binary difference between those two clusters. You just compute each thing you need to know efficiently.

Making Nanobots isn't a one-shot process, even for an artificial superintelligance

Eliezer Yudkowsky11mo135

Lacking time right now for a long reply: The main thrust of my reaction is that this seems like a style of thought which would have concluded in 2008 that it's incredibly unlikely for superintelligences to be able to solve the protein folding problem. People did, in fact, claim that to me in 2008. It furthermore seemed to me in 2008 that protein structure prediction by superintelligence was the hardest or least likely step of the pathway by which a superintelligence ends up with nanotech; and in fact I argued only that it'd be solvable for chosen special cases of proteins rather than biological proteins because the special-case proteins could be chosen to have especially predictable pathways. All those wobbles, all those balanced weak forces and local strange gradients along potential energy surfaces! All those nonequilibrium intermediate states, potentially with fragile counterfactual dependencies on each interim stage of the solution! If you were gonna be a superintelligence skeptic, you might have claimed that even chosen special cases of protein folding would be unsolvable. The kind of argument you are making now, if you thought this style of thought was a good idea, would have led you to proclaim that probably a superintelligence could not solve biological protein folding and that AlphaFold 2 was surely an impossibility and sheer wishful thinking.

If you'd been around then, and said, "Pre-AGI ML systems will be able to solve general biological proteins via a kind of brute statistical force on deep patterns in an existing database of biological proteins, but even superintelligences will not be able to choose special cases of such protein folding pathways to design de novo synthesis pathways for nanotechnological machinery", it would have been a very strange prediction, but you would now have a leg to stand on. But this, I most incredibly doubt you would have said - the style of thinking you're using would have predicted much more strongly, in 2008 when no such thing had been yet observed, that pre-AGI ML could not solve biological protein folding in general, than that superintelligences could not choose a few special-case solvable de novo folding pathways along sharper potential energy gradients and with intermediate states chosen to be especially convergent and predictable.

GPTs are Predictors, not Imitators

Eliezer Yudkowsky1y2-2

"From a high-level perspective, it is clear that this is just wrong. Part of what human brains are doing is to minimise prediction error with regard to sensory inputs."

I didn't say that GPT's task is harder than any possible perspective on a form of work you could regard a human brain as trying to do; I said that GPT's task is harder than being an actual human; in other words, being an actual human is not enough to solve GPT's task.

My Objections to "We’re All Gonna Die with Eliezer Yudkowsky"

Eliezer Yudkowsky1y96

Choosing to engage with an unscripted, unrehearsed, off-the-cuff podcast intended to introduce ideas to a lay audience continues to be a surprising concept to me. To grapple with the intellectual content of my ideas, consider picking one item from "A List of Lethalities" and engaging with that.
