Eliezer Yudkowsky


At the superintelligent level there's not a binary difference between those two clusters.  You just compute each thing you need to know efficiently.

Lacking time right now for a long reply:  The main thrust of my reaction is that this seems like a style of thought which would have concluded in 2008 that it's incredibly unlikely for superintelligences to be able to solve the protein folding problem.  People did, in fact, claim that to me in 2008.  It furthermore seemed to me in 2008 that protein structure prediction by superintelligence was the hardest or least likely step of the pathway by which a superintelligence ends up with nanotech; and in fact I argued only that it'd be solvable for chosen special cases of proteins rather than biological proteins because the special-case proteins could be chosen to have especially predictable pathways.  All those wobbles, all those balanced weak forces and local strange gradients along potential energy surfaces!  All those nonequilibrium intermediate states, potentially with fragile counterfactual dependencies on each interim stage of the solution!  If you were gonna be a superintelligence skeptic, you might have claimed that even chosen special cases of protein folding would be unsolvable.  The kind of argument you are making now, if you thought this style of thought was a good idea, would have led you to proclaim that probably a superintelligence could not solve biological protein folding and that AlphaFold 2 was surely an impossibility and sheer wishful thinking.

If you'd been around then, and said, "Pre-AGI ML systems will be able to solve general biological proteins via a kind of brute statistical force on deep patterns in an existing database of biological proteins, but even superintelligences will not be able to choose special cases of such protein folding pathways to design de novo synthesis pathways for nanotechnological machinery", it would have been a very strange prediction, but you would now have a leg to stand on.  But this, I most incredibly doubt you would have said - the style of thinking you're using would have predicted much more strongly, in 2008 when no such thing had been yet observed, that pre-AGI ML could not solve biological protein folding in general, than that superintelligences could not choose a few special-case solvable de novo folding pathways along sharper potential energy gradients and with intermediate states chosen to be especially convergent and predictable.

From a high-level perspective, it is clear that this is just wrong. Part of what human brains are doing is to minimise prediction error with regard to sensory inputs

I didn't say that GPT's task is harder than any possible perspective on a form of work you could regard a human brain as trying to do; I said that GPT's task is harder than being an actual human; in other words, being an actual human is not enough to solve GPT's task.

Choosing to engage with an unscripted, unrehearsed, off-the-cuff podcast intended to introduce ideas to a lay audience continues to be a surprising concept to me.  To grapple with the intellectual content of my ideas, consider picking one item from "A List of Lethalities" and engaging with that.

The author doesn't seem to realize that there's a difference between representation theorems and coherence theorems.

The Complete Class Theorem says that an agent’s policy of choosing actions conditional on observations is not strictly dominated by some other policy (such that the other policy does better in some set of circumstances and worse in no set of circumstances) if and only if the agent’s policy maximizes expected utility with respect to a probability distribution that assigns positive probability to each possible set of circumstances.

This theorem does refer to dominated strategies. However, the Complete Class Theorem starts off by assuming that the agent’s preferences over actions in sets of circumstances satisfy Completeness and Transitivity. If the agent’s preferences are not complete and transitive, the Complete Class Theorem does not apply. So, the Complete Class Theorem does not imply that agents must be representable as maximizing expected utility if they are to avoid pursuing dominated strategies.

Cool, I'll complete it for you then.

Transitivity:  Suppose you prefer A to B, B to C, and C to A.  I'll keep having you pay a penny to trade between them in a cycle.  You start with C, end with C, and are three pennies poorer.  You'd be richer if you didn't do that.

Completeness:  Any time you have no comparability between two goods, I'll swap them in whatever direction is most useful for completing money-pump cycles.  Since you've got no preference one way or the other, I don't expect you'll be objecting, right?

Combined with the standard Complete Class Theorem, this now produces the existence of at least one coherence theorem.  The post's thesis, "There are no coherence theorems", is therefore falsified by presentation of a counterexample.  Have a nice day!
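
The transitivity money pump described above can be sketched in a few lines of Python. This is only a minimal illustration, not part of the original comment: the `PREFERS` set, the `run_money_pump` function, and the one-cent fee parameter are hypothetical names chosen for the example, and the agent is assumed to accept any trade to a good it strictly prefers.

```python
# Hypothetical illustration of a money pump against cyclic preferences.
# (x, y) in PREFERS means the agent strictly prefers x to y: A > B > C > A.
PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}


def run_money_pump(start, offers, fee_cents=1):
    """Offer goods in sequence; the agent pays fee_cents for each trade
    to a good it strictly prefers over what it currently holds."""
    holding, paid = start, 0
    for offered in offers:
        if (offered, holding) in PREFERS:
            holding = offered   # agent accepts the trade
            paid += fee_cents   # and pays a penny for it
    return holding, paid


# One full cycle: start with C, trade up to B, then A, then back to C.
final_holding, total_paid = run_money_pump("C", ["B", "A", "C"])
print(final_holding, total_paid)
```

Under these assumptions the agent ends the cycle holding the same good it started with, having paid one fee per trade; repeating the offers repeats the loss.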

I see several large remaining obstacles.  On the one hand, I'd expect vast efforts thrown at them by ML to solve them at some point, which, at this point, could easily be next week.  On the other hand, if I naively model Earth as containing locally-smart researchers who can solve obstacles, I would expect those obstacles to have been solved by 2020.  So I don't know how long they'll take.

(I endorse the reasoning of not listing out obstacles explicitly; if you're wrong, why talk, if you're right, you're not helping.  If you can't save your family, at least don't personally contribute to killing them.)

Expanding on this now that I've a little more time:

Although I haven't had a chance to perform due diligence on various aspects of this work, or the people doing it, or perform a deep dive comparing this work to the current state of the whole field or the most advanced work on LLM exploitation being done elsewhere, my current sense is that this work indicates promising people doing promising things, in the sense that they aren't just doing surface-level prompt engineering, but are using technical tools to find internal anomalies that correspond to interesting surface-level anomalies, maybe exploitable ones, and are then following up on the internal technical implications of what they find.

This looks to me like (at least the outer ring of) security mindset; they aren't imagining how things will work well, they are figuring out how to break them and make them do much weirder things than their surface-apparent level of abnormality.  We need a lot more people around here figuring out how things will break.  People who produce interesting new kinds of AI breakages should be cherished and cultivated as a priority higher than a fair number of other priorities.

In the narrow regard in which I'm able to assess this work, I rate it as scoring very high on an aspect that should relate to receiving future funding.  If anyone else knows of a reason not to fund the researchers who did this, like a low score along some metric I didn't examine, or because this is somehow less impressive as a feat of anomaly-finding than it looks, please contact me, including via email or LW direct message; otherwise I might scurry around trying to arrange funding for this if nobody else funds it.

If it's a mistake you made over the last two years, I have to say in your defense that this post didn't exist 2 years ago.
