BTW: the way I found that first link was by searching the title on Google Scholar, finding the paper, and clicking "All 5 versions" below it (right next to "Cited by 7" and "Related articles"). That brought me to a bunch of versions, one of which was a seemingly ungated PDF. This will probably work frequently, because AI researchers usually make their papers publicly available (at least in pre-print form).

To tie up this thread: I started writing a more substantive response to a section but it took a while and was difficult and I then got invited to dinner, so probably won't get around to actually writing it.

I don't want to get super hung up on this because it's not about anything Yudkowsky has said but:

Consider the whole transformed line of reasoning:

avian flight comes from a lot of factors; you can't just ape one of the factors and expect the rest to follow; to get an entity which flies, that entity must be as close to a bird as birds are to each other.

IMO this is not a faithful transformation of the line of reasoning you attribute to Yudkowsky, which was:

human intelligence/alignment comes from a lot of factors; you can't just ape one of the factors and expect the rest to follow; to get a mind which wants as humans do, that mind must be as close to a human as humans are to each other.

Specifically, where you wrote "an entity which flies", you were transforming "a mind which wants as humans do", which I think should instead be transformed to "an entity which flies as birds do". And indeed planes don't fly like birds do. [EDIT: two minutes or so after pressing enter on this comment, I now see how you could read it your way]

I guess if I had to make an analogy I would say that you have to be pretty similar to a human to think the way we do, but probably not to pursue the same ends, which is probably the point you cared about establishing.

This is a valid point, and that's not what I'm critiquing. I'm critiquing how he confidently dismisses ANNs

I guess I read that as talking about the fact that at the time ANNs did not in fact really work. I agree he failed to predict that would change, but that doesn't strike me as a damning prediction.

Matters would be different if he said in the quotes you cite "you only get these human-like properties by very exactly mimicking the human brain", but he doesn't.

Didn't he? He at least confidently rules out a very large class of modern approaches.

Confidently ruling out a large class of modern approaches isn't really that similar to saying "the only path to success is exactly mimicking the human brain". It seems like one could rule them out by having some theory about why they're deficient. I haven't re-read List of Lethalities because I want to go to sleep soon, but I searched for "brain" and did not find a passage saying "the real problem is that we need to emulate the brain precisely but can't because of poor understanding of neuroanatomy" or something.

This comment doesn't really engage much with your post - there's a lot there and I thought I'd pick one point to get a somewhat substantive disagreement. But I ended up finding this question and thought that I should answer it.

But have you ever, even once in your life, thought anything remotely like "I really like being able to predict the near-future content of my visual field. I should just sit in a dark room to maximize my visual cortex's predictive accuracy."?

I think I've been in situations where I've been disoriented by a bunch of random stuff happening and wished that less of it was happening so that I could get a better handle on stuff. An example I vividly recall was being in a history class in high school and being very bothered by the large number of conversations happening around me.

I don't really get your comment. Here are some things I don't get:

  • In "Failure By Analogy" and "Surface Analogies and Deep Causes", the point being made is "X is similar in aspects A to thing Y, and X has property P" does not establish "Y has property P". The reasoning he instead recommends is to reason about Y itself, and sometimes it will have property P. This seems like a pretty good point to me.
  • Large ANNs don't appear to me to be intelligent because of their similarity to human brains - they appear to me to be intelligent because they can be tuned to accurately predict simple facts about a large amount of data that's closely related to human intelligence, and the algorithm they get tuned to seems repurposable for a wide variety of tasks (probably related to the wide variety of data they were trained on).
  • Airplanes don't fly like birds, they fly like airplanes. So indeed you can't just ape one thing about birds[*] to get avian flight. I don't think this is a super revealing technicality but it seemed like you thought it was important.
  • Maybe most importantly I don't think Eliezer thinks you need to mimic the human brain super closely to get human-like intelligence with human-friendly wants. I think he instead thinks you need to mimic the human brain super closely to validly argue by analogy from humans. I think this is pretty compatible with this quote from "Failure By Analogy" (it isn't exactly implied by it, but your interpretation isn't either):

An abacus performs addition; and the beads of solder on a circuit board bear a certain surface resemblance to the beads on an abacus. Nonetheless, the circuit board does not perform addition because we can find a surface similarity to the abacus. The Law of Similarity and Contagion is not relevant. The circuit board would work in just the same fashion if every abacus upon Earth vanished in a puff of smoke, or if the beads of an abacus looked nothing like solder. A computer chip is not powered by its similarity to anything else, it just is. It exists in its own right, for its own reasons.

The Wright Brothers calculated that their plane would fly - before it ever flew - using reasoning that took no account whatsoever of their aircraft's similarity to a bird. They did look at birds (and I have looked at neuroscience) but the final calculations did not mention birds (I am fairly confident in asserting). A working airplane does not fly because it has wings "just like a bird". An airplane flies because it is an airplane, a thing that exists in its own right; and it would fly just as high, no more and no less, if no bird had ever existed.

  • Matters would be different if he said in the quotes you cite "you only get these human-like properties by very exactly mimicking the human brain", but he doesn't.

[*] I've just realized that I can't name a way in which airplanes are like birds in which they aren't like humans. They have things sticking out their sides? So do humans, they're called arms. Maybe the cross-sectional shape of the wings is similar? I guess they both have pointy-ish bits at the front, which are a bit more pointy than human heads? TBC I don't think this footnote is at all relevant to the safety properties of RLHF'ed big transformers.

I no longer endorse this claim about what the orthogonality thesis says.

But given that good, automated mechanistic hypothesis generation seems to be the only hope for scalable MI, it may be time for TAISIC to work on this in earnest. Because of this, I would argue that automating the generation of mechanistic hypotheses is the only type of MI work TAISIC should prioritize at this point in time.

"Automating" seems like a slightly too high bar here, given how useful human thoughts are for things. IMO, a better frame is that we have various techniques for combining human labour and algorithmic computation to generate hypotheses about networks of different sizes, and we want the amount of human labour required to be sub-polynomial in network size (e.g. constant or log(n)).
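To make the sub-polynomial desideratum concrete, here's a toy comparison of how a fixed interpretability technique's human-labour cost might grow with network size under different scaling regimes. The cost functions are purely illustrative assumptions, not from the comment or any real measurement:

```python
import math

def human_labour(n, regime):
    """Hypothetical human-hours needed to interpret a network with n
    parameters, under an assumed scaling regime."""
    if regime == "constant":
        return 1.0
    if regime == "log":
        return math.log2(n)
    if regime == "linear":  # polynomial in n: the case we want to avoid
        return float(n)
    raise ValueError(f"unknown regime: {regime}")

# Scaling the network from 1e6 to 1e9 parameters barely moves the
# sub-polynomial costs, but multiplies the linear cost a thousandfold.
for regime in ("constant", "log", "linear"):
    ratio = human_labour(10**9, regime) / human_labour(10**6, regime)
    print(regime, ratio)  # constant 1.0, log 1.5, linear 1000.0
```

The point of the sketch: "automation" (the constant regime) is one end of a spectrum, and log-scaling human involvement may be an acceptable, more achievable target.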
