Said Achmiz — AI Alignment Forum

Cognitive Biases in Large Language Models

It’s “Novaya Zemlya” (“New Land”), not “Zovaya”.

Comments on “The Singularity is Nowhere Near”

I haven’t read the linked post/comment yet, and perhaps I am missing something very obvious, but: we have exaflop computing (that’s 10^18 FLOPS) right now. Is Tim Dettmers really saying that we’re not going to see a 1000x speed-up within a century, or possibly ever? That seems like a shocking claim, and I struggle to imagine what could justify it.

EDIT: I have now read the linked comment; it speaks of fundamental physical limitations such as speed of light, heat dissipation, etc., and says:

“These are all hard physical boundaries that we cannot alter. Yet, all these physical boundaries will be hit within a couple of years and we will fall very, very far short of human processing capabilities and our models will not improve much further. Two orders of magnitude of additional capability are realistic, but anything beyond that is just wishful thinking.”

I do not find this convincing. Taking the outside view, we can see all sorts of similar predictions of limitations having been made over the course of computing history, and yet Moore’s Law is still going strong despite quite a few years of predictions of imminent trend-crashing. (Take a look at the “Recent trends” and “Alternative materials research” sections of the Wikipedia page; do you really see any indication that we’re about to hit a hard barrier? I don’t…)

Specification gaming examples in AI

Said Achmiz · 7y

These are great (and terrifying).

It’s hard to pick just one favorite, but I think I’ll go with that amazing last entry:

“We noticed that our agent discovered an adversarial policy to move around in such a way so that the monsters in this virtual environment governed by the M model never shoots a single fireball in some rollouts. Even when there are signs of a fireball forming, the agent will move in a way to extinguish the fireballs magically as if it has superpowers in the environment.”

Literally “hacking the Matrix to gain superpowers”.

The Useful Idea of Truth

Said Achmiz · 7y

Ah, I see.

As far as whether Eliezer came up with the idea on his own—as with most (though not all) of his ideas, the answer, as I understand it, is “sort of yes, sort of no”. To expand a bit: much of what Eliezer says is one or both of: (a) prefigured in the writings of other philosophers / mathematicians / etc., (b) directly inspired by some combination of things he’d read. However, the presentation, the focus, the emphasis, etc., are often novel, and the specifics may be a synthesis of multiple extant sources, etc.

In this particular case, I do not recall offhand whether Eliezer ever mentioned a specific inspiration. But as far as there being other sources for this idea—they certainly exist. You may want to start with the SEP page on the “correspondence theory of truth”, and go from there, following references and so on. (In general, the SEP will serve well as your first port of call for finding detailed accounts of, and references about, ideas in philosophy.)

The Useful Idea of Truth