AI ALIGNMENT FORUM
Tsvi Benson-Tilsen

Comments

Buck's Shortform
TsviBT24d64

Have you stated anywhere what makes you think "apparently a village idiot" is a sensible description of current learning programs, as they inform us regarding the question of whether or not we currently have something that is capable via generators sufficiently similar to [the generators of humanity's world-affecting capability] that we can reasonably induce that these systems are somewhat likely to kill everyone soon?

Buck's Shortform
TsviBT26d*69

If by intelligence you mean "we made some tests and made sure they are legible enough that people like them as benchmarks, and lo and behold, learning programs (LPs) continue to perform some amount better on them as time passes", ok, but that's a dumb way to use that word. If by intelligence you mean "we have something that is capable via generators sufficiently similar to [the generators of humanity's world-affecting capability] that we can reasonably induce that these systems are somewhat likely to kill everyone", then I challenge you to provide the evidence / reasoning that apparently makes you confident that LP25 is at a ~human (village idiot) level of intelligence.

Cf. https://www.lesswrong.com/posts/5tqFT3bcTekvico4d/do-confident-short-timelines-make-sense

Views on when AGI comes and on strategy to reduce existential risk
TsviBT2mo20

> I think I don't understand this argument. In creating AI we can draw on training data, which breaks the analogy to making a replicator actually from scratch (are you using a premise that this is a dead end, or something, because "Nearly all [thinkers] do not write much about the innards of their thinking processes..."?).

You're technically right that the analogy is broken in that way, yeah. Likewise, if someone gleans substantial chunks of the needed Architecture by looking at scans of brains. But yes, as you say, I think the actual data (in both cases) doesn't directly tell you what you need to know, by any stretch. (To riff on an analogy from Kabir Kumar: it's sort of like trying to infer the inner workings of a metal casting machine, purely by observing price fluctuations for various commodities. It's probably possible in theory, but staring at the price fluctuations--which are a highly mediated / garbled / fuzzed emanation from the "guts" of various manufacturing processes--is not a good way to discover the important ideas about how casting machines can work. Cf. https://www.lesswrong.com/posts/unCG3rhyMJpGJpoLd/koan-divining-alien-datastructures-from-ram-activations )

> We've seen that supervised learning and RL (and evolution) can create structural richness (if I have the right idea of what you mean) out of proportion to the understanding that went into them.

Not sure I buy the claims about SL and RL. In the case of SL, it's only going "a little ways away from the data", in terms of the structure you get. Or so I claim uncertainly. (Hm... maybe the metaphor of "distance from the data" is quite bad.... really I mean "it's only exploring a pretty impoverished sector in structurespace, partly due to data and partly due to other Architecture".) In the case of RL, what are the successes in terms of gaining new learned structure? There's going to be some--we can point to AlphaZero, and maybe some robotics things--but I'm skeptical that this actually represents all that much structural richness. The actual NNs in AlphaZero would have some nontrivial structure, but hard to tell how much, and it's going to be pretty narrow / circumscribed, e.g. it wouldn't represent most interesting math concepts.

Anyway, the claim is of course true of evolution. The general point is true, that learning systems can be powerful, and specifically high-leverage in various ways (e.g. lots of learning from small algorithmic complexity fingerprint as with evolution or Solomonoff induction, or from fairly small compute as in humans).

> Of course this doesn't mean any particular learning process is able to create a strong mind, but, idk, I don't see a way to put a strong lower bound on how much more powerful a learning process is necessary,

Right, no one knows. Could be next month that everyone dies from AGI. The only claims I'd really argue strongly would be claims like

  • If you have median 2029 or similar, either you're overconfident or you know something dispositive that I don't know.
  • If you have probability of AGI by 2029 less than .05%, either you're overconfident or you know something dispositive that I don't know.

Besides my comments about the bitter lesson and about the richness of evolution's search, I'll also say that it just seems to me like there's lots of ideas--at the abstract / fundamental / meta level of learning and thinking--that have yet to be put into practice in AI. I wrote in the OP:

> The self-play that evolution uses (and the self-play that human children use) is much richer, containing more structural ideas, than the idea of having an agent play a game against a copy of itself.

IME if you think about these sorts of things--that is, if you think about how the 2.5 known great and powerful optimization processes (evolution, humans, humanity/science) do their impressive thing that they do--if you think about that, you see lots of sorts of feedback arrangements and ways of exploring the space of structures / algorithms, many of which are different in some fundamental character from what's been tried so far in AI. And, these things don't add up, in my head, to a general intelligence--though of course that is only a deficiency in my imagination, one way or another.

> (EDIT: Maybe (you'd say) I should be drawing such a strong lower bound from the point about sample efficiency...?)

I don't personally lean super heavily on the sample efficiency thing. I mean, if we see a system that's truly only trained on some human data that's of size less than 10x the amount that a well-read adult human has read (plus compute / thinking), and it performs like GPT-4 or similar, that would be really weird and surprising, and I would be confused, and I'd be somewhat more scared. But I don't think it would necessarily imply that you're about to get AGI.

Conversely, I definitely don't think that high sample complexity strongly implies that you're not about to get AGI. (Well, I guess if you're about to get AGI, there should probably be spikes in sample efficiency in specific areas--e.g. you'd be able to invent much more interesting math with little or no data, whereas previously you had to train on vast math corpora. But we don't necessarily have to observe these domain spikes before dying of nanopox.)

> Yeah, in particular it seems like I'm updating more than you from induction on the conceptual-progress-to-capabilities ratio we've seen so far / on what seem like surprises to the 'we need lots of ideas' view. (Or maybe you disagree about observations there, or disagree with that frame.) (The "missing update" should weaken this induction, but doesn't invalidate it IMO.)

Yeah... To add a bit of color, I'd say I'm pretty wary of mushing. Like, we mush together all "capabilities" and then update on how much "capabilities" our current learning programs have. I don't feel like that sort of reasoning ought to work very well. But I haven't yet articulated how mushing is anything more specific than categorization, if it is more specific. Maybe what I mean by mushing is "sticking to a category and hanging lots of further cognition (inferences, arguments, plans) on the category, without putting in suitable efforts to refine the category into subcategories". I wrote:

> We should have been trying hard to retrospectively construct new explanations that would have predicted the observations. Instead we went with the best PREEXISTING explanation that we already had.

Mech interp is not pre-paradigmatic
TsviBT3mo32

> So, as a field, we don't have to be happy with the dominant paradigm. But just because we're not happy with it doesn't mean it's not there.

Um, ok fine, so what alternative term do you propose to replace "pre-paradigmatic" as it is currently used, to indicate that there's no remotely satisfactory paradigm in which to get going on the parts of the field-to-be that really matter?

A single principle related to many Alignment subproblems?
TsviBT4mo85

> By default, humans only care about variables they could (in principle) easily optimize or comprehend.

I think this is incorrect. I think humans have values which are essentially provisional. In other words, they're based on pointers which are supposed to be impossible to fully dereference. Examples:

  1. Friendship--pointing at another mind, who you never fully comprehend, who can always surprise you--which is part of the point
  2. Boredom / fun--pointing at surprise, novelty, diagonalizing against what you already understand
orthonormal's Shortform
TsviBT5mo50

See Jessica's comment. Yeah, it's primitive recursive assuming that your deductive process is primitive recursive. (Also assuming that your traders are primitive recursive; e.g. if they are polytime as in the paper.) There are probably some other parameters not necessarily set in the implementation described in the paper, e.g. the enumerator of trader-machines, but you can make those primrec.

evhub's Shortform
TsviBT8mo124

(Interesting. FWIW I've recently been thinking that it's a mistake to think of this type of thing--"what to do after the acute risk period is safed"--as being a waste of time / irrelevant; it's actually pretty important, specifically because you want people trying to advance AGI capabilities to have an alternative, actually-good vision of things. A hypothesis I have is that many of them are in a sense genuinely nihilistic/accelerationist; "we can't imagine the world after AGI, so we can't imagine it being good, so it cannot be good, so there is no such thing as a good future, so we cannot be attached to a good future, so we should accelerate because that's just what is happening".)

Views on when AGI comes and on strategy to reduce existential risk
TsviBT8mo*41

> really smart people

Differences between people are less directly revelative of what's important in human intelligence. My guess is that all or very nearly all human children have all or nearly all the intelligence juice. We just, like, don't appreciate how much a child is doing in constructing zer world.

> the current models have basically all the tools a moderately smart human have, with regards to generating novel ideas

Why on Earth do you think this? (I feel like I'm in an Asch Conformity test, but with really really high production value. Like, after the experiment, they don't tell you what the test was about. They let you take the card home. On the walk home you ask people on the street, and they all say the short line is long. When you get home, you ask your housemates, and they all agree, the short line is long.)

> I don't see what's missing that a ton of training on a ton of diverse, multimodal tasks + scaffolding + data flywheel isn't going to figure out.

My response is in the post.

Views on when AGI comes and on strategy to reduce existential risk
TsviBT8mo51

> I'm curious if you have a sense from talking to people.

More recently I've mostly disengaged (except for making kinda-shrill LW comments). Some people say that "concepts" aren't a thing, or similar. E.g. by recentering on performable tasks, by pointing to benchmarks going up and saying that the coarser category of "all benchmarks" or similar is good enough for predictions. (See e.g. Kokotajlo's comment here https://www.lesswrong.com/posts/oC4wv4nTrs2yrP5hz/what-are-the-strongest-arguments-for-very-short-timelines?commentId=QxD5DbH6fab9dpSrg, though his actual position is of course more complex and nuanced.) Some people say that the training process is already concept-gain-complete. Some people say that future research, such as "curiosity" in RL, will solve it. Some people say that the "convex hull" of existing concepts is already enough to set off FURSI (fast unbounded recursive self-improvement).

> (though I feel confused about how to update on the conjunction of those, and the things LLMs are good at — all the ways they don't behave like a person who doesn't understand X, either, for many X.)

True; I think I've heard some various people discussing how to more precisely think of the class of LLM capabilities, but maybe there should be more.

> if that's less sample-efficient than what humans are doing, it's not apparent to me that it can't still accomplish the same things humans do, with a feasible amount of brute force

It's often awkward discussing these things, because there's sort of a "seeing double" that happens. In this case, the "double" is:

"AI can't FURSI because it has poor sample efficiency...

  1. ...and therefore it would take k orders of magnitude more data / compute than a human to do AI research."
  2. ...and therefore more generally we've not actually gotten that much evidence that the AI has the algorithms which would have caused both good sample efficiency and also the ability to create novel insights / skills / etc."

The same goes mutatis mutandis for "can make novel concepts".

I'm more saying 2. rather than 1. (Of course, this would be a very silly thing for me to say if we observed the gippities creating lots of genuine novel useful insights, but with low sample complexity (whatever that should mean here). But I would legit be very surprised if we soon saw a thing that had been trained on 1000x less human data, and performs at modern levels on language tasks (allowing it to have breadth of knowledge that can be comfortably fit in the training set).)

> can't still accomplish the same things humans do

Well, I would not be surprised if it can accomplish a lot of the things. It already can of course. I would be surprised if there weren't some millions of jobs lost in the next 10 years from AI (broadly, including manufacturing, driving, etc.). In general, there's a spectrum/space of contexts / tasks, where on the one hand you have tasks that are short, clear-feedback, and common / stereotyped, and not that hard; on the other hand you have tasks that are long, unclear-feedback, uncommon / heterogeneous, and hard. The way humans do things is that we practice the short ones in some pattern to build up for the variety of long ones. I expect there to be a frontier of AIs crawling from short to long ones. I think at any given time, pumping in a bunch of brute force can expand your frontier a little bit, but not much, and it doesn't help that much with more permanently ratcheting out the frontier.

> AI that's narrowly superhuman on some range of math & software tasks can accelerate research

As you're familiar with, if you have a computer program that has 3 resource bottlenecks A (50%), B (25%), and C (25%), and you optimize the fuck out of A down to ~1% of the original total, you ~double your overall efficiency; but then if you optimize the fuck out of A again down to .1%, you've basically done nothing. The question to me isn't "does AI help a significant amount with some aspects of AI research", but rather "does AI help a significant and unboundedly growing amount with all aspects of AI research, including the long-type tasks such as coming up with really new ideas".
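
To make the arithmetic concrete, here is a minimal sketch of the bottleneck example, reading the percentages as shares of the original total cost (that reading is my assumption). The names `baseline`, `first_pass`, `second_pass`, and `speedup` are hypothetical and only for illustration.

```python
# Sketch of the bottleneck arithmetic, assuming each number is a share
# of the ORIGINAL total cost. All names here are illustrative.

def speedup(before: dict[str, float], after: dict[str, float]) -> float:
    """Overall speedup = total cost before / total cost after."""
    return sum(before.values()) / sum(after.values())

baseline = {"A": 0.50, "B": 0.25, "C": 0.25}

# First round: A shrinks from 50% to ~1% of the original total.
first_pass = {**baseline, "A": 0.01}

# Second round: A shrinks again, from 1% to 0.1% of the original total.
second_pass = {**baseline, "A": 0.001}

# Hard ceiling: A costs nothing at all; B and C are untouched.
ceiling = {**baseline, "A": 0.0}

s1 = speedup(baseline, first_pass)
s2 = speedup(baseline, second_pass)
s_max = speedup(baseline, ceiling)

print(f"after first pass:  {s1:.3f}x")     # 1.961x
print(f"after second pass: {s2:.3f}x")     # 1.996x
print(f"ceiling:           {s_max:.3f}x")  # 2.000x
```

The second round makes A another 10x cheaper but moves the overall speedup by less than 0.04x, because B and C now dominate. This is just the Amdahl's-law shape of the argument; it says nothing about how large the un-accelerated share of AI research actually is.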

> AI is transformative enough to motivate a whole lot of sustained attention on overcoming its remaining limitations

This certainly makes me worried in general, and it's part of why my timelines aren't even longer; I unfortunately don't expect a large "naturally-occurring" AI winter.

> seems bizarre if whatever conceptual progress is required takes multiple decades

Unfortunately I haven't addressed your main point well yet... Quick comments:

  • Strong minds are the most structurally rich things ever. That doesn't mean they have high algorithmic complexity; obviously brains are less algorithmically complex than entire organisms, and the relevant aspects of brains are presumably considerably simpler than actual brains. But still, IDK, it just seems weird to me to expect to make such an object "by default" or something? Craig Venter made a quasi-synthetic lifeform--but how long would it take us to make a minimum viable unbounded invasive organic replicator actually from scratch, like without copying DNA sequences from existing lifeforms?
  • I think my timelines would have been considered normalish among X-risk people 15 years ago? And would have been considered shockingly short by most AI people.
  • I think most of the difference is in how we're updating, rather than on priors? IDK.
Views on when AGI comes and on strategy to reduce existential risk
TsviBT8mo20

It's a good question. Looking back at my example, now I'm just like "this is a very underspecified/confused example". This deserves a better discussion, but IDK if I want to do that right now. In short the answer to your question is

  • I at least would not be very surprised if gippity-seek-o5-noAngular could do what I think you're describing.
  • That's not really what I had in mind, but I had in mind something less clear than I thought. The spirit is about "can the AI come up with novel concepts", but the issue here is that "novel concepts" are big things, and their material and functioning and history are big and smeared out.

I started writing out a bunch of thoughts, but they felt quite inadequate because I knew nothing about the history of the concept of angular momentum; so I googled around a tiny little bit. The situation seems quite awkward for the angular momentum lesion experiment. What did I "mean to mean" by "scrubbed all mention of stuff related to angular momentum"--presumably this would have to include deleting all subsequent ideas that use angular momentum in their definitions, but e.g. did I also mean to delete the notion of cross product?

It seems like angular momentum was worked on in great detail well before the cross product was developed at all explicitly. See https://arxiv.org/pdf/1511.07748 and https://en.wikipedia.org/wiki/Cross_product#History. Should I still expect gippity-seek-o5-noAngular to notice the idea if it doesn't have the cross product available? Even if not, what does and doesn't this imply about this decade's AI's ability to come up with novel concepts?

(I'm going to mull on why I would have even said my previous comment above, given that on reflection I believe that "most" concepts are big and multifarious and smeared out in intellectual history. For some more examples of smearedness, see the subsection here: https://tsvibt.blogspot.com/2023/03/explicitness.html#the-axiom-of-choice)

Posts

4 · TsviBT's Shortform · 1y · 1
18 · Koan: divining alien datastructures from RAM activations · 1y · 0
22 · A hermeneutic net for agency · 2y · 0
21 · Human wanting · 2y · 0
8 · Time is homogeneous sequentially-composable determination · 2y · 0
21 · Telopheme, telophore, and telotect · 2y · 6
9 · Fundamental question: What determines a mind's effects? · 2y · 2
57 · Views on when AGI comes and on strategy to reduce existential risk · 2y · 28
6 · The fraught voyage of aligned novelty · 2y · 0
4 · Provisionality · 2y · 0

Wikitag Contributions

Tracking · 7 months ago · (+191)
Tracking · 7 months ago · (+2/-2)
Tracking · 7 months ago · (+1571)
Joint probability distribution · 9 years ago · (+850)
Square visualization of probabilities on two events · 9 years ago · (+72)