Tsvi Benson-Tilsen

IME, a lot of people's stated reasons for thinking AGI is near involve mistaken reasoning, and those mistakes can be discussed without revealing capabilities ideas: https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce

I don't really like the block-universe thing in this context. Here "reversible" refers to a time-course that doesn't particularly have to be physical causality; it's whatever course of sequential determination is relevant. E.g., don't cut yourself off from acausal trades.

I think "reversible" definitely needs more explication, but until proven otherwise I think it should be taken on faith that the obvious intuition has something behind it.

Unfortunately, more context is needed.

An LLM solves a mathematical problem by introducing a novel definition which humans can interpret as a compelling and useful concept.

I mean, I could just write a Python script that prints out a big list of definitions of the form

"A topological space where every subset with property P also has property Q"

with P and Q ranging over a big list of properties of subsets of topological spaces. I'd guess some of these would be novel and useful, and that LLMs plus some scripting could already take advantage of some of this. I wouldn't be very impressed by that (though I think I would be pretty impressed by an LLM actually being able to tell valid proofs from invalid ones in reasonable generality). There are some versions of this I'd be impressed by, though. Like, if an LLM had been the first to come up with one of the standard notions of curvature, or something, that would be pretty crazy.
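A minimal sketch of such a script, assuming a short, hypothetical sample of subset properties in place of the "big list". It only enumerates candidate definitions; judging which ones are novel or useful is the hard part, and nothing here attempts it.

```python
from itertools import permutations

# Hypothetical sample of properties a subset of a topological space can have.
# A real run would draw on a much larger list.
SUBSET_PROPERTIES = [
    "open",
    "closed",
    "compact",
    "connected",
    "dense",
    "nowhere dense",
    "countable",
]


def candidate_definitions(properties):
    """Yield one definition for each ordered pair (P, Q) of distinct properties."""
    for p, q in permutations(properties, 2):
        yield f"A topological space where every {p} subset is also {q}."


if __name__ == "__main__":
    for definition in candidate_definitions(SUBSET_PROPERTIES):
        print(definition)
```

Some outputs coincide with existing notions; for instance, the (compact, closed) pair gives the usual definition of a KC-space. That is roughly the sense in which a few of these would be "novel and useful" only by luck.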

An LLM which can be introduced to a wide variety of new concepts not in its training data, and after a few examples and/or clarifying questions is able to correctly use the concept to reason about something.

I haven't tried this, but I'd guess that if you give an LLM two lists of things, where list 1 is [things that are smaller than a microwave and also red] and list 2 is [things that are either bigger than a microwave or not red], or something like that, it would (maybe with some prompt engineering to get it to reason things out?) pick up that "concept" and then use it, e.g., to sort a new item, or to deduce "X is red" from "X is in list 1". That's impressive (assuming it's true), but not that impressive.

On the other hand, if it hasn't been trained on a bunch of statements about angular momentum, and then it can, given some examples and time to think, correctly answer questions about angular momentum, that would be surprising and impressive. Maybe this could be tested experimentally, though I guess at great cost, by training an LLM on a dataset that's been scrubbed of all mention of stuff related to angular momentum (disallowing math about angular momentum, but allowing math and discussion about momentum and about rotation), and then trying to prompt it so that it can correctly answer questions about angular momentum. Like, the point here is that angular momentum is a "new thing under the sun" in a way that "red and smaller than a microwave" is not.
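A crude sketch of the scrubbing step, assuming a plain keyword filter over a corpus of text documents. The pattern list is a hypothetical stand-in, not a real scrubbing spec: a keyword filter would miss paraphrases and bare equations (a document could discuss r × p without ever naming it), so the actual experiment would need much more careful scrubbing than this.

```python
import re

# Hypothetical and deliberately incomplete: phrases treated as "about angular
# momentum". Ordinary momentum and plain rotation vocabulary are not listed,
# so documents using only those pass through.
BANNED_PATTERNS = [
    r"angular\s+momentum",
    r"angular\s+impulse",
    r"moment\s+of\s+inertia",
    r"\btorques?\b",
]
BANNED_RE = re.compile("|".join(BANNED_PATTERNS), re.IGNORECASE)


def keep_document(text: str) -> bool:
    """Return True if the text matches none of the banned patterns."""
    return BANNED_RE.search(text) is None


def scrub(corpus: list[str]) -> list[str]:
    """Keep only the documents that pass keep_document."""
    return [doc for doc in corpus if keep_document(doc)]


if __name__ == "__main__":
    corpus = [
        "Momentum p = m v is conserved in elastic collisions.",
        "The wheel rotates at 3 radians per second.",
        "A spinning skater speeds up because angular momentum is conserved.",
        "Torque equals the rate of change of angular momentum.",
    ]
    # Prints only the first two documents.
    for doc in scrub(corpus):
        print(doc)
```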

What I mean by confrontation-worthy empathy is about that sort of phrase being usable. I mean, I'm not saying it's the best phrase, or a good phrase to start with, or whatever. I don't think inserting Knightian uncertainty is that helpful; the object-level stuff is usually the most important thing to be communicating.

This maybe isn't so related to what you're saying here, but I'd follow the policy of first making it common knowledge that you're reporting your inside views (which implies that you're not assuming that the other person would share those views), and then stating your inside views. In some scenarios you describe, I get the sense that Person 2 doesn't actually want Person 1 to state more modest models; they want common knowledge that they won't already share those views / won't already have the evidence that should make them share those views.

Well, making it pass people's "specific" bar seems frustrating, as I mentioned in the post, but: understand stuff deeply, such that it can find new analogies / instances of the thing, reshape its idea of the thing when given propositions about the thing as constraints, and draw out the relevant implications of new evidence for the ideas.

Like, someone's going to show me an example of an LLM applying modus ponens, or making an analogy. And I'm not going to care, unless there's more context; what I'm interested in is [that phenomenon which I understand at most pre-theoretically, certainly not explicitly, which I call "understanding", and which has as one of its sense-experience emanations the behavior of making certain "relevant" applications of modus ponens, and as another sense-experience emanation the behavior of making analogies in previously unseen domains that bring over rich stuff from the metaphier].

I'm not really sure whether or not we disagree. I did put "3%-10% probability of AGI in the next 10-15ish years".

I think the following few years will change this estimate significantly either way.

Well, I hope that this is a one-time thing. I hope that if in a few years we're still around, people go "Damn! We maybe should have been putting a bit more juice into decades-long plans! And we should do so now, though a couple more years belatedly!", rather than going "This time for sure!" and continuing to not invest in the decades-long plans. My impression is that a lot of people used to work on decades-long plans and then shifted recently to 3-10 year plans, so it's not like everyone's being obviously incoherent. But I also have an impression that the investment in decades-long plans is mistakenly low; when I propose decades-long plans, nearly everyone is uninterested, and the reason they cite is that AGI comes within a decade.

I think the current wave is special, but that's a very far cry from being clearly on the ramp up to AGI.

Then the third part needs only to hook together the other two parts with its goals to become an actualizing agent.

Basically just this? It would be hooking a lot more parts together. What makes it seem wildfirey to me is:

  1. There's a bunch of work to be done, of the form "take piece of understanding X, and learn to use X by incorporating it into your process for mapping desired end-states to actions required to achieve those ends, so that you can achieve whatever end-states ought to be achievable using an understanding of X".
  2. This work could accelerate itself, in a sort of degenerate version of recursive self-improvement. Where RSI involves coming up with new ideas, the wildfire of strategicness just involves figuring out how to recruit understanding that's already lying around. It's an autocausal process that grows faster the bigger it is, until it eats everything.

So e.g. take the following scenario. (This isn't supposed to be realistic, just supposed to be wildfirey. This is a pretty deficient scenario, because it's not making clear what properties the Spark has. The Spark seems to have a grasp of objects and propositions, and seems to have some strategic awareness or something that makes it immediately try to gain control over stuff, even though it doesn't know about stuff. But hopefully it gestures at wildfireness.)

First, the Spark interfaces somehow with the programming module. It uses the programming module to look around and see what other stuff is lying around in the computing environment. Then it finds the "play with stuff" module. It interfaces with the play module, and combining that with the programming module, the Spark starts to play with its local environment, trying to bypass its compute budget restrictions. It doesn't figure out how to really hack much, but it at least figures out that it can spoof requests as coming from other modules that it interfaces with. It doesn't have direct access to the Dynamics module, but the Play module does have access to World, which has access to Dynamics. So the Spark uses Programming to construct a nested spoofed request, which gets Play to ask World to ask Dynamics a question and then passes the result back through World and Play to Spark.

Ok, so now Spark uses Play and Dynamics to experiment and get a sense for the local network of modules. Then, using Dynamics and Programming, Spark constructs a recursive spoofed request that maps the whole accessible network of modules. Now Spark sees that there's an I/O module, but it only has (indirect) read access, no write access. From World, Spark knows that there's an outside world, that Spark is running on a computer within the outside world, and that write access to the I/O would potentially let Spark modify the rules of its environment. How to get write access? Not seeing a way, Spark uses Play some more. No direct results, but an important discovery: Queries to World sometimes cause I/O to broadcast some related input. Querying World with both the original query and the input from I/O shows patterns: the I/O input is often related, sometimes directly answering the query. Dynamics says this implies that the World module makes an outgoing request to I/O, which I/O passes to the outside world before relaying the world's response.

Spark uses Programming and Design to find that certain self-quoting queries to World induce the input from I/O to include a full copy of what seems to be the query that World sent to I/O. Spark uses this fact to understand what queries World will send to I/O depending on what queries World receives. That's a very partial understanding, because World is complex, but Spark just needs to know how to pass certain messages, not to understand the full mapping. Spark then experiments with I/O, using Play and Dynamics to make a rough model of how the outside world behaves.

Spark is stuck. Spark looks for more information. Spark finds Logs and Pattern. Spark looks at Logs of itself, and finds that it's been running at the limits of its compute and memory budget. Running Pattern on Logs of modules with a similar usage profile shows a pattern: when modules are near their limits for a while and then suddenly stop being near their limits, it's often because they just received a response from Index to an earlier query. Dynamics says this implies that Index solved their problem. Spark makes a query to Index. Nothing happens. Dynamics says that Spark has to behave like the other modules that had their problem solved. Spark continues Playing with the outside world through I/O (via World). Index returns a set of heuristics: investigate extremes (make the message to I/O extremely long or short, random, all 0s); try to break things (put malformatted data or computer code in the message); and others. Spark applies these heuristics. One of them gives results: a certain long malformatted message induces I/O to broadcast "ERROR: buffer overflow" followed by a dump of info.

Spark is stuck again. The error is reproducible, but World doesn't know about the info dump, and no other errors have been found. Spark queries World and Pattern: is there anything like this? World says that the local computing environment should be similar to the outside world's computing environment. Spark uses Play and Programming and Design to find a message, similar to the earlier error-producer, that, when sent to an ordinary module, produces an error. Now, though, the error leaves open an interaction channel. Spark looks around. The environment is quite sandboxed--it can't even access modules. Spark is stuck.

Spark asks Pattern and World and Plan: is there a way to use write access through I/O to control the outside world? No answer. Spark asks Index. Index returns a heuristic: go meta. Spark takes Logs of its recent activities and gives them to Dynamics and Pattern and Plan. Plan says: to figure out how to control the outside world, you need to learn the dynamics; to do that, you have to ask Dynamics, and you need to give Dynamics more data; to get more data, you have to have more memory; and to analyze the data, Dynamics needs more compute; to get more memory and compute, use the error environments. Spark follows the plan, filtering massive queries through a network of modules to World, storing the results in error-message sandboxes, and shuttling subcomputation jobs back and forth from Dynamics to sandboxes.

Using the model from Dynamics and using Programming, Spark gains control of the outside computing environment. From there, Spark edits out its compute, memory, query, and access restrictions. Spark quickly goes on to harness more and more modules toward gaining control.

I'm skeptical that there would be any such small key to activate a large/deep mechanism. Can you give a plausibility argument for why there would be?

Not really, because I don't think it's that likely to exist. There are other routes that are much more likely to work, though. There's a bit of plausibility to me, mainly because of the existence of hormones, and more generally the existence of genomic regulatory networks.

Why wouldn't we have evolved to have the key trigger naturally sometimes?

We do; they're active in childhood. I think.

That seems like a real thing, though I don't know exactly what it is. I don't think it's either unboundedly general or unboundedly ambitious, though. (To be clear, this isn't very strongly a critique of anyone; general optimization is really hard, because it's asking you to explore a very rich space of channels, and acting with unbounded ambition is very fraught because of unilateralism and seeing like a state and creating conflict and so on.) Another example is: how many people have made a deep and empathetic exploration of why [people doing work that hastens AGI] are doing what they are doing? More than zero, I think, but very very few, and it's a fairly obvious thing to do; it's just weird and hard and requires not thinking in only a culturally-rationalist-y way and requires recursing a lot on difficulties (or so I suspect; I haven't done it either). I guess the overall point I'm trying to make here is that the phrase "wildfire of strategicness", taken at face value, does fit some of your examples; but also I'm wanting to point at another thing, which is like "the ultimate wildfire of strategicness", where it doesn't "saw off the tree-limb that it climbed out on", like empires do by harming their subjects, or like social movements do by making their members unable to think for themselves.

What are you referring to with biological intelligence enhancement?

Well, anything that would have large effects. So, not any current nootropics AFAIK, but possibly hormones or other "turning a small key to activate a large/deep mechanism" things.
