Steve Byrnes

I'm Steve Byrnes, a professional physicist in the Boston area. I have a summary of my AGI safety research interests at:

Steve Byrnes' Comments

Goals and short descriptions

Hmm, maybe we're talking past each other. Let's say I have something like AlphaZero, where 50,000 bytes of machine code trains an AlphaZero-type chess-playing agent, whose core is a 1-billion-parameter ConvNet. The ConvNet takes 1 billion bytes to specify. Meanwhile, the reward-calculator p, which calculates whether checkmate has occurred, is 100 bytes of machine code.

Would you say that the complexity of the trained chess-playing agent is 100 bytes or 50,000 bytes or 1 billion bytes?

I guess you're going to say 50,000, because you're imagining a Turing machine that spends a year doing the self-play to calculate the billion-parameter ConvNet and then immediately the same Turning machine starts running that ConvNet it just calculated. From the perspective of Kolmogorov complexity, it doesn't matter that it spends a year calculating the ConvNet, as long as it does so eventually.

By the same token, you can always turn a search-y agent into an equivalent discriminitive-y agent, given infinite processing time and storage, by training the latter on a googol queries of the former. If you're thinking about Kolmogorov complexity, then you don't care about a googol queries, as long as it works eventually.

Therefore, my first comment is not really relevant to what you're thinking about. Sorry. I was not thinking about algorithms-that-write-arbitrary-code-and-then-immediately-run-it, I was thinking about the complexity of the algorithms that are actually in operation as the agent acts in the world.

If my hand touches fire and thus immediately moves backwards by reflex, would this be an example of a discriminative policy, because an input signal directly causes an action without being processed in the brain?

Yes. But the lack-of-processing-in-the-brain is not the important part. A typical ConvNet image classifier does involve many steps of processing, but is still discriminative, not search-y, because it does not work by trying out different generative models and picks the one that best explains the data. You can build a search-y image classifier that does exactly that, but most people these days don't.

Goals and short descriptions

If we're talking about algorithmic complexity, there's a really important distinction, I think. In the space of actions, we have:

  • SEARCH: Search over an action space for an action that best achieves a goal
  • DISCRIMINATIVE: There is a function that goes directly from sensory inputs to the appropriate action (possibly with a recurrent state etc.)

Likewise, in the space of passive observations (e.g. classifying images), we have:

  • SEARCH: Search over a space of generative models for the model that best reproduces the observations (a.k.a. "analysis by synthesis")
  • DISCRIMINATIVE: There is a function that goes directly from sensory inputs to understanding / classification.

The search methods are generally:

  • Algorithmically simpler
  • More sample-efficient
  • Slower at run-time

(Incidentally, I think the neocortex does the "search" option in both these cases (and that they aren't really two separate cases in the brain). Other parts of the brain do the "discriminative" option in certain cases.)

I'm a bit confused about how my comment here relates to your post. If p is the goal ("win at chess"), the simplest search-based agent is just about exactly as complicated as p ("do a minimax search to win at chess"). But RL(p), at least with the usual definition of "RL", will learn a very complicated computation that contains lots of information about which particular configurations of pieces are advantageous or not, e.g. the ResNet at the core of AlphaZero.

Are you imagining that the policy π is searching or discriminative? If the latter, why are you saying that π is just as simple as p? (Or are you saying that?) "Win at chess" seems a lot simpler than "do the calculations described by the following million-parameter ResNet", right?

[AN #104]: The perils of inaccessible information, and what we can learn about AI alignment from COVID

On the Russell / Pinker debate, I thought Pinker had an interesting rhetorical sleight-of-hand that I hadn't heard before...

When people on the "AGI safety is important" side explain their position, there's kinda a pedagogical dialog:

A: Superintelligent AGI will be awesome, what could go wrong? B: Well it could outclass all of humanity and steer the future in a bad direction. A: OK then we won't give it an aggressive goal. B: Even with an innocuous-sounding goal like "maximize paperclips" it would still kill everyone... A: OK, then we'll give it a good goal like "maximize human happiness". B: Then it would forcibly drug everyone. A: OK, then we'll give it a more complicated goal like ... B: That one doesn't work either because ...

...And then Pinker reads this back-and-forth dialog, removes a couple pieces of it from their context, and says "The existential risk scenario that people are concerned about is the paperclip scenario and/or the drugging scenario! They really think those exact things are going to happen!" Then that's the strawman that he can easily rebut.

Pinker had other bad arguments too, I just thought that was a particularly sneaky one.

Reply to Paul Christiano's “Inaccessible Information”

OK, well I spend most of my time thinking about a particular AGI architecture (1 2 etc.) in which the learning algorithm is legible and hand-coded ... and let me tell you, in that case, all the problems of AGI safety and alignment are still really really hard, including the "inaccessible information" stuff that Paul was talking about here.

If you're saying that it would be even worse if, on top of that, the learning algorithm itself is opaque, because it was discovered from a search through algorithm-space ... well OK, yeah sure, that does seem even worse.

Reply to Paul Christiano's “Inaccessible Information”

finding a solution to the design problem for intelligent systems that does not rest on a blind search for policies that satisfy some evaluation procedure

I'm a bit confused by this. If you want your AI to come up with new ideas that you hadn't already thought of, then it kinda has to do something like running a search over a space of possible ideas. If you want your AI to understand concepts that you don't already have yourself and didn't put in by hand, then it kinda has to be at least a little bit black-box-ish.

In other words, let's say you design a beautiful AGI architecture, and you understand every part of it when it starts (I'm actually kinda optimistic that this part is possible), and then you tell the AGI to go read a book. After having read that book, the AGI has morphed into a new smarter system which is closer to "black-box discovered by a search process" (where the learning algorithm itself is the search process).

Right? Or sorry if I'm being confused.

Human instincts, symbol grounding, and the blank-slate neocortex

Thanks for the comment! When I think about it now (8 months later), I have three reasons for continuing to think CCA is broadly right:

  1. Cytoarchitectural (quasi-) uniformity. I agree that this doesn't definitively prove anything by itself, but it's highly suggestive. If different parts of the cortex were doing systematically very different computations, well maybe they would start out looking similar when the differentiation first started to arise millions of years ago, but over evolutionary time you would expect them to gradually diverge into superficially-obviously-different endpoints that are more appropriate to their different functions.

  2. Narrowness of the target, sorta. Let's say there's a module that takes specific categories of inputs (feedforward, feedback, reward, prediction-error flags) and has certain types of outputs, and it systematically learns to predict the feedforward input and control the outputs according to generative models following this kind of selection criterion (or something like that). This is a very specific and very useful thing. Whatever the reward signal is, this module will construct a theory about what causes that reward signal and make plans to increase it. And this kind of module automatically tiles—you can connect multiple modules and they'll be able to work together to build more complex composite generative models integrating more inputs to make better reward predictions and better plans. I feel like you can't just shove some other computation into this system and have it work—it's either part of this coordinated prediction-and-action mechanism, or not (in which case the coordination prediction-and-action mechanism will learn to predict it and/or control it, just like it does for the motor plant etc.). Anyway, it's possible that some part of the neocortex is doing a different sort of computation, and not part of the prediction-and-action mechanism. But if so, I would just shrug and say "maybe it's technically part of the neocortex, but when I say "neocortex", I'm using the term loosely and excluding that particular part." After all, I am not an anatomical purist; I am already including part of the thalamus when I say "neocortex" for example (I have a footnote in the article apologizing for that). Sorry if this description is a bit incoherent, I need to think about how to articulate this better.

  3. Although it's probably just the Dunning-Kruger talking, I do think I at least vaguely understand what the algorithm is doing and how it works, and I feel like I can concretely see how it explains everything about human intelligence including causality, counterfactuals, hierarchical planning, task-switching, deliberation, analogies, concepts, etc. etc.

Building brain-inspired AGI is infinitely easier than understanding the brain

The human neocortical algorithm probably wouldn't work very well if it were applied in a brain 100x smaller

I disagree, as I discussed here, I think the neocortex is uniform-ish and that a cortical column in humans is doing a similar calculation as a cortical column in rats or the equivalent bundle of cells (arranged not as a column) in a bird pallium or lizard pallium. I do think you need lots and lots of cortical columns, initialized with appropriate region-to-region connections, to get human intelligence. Well, maybe that's what you meant by "human neocortical algorithm", in which case I agree. You also need appropriate subcortical signals guiding the neocortex, for example to flag human speech sounds as being important to attend to.

human intelligence minus rat intelligence is probably easier to understand and implement than rat intelligence alone..

Well, I do think that there's a lot of non-neocortical innovations between humans and rats, particularly to build our complex suite of social instincts, see here. I don't think understanding those innovations is necessary for AGI, although I do think it would be awfully helpful to understand them if we want aligned AGI. And I think they are going to be hard to understand, compared to the neocortex.

I don't think we can learn arbitrarily domains, not even close

Sure. A good example is temporal sequence learning. If a sequence of things happens, we expect the same sequence to recur in the future. In principle, we can imagine an anti-inductive universe where, if a sequence of things happens, then it's especially unlikely to recur in the future, at all levels of abstraction. Our learning algorithm would crash and burn in such a universe. This is a particular example of the no-free-lunch theorem, and I think it illustrates that, while there are domains that the neocortical learning algorithm can't learn, they may be awfully weird and unlikely to come up.

Pointing to a Flower

If you're saying that "consistent low-level structure" is a frequent cause of "recurring patterns", then sure, that seems reasonable.

Do they always go together?

  • If there are recurring patterns that are not related to consistent low-level structure, then I'd expect an intuitive concept that's not an OP-type abstraction. I think that happens: for example any word that doesn't refer to a physical object: "emotion", "grammar", "running", "cold", ...

  • If there are consistent low-level structures that are not related to recurring patterns, then I'd expect an OP-type abstraction that's not an intuitive concept. I can't think of any examples. Maybe consistent low-level structures are automatically a recurring pattern. Like, if you make a visualization in which the low-level structure(s) is highlighted, you will immediately recognize that as a recurring pattern, I guess.

Load More