## AI ALIGNMENT FORUM

Jan Hendrik Kirchner

phd student in comp neuroscience @ mpi brain research frankfurt. https://twitter.com/janhkirchner and https://universalprior.substack.com/

# Sequences

Simulator seminar sequence

# Comments

Hi, thanks for the response! I apologize, the "Left as an exercise" line was mine, and written kind of tongue-in-cheek. The rough sketch of the proposition we had in the initial draft did not spell out clearly enough what I wanted to demonstrate here and was also (as you point out correctly) wrong in the way it was stated. That wasted people's time and I feel pretty bad about it. Mea culpa.

I think/hope the current version of the statement is more complete and less wrong. (Although I also wouldn't be shocked if there are mistakes in there). Regarding your points:

1. The limit now shows up on both sides of the equation (as it should)! The dependence on  on the RHS does actually kind of drop away at some point, but I'm not showing that here. I'd previously just sloppily substituted "choose  as a large number" and then rewritten the proposition in the way indicated at the end of the Note for Proposition 2. That's the way these large deviation principles are typically used.
2. Yeah, that should have been an  rather than a . Sorry, sloppy.
3. True. Thinking more about it now, perhaps framing the proposition in terms of "bridges" was a confusing choice; if I revisit this post again (in a month or so 🤦‍♂️) I will work on cleaning that up.

Hmm, there was a bunch of back and forth on this point even before the first version of the post, with @Michael Oesterle and @metasemi arguing what you are arguing. My motivation for calling the token the state is that A) the math gets easier/cleaner that way and B) it matches my geometric intuitions. In particular, if I have a first-order dynamical system  then  is the state, not the trajectory of states . In this situation, the dynamics of the system only depend on the current state (that's because it's a first-order system). When we move to higher-order systems, , then the state is still just , but the dynamics of the system depend not only on the current state but also on the "direction from which we entered it". That's the first derivative (in a time-continuous system) or the previous state (in a time-discrete system).
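As a rough sketch of the distinction, in notation that is assumed here for illustration (the symbols $x_t$, $f$, and $g$ are placeholders and not necessarily the ones used in the post):

```latex
% First-order, time-discrete: the next state depends only on the current state x_t.
x_{t+1} = f(x_t)

% Second-order, time-discrete: the update also needs the previous state,
% i.e. the "direction from which we entered" x_t.
x_{t+1} = f(x_t, x_{t-1})

% Second-order, time-continuous analogue: the update needs the first derivative.
\ddot{x}(t) = g\big(x(t), \dot{x}(t)\big)
```

Stacking the pair $(x_t, x_{t-1})$ into one augmented variable turns the second-order system back into a first-order one, which is why the choice of what to call "the state" is partly a matter of convention.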

At least I think that's what's going on. If someone makes a compelling argument that defuses my argument then I'm happy to concede!

Thanks for pointing this out! This argument made it into the revised version. I think because of finite precision it's reasonable to assume that such an  always exists in practice (if we also assume that the probability gets rounded to something < 1).

Technically correct, thanks for pointing that out! This comment (and the ones like it) was the motivation for introducing the "non-degenerate" requirement into the text. In practice, the proposition holds pretty well, although I agree it would be nice to have a deeper understanding of when to expect the transition rule to be "non-degenerate".

This work by Michael Aird and Justin Shovelain might also be relevant: "Using vector fields to visualise preferences and make them consistent"

And I have a post where I demonstrate that reward modeling can extract utility functions from non-transitive preference orderings: "Inferring utility functions from locally non-transitive preferences"

(Extremely cool project ideas btw)
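For readers who want a feel for how a scalar utility can be fit to preferences that are locally non-transitive, here is a minimal sketch. It is not the method from the linked post; it swaps in a plain Bradley-Terry model fit by gradient ascent, and the comparison counts and the function name `fit_utilities` are made up for illustration.

```python
import math


def fit_utilities(comparisons, n_items, steps=2000, lr=1.0):
    """Fit Bradley-Terry utilities to pairwise comparison counts.

    comparisons: list of (winner, loser, count) tuples (hypothetical data format).
    Returns one utility per item, centered to have mean zero.
    """
    u = [0.0] * n_items
    total = sum(count for _, _, count in comparisons)
    for _ in range(steps):
        grad = [0.0] * n_items
        for winner, loser, count in comparisons:
            # Model probability that `winner` beats `loser`.
            p_win = 1.0 / (1.0 + math.exp(-(u[winner] - u[loser])))
            # Gradient of the mean log-likelihood w.r.t. both utilities.
            grad[winner] += count * (1.0 - p_win) / total
            grad[loser] -= count * (1.0 - p_win) / total
        u = [ui + lr * g for ui, g in zip(u, grad)]
    mean_u = sum(u) / n_items
    return [ui - mean_u for ui in u]


# Illustrative counts: A usually beats B, B usually beats C,
# yet C beats A more often than not (a locally non-transitive cycle).
A, B, C = 0, 1, 2
noisy_cycle = [(A, B, 9), (B, A, 1), (B, C, 9), (C, B, 1), (C, A, 6), (A, C, 4)]
utilities = fit_utilities(noisy_cycle, n_items=3)

# A perfectly balanced cycle carries no ranking information at all.
pure_cycle = [(A, B, 1), (B, C, 1), (C, A, 1)]
flat_utilities = fit_utilities(pure_cycle, n_items=3)
```

With these counts the fitted utilities still rank A above B above C, because the fit aggregates all comparisons rather than trusting any single pair, while the perfectly balanced cycle yields equal utilities for all three items.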

Fantastic, thank you for the pointer, learned something new today! A unique and explicit representation would be very neat indeed.

> I'm pretty confused here.

Yeah, the feeling's mutual 😅 But the discussion is also very rewarding for me, thank you for engaging!

> I am in favor of learning-from-scratch, and I am also in favor of specific designed inductive biases, and I don't think those two things are in opposition to each other.

A couple of thoughts:

• Yes, I agree that the inductive bias (/genetically hardcoded information) can live in different components: the learning rule, the network architecture, or the initialization of the weights. So learning-from-scratch is logically compatible with inductive biases - we can just put all the inductive bias into the learning rule and the architecture and none in the weights.
• But from the architecture and the learning rule, the hardcoded info can enter into the weights very rapidly (e.g. first step of the learning rule: set all the weights to the values appropriate for an adult brain. Or, more realistically, a ConvNet architecture can be learned from a DNN by setting a lot of connections to zero). Therefore I don't see what it could buy you to assume the weights to be free of inductive bias.
• There might also be a case that in the actual biological brain the weights are not initialized randomly. See e.g. this work on clonally related neurons.
• Something that is not appreciated a lot outside of neuroscience: "Learning" in the brain is as much a structural process as it is a "changing weights" process. This is particularly true throughout development but also into adulthood - activity-dependent learning rules do not only adjust the weights of connections, but they can also prune bad connections and add new connections. The brain simultaneously produces activity, which induces plasticity, which changes the circuit, which produces slightly different activity in turn.
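The remark in the list above about a ConvNet being a DNN with a lot of connections set to zero can be made concrete with a small sketch. The example below is an illustration under simple assumptions (a 1D signal, a single kernel, no padding; all names and numbers are made up): it builds the fully-connected weight matrix that reproduces a convolution and counts how many of its entries are zero.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal (cross-correlation, no padding)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv_as_dense_matrix(kernel, n_inputs):
    """Dense weight matrix implementing the same operation.

    Each row holds a shifted copy of the kernel (weight sharing);
    every other connection is fixed at zero (locality).
    """
    k = len(kernel)
    n_outputs = n_inputs - k + 1
    weights = [[0.0] * n_inputs for _ in range(n_outputs)]
    for i in range(n_outputs):
        for j in range(k):
            weights[i][i + j] = kernel[j]
    return weights


def dense_forward(weights, signal):
    """Plain fully-connected layer without bias or nonlinearity."""
    return [sum(w * x for w, x in zip(row, signal)) for row in weights]


signal = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
kernel = [0.5, -1.0, 0.25]

weights = conv_as_dense_matrix(kernel, len(signal))
conv_out = conv1d_valid(signal, kernel)
dense_out = dense_forward(weights, signal)

n_entries = sum(len(row) for row in weights)
n_zero = sum(w == 0.0 for row in weights for w in row)
print(conv_out == dense_out)   # the two layers compute the same output
print(n_zero, "of", n_entries, "dense weights are zero")
```

The architecture (which entries are zero and which are tied together) and the values of the three kernel weights are separate choices here, which is one way to see how inductive bias can sit in the architecture while the remaining free weights are still learned.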

> The point is, this kind of explanation does not talk about subplates and synapses, it talks about principles of algorithms and computations.

That sounds a lot more like cognitive science than neuroscience! This is completely fine (I did my undergrad in CogSci), but it requires a different set of arguments from the ones you are providing in your post, I think. If you want to make a CogSci case for learning from scratch, then your argument has to be a lot more constructive (i.e. literally walk us through the steps of how your proposed system can learn all/a lot of what humans can learn). Either you take a look at what is there in the brain (subplate, synapses, ...), describe how these things interact, and (correctly) infer that it's sufficient to produce a mind (this is the neuroscience strategy); or you propose an abstract system, demonstrate that it can do the same thing as the mind, and then demonstrate that the components of the abstract system can be identified with the biological brain (this is the CogSci strategy). I think you're skipping step two of the CogSci strategy.

> Whatever that explanation is, it's a thing that we can turn into a design spec for our own algorithms, which, powered by the same engineering principles, will do the same computations, with the same results.

I'm on board with that. I anticipate that the design spec will contain (the equivalent of) a ton of hardcoded genetic stuff also for the "learning subsystem"/cortex. From a CogSci perspective, I'm willing to assume that this genetic stuff could be in the learning rule and the architecture, not in the initial weights. From a neuroscience perspective, I'm not convinced that's the case.

> is that true even if there haven't been any retinal waves?

Blocking retinal waves messes up the cortex pretty substantially (same as if the animal were born without eyes). There is the beta-2 knockout mouse, which has retinal waves but with weaker spatiotemporal correlations. As a consequence, beta-2 mice fail to track moving gratings and have disrupted receptive fields.

> Here's an operationalization. Suppose someday we write computer code that can do the exact same useful computational things that the neocortex (etc.) does, for the exact same reason. My question is: Might that code look like a learning-from-scratch algorithm?

Hmm, I see. If this is the crux, then I'll put all the remaining nitpicking at the end of my comment and just say: I think I'm on board with your argument. Yes, it seems conceivable to me that a learning-from-scratch program ends up in a (functionally) very similar state to the brain. The trajectory of how the program ends up there over training probably looks different (and might take a bit longer if it doesn't use the shortcuts that the brain got from evolution), but I don't think the stuff that evolution put in the cortex is strictly necessary.

A caveat: I'm not sure how much weight the similarity between the program and the brain can support before it breaks down. I'd strongly suspect that certain aspects of the cortex are not logically implied by the statistics of the environment, but rather represent idiosyncratic quirks that were adapted at some point during evolution. Those idiosyncratic quirks won't be in the learning-from-scratch program. But perhaps (probably?) they are also not relevant in the big scheme of things.

> I'm inclined to put different layer thicknesses (including agranularity) in the category of "non-uniform hyperparameters".

Fair! Most people in computational neuroscience are also very happy to ignore those differences, and so far nothing terribly bad happened.

> If you buy the "locally-random pattern separation" story (Section 2.5.4), that would make it impossible for evolution to initialize the adjustable parameters in a non-locally-random way.

You point out yourself that some areas (e.g. the motor cortex) are granular, so that argument doesn't work there. But ignoring that, and conceding the cerebellum and the Drosophila mushroom body to you (not my area of expertise), I'm pretty doubtful about postulating "locally-random pattern separation" in the cortex. I'm interpreting your thesis to cash out as "Given a handful of granule cells from layer 4, the connectivity with pyramidal neurons in layer 2/3 is (initially) effectively random, and therefore layer 2/3 neurons need to learn (from scratch) how to interpret the signal from layer 4". Is that an okay summary?

Because then I think this fails at three points:

1. One characteristic feature of the cortex is the presence of cortical maps. They exist in basically all sensory and motor cortices, and they have a very regular structure that is present in animal species separated by as much as 64 million years of evolution. These maps imply that if you pick a handful of granule cells from layer 4 that are located nearby, their functional properties will be somewhat similar! Therefore, even if connectivity between L4 and L2/3 is locally random it doesn't really matter since the input is somewhat similar in any case. Evolution could "use" that fact to pre-structure the circuit in L2/3.
2. Connectivity between L4 and L2/3 is not random. Projections from layer 4 are specific to different portions of the postsynaptic dendrite, and nearby synapses on mature and developing dendrites tend to share similar activation patterns. Perhaps you want to argue that this non-randomness only emerges through learning and the initial configuration is random? That's a possibility, but ...
3. ... when you record activity from neurons in the cortex of an animal that had zero visual experience prior to the experiment (lid-suture), they are still orientation-selective! The topographic arrangement of retinal inputs and the segregation of eye-specific inputs are likewise already in place. At the point of eye-opening, the animals are already pretty much able to navigate their environment.

Obviously, there are still a lot of things that need to be refined and set up during later development, but defects in these early stages of network initialization are pretty bad (a lot of neurodevelopmental disorders manifest as "wiring defects" that start in early development).

> I'm very confused by this. I have coded up a ConvNet with random initialization. It was computationally tractable; in fact, it ran on my laptop!

Okay, my claim there came out a lot stronger than I wanted and I concede a lot of what you say. Learning from scratch is probably not computationally intractable in the technical sense. I guess what I wanted to argue is that it appears practically infeasible to learn everything from scratch. (There is a lot of "everything" and not a lot of time to learn it. Any head start might be strictly necessary and not just a nice-to-have.)

(As a side point: your choice of a convnet as the example is interesting. People came up with convnets because fully-connected, randomly initialized networks were not great at image classification and we needed some inductive bias in the form of a locality constraint to learn in a reasonable time. That's the point I wanted to make.)

> I guess maybe what you're claiming is: we can't have all three of {learning from scratch, general intelligence, computational tractability}.

Interesting, I haven't thought about it like this before. I do think it could be possible to have all three - but then it's not the brain anymore. As far as I can tell, evolutionary pressures make complete learning from scratch infeasible.

Hey Steve! Thanks for writing this, it was an interesting and useful read! After our discussion in the LW comments, I wanted to get a better understanding of your thinking and this sequence is doing the job. Now I feel I can better engage in a technical discussion.

I can sympathize well with your struggle in section 2.6. A lot of the "big picture" neuroscience is in the stage where it's not even wrong. That being said, I don't think you'll find a lot of neuroscientists who nod along with your line of argument without raising objections here and there (neuroscientists love their trivia). They might be missing the point, but I think that still makes your theory (by definition) controversial. (I think the term "scientific consensus" should be used carefully and very selectively).

In that spirit, there are a few points that I could push back on:

• Cortical uniformity (and by extension canonical microcircuits) are extremely useful concepts for thinking about the brain. But they are not literally 100% accurate. There are a lot of differences between different regions of the cortex, not only in thickness but also in the developmental process (here or here). I don't think anyone except for Jeff Hawkins believes in literal cortical uniformity.
• In section 2.5.4.1 you are being a bit dismissive of biologically-"realistic" implementations of backpropagation. I used to be pretty skeptical too, but some of the recent studies are beginning to make a lot of sense. This one (a collaboration of DeepMind and some of the established neuroscience bigshots) is really quite elegant and offers some great insights on how interneurons and dendritic branches might interact.
• A more theoretical counter: If evolution could initialize certain parts of the cortex so that they are faster "up and running" why wouldn't it? (Just so that we can better understand it? How nice!) From the perspective of evolution, it makes a lot of sense to initialize the cortex with an idea of what an oriented edge is because oriented edges have always been around since the inception of the eye.
Or, in terms of computation theory, learning from scratch is computationally intractable. Strong, informative priors over hypothesis space might just be necessary to learn anything worthwhile at all.

But perhaps I'm missing the point with that nitpicking. I think the broader conceptual question I have is: What does "randomly initialized" even mean in the brain? At what point is the brain initialized? When the neural tube forms? When interneurons begin to migrate to the cortex? When the first synapses are established? When the subplate is gone? When the pruning of excess synapses and the apoptosis of cells is over? When the animal/human is born? When all the senses begin to transmit input? After college graduation?

Perhaps this is the point that the "old-timer" also wanted to make. It doesn't really make sense to separate the "initialization" from the "refinement". They happen at the same time, and whether you put a certain thing into one category or the other is up to individual taste.

All of this being said, I'm very curious to read the next parts of this sequence! :) Perhaps my points don't even affect your core argument about AI Safety.

Thank you very much for pointing it out! I just checked the primary source; there it's spelled correctly. But the misspelled version can be found in some newer books that cite the passage. Funny how typos spread...

I'll fix it!
