I have a compute-market startup called vast.ai, I also do research for Orchid (crypto), and I'm working towards a larger plan to save the world. Currently seeking networking, collaborators, and hires - especially top-notch CUDA/GPU programmers.

My personal blog: https://entersingularity.wordpress.com/

Nice work. I've long thought that our ability to monitor the inner monologue of AI agents will be important for security & control - and this seems like a clever, surprisingly powerful way of detecting deception in the stream of thought.

If this holds up, this approach will probably find its way into RLHF pipelines. Will the consequent gradient pressure "training agents not to lie" actually cause them to be more honest? Or will it instead train their internal states to better obfuscate whatever tell the detector is picking up on?

For example, Judea Pearl published the belief propagation algorithm in 1982. Why hadn’t someone already published it in 1962? Or 1922?

Belief propagation is the kind of thing that most people wouldn't work on in an age before computers. It would be difficult to evaluate/test, but more importantly wouldn't have much hope for application. Seems to me it arrived at a pretty normal time in our world.

For example, people have known for decades that flexible hierarchical planning is very important in humans but no one can get it to really work well in AI, especially in a reinforcement learning context.

What do you think of diffusion planning?

How long have you held your LLM plateau model and how well did it predict GPT4 scaling? How much did you update on GPT4? What does your model predict for (a hypothetical) GPT5?

My answers are basically that I predicted back in 2015 that something not much different than NNs of the time (GPT1 was published a bit after) could scale all the way with sufficient compute, and the main key missing ingredient of 2015 NNs was flexible context/input dependent information routing, which vanilla FF NNs lack. Transformers arrived in 2017[1] with that key flexible routing I predicted (and furthermore use all previous neural activations as a memory store) which emulates a key brain feature in fast weight plasticity.

GPT4 was something of an update in that they simultaneously scaled up the compute somewhat more than I expected but applied it more slowly - taking longer to train/tune/iterate etc. Also the scaling to downstream tasks was somewhat better than I expected.

All that being said, the transformer arch on GPUs only strongly accelerates training (consolidation/crystallization of past information), not inference (generation of new experience), which explains much of what GPT4 lacks vs a full AGI (there are other differences that may be important, but that one is probably primary; further details are probably best not discussed in public).

  1. Attention is All you Need ↩︎

I disagree with “uncontroversial”. Just off the top of my head, people who I’m pretty sure would disagree with your “uncontroversial” claim include

Uncontroversial was perhaps a bit tongue-in-cheek, but that claim is specifically about a narrow correspondence between LLMs and linguistic cortex, not about LLMs and the entire brain or the entire cortex.

And this claim should now be uncontroversial. The neuroscience experiments have been done, and linguistic cortex computes something similar to what LLMs compute, and almost certainly uses a similar predictive training objective. It obviously implements those computations in a completely different way on very different hardware, but they are mostly the same computations nonetheless - because the task itself determines the solution.

Examples from recent neurosci literature:

From "Brains and algorithms partially converge in natural language processing":

Deep learning algorithms trained to predict masked words from large amount of text have recently been shown to generate activations similar to those of the human brain. However, what drives this similarity remains currently unknown. Here, we systematically compare a variety of deep language models to identify the computational principles that lead them to generate brain-like representations of sentences

From "The neural architecture of language: Integrative modeling converges on predictive processing":

Here, we report a first step toward addressing this gap by connecting recent artificial neural networks from machine learning to human recordings during language processing. We find that the most powerful models predict neural and behavioral responses across different datasets up to noise levels.

From "Correspondence between the layered structure of deep language models and temporal structure of natural language processing in the human brain"

We found a striking correspondence between the layer-by-layer sequence of embeddings from GPT2-XL and the temporal sequence of neural activity in language areas. In addition, we found evidence for the gradual accumulation of recurrent information along the linguistic processing hierarchy. However, we also noticed additional neural processes that took place in the brain, but not in DLMs, during the processing of surprising (unpredictable) words. These findings point to a connection between language processing in humans and DLMs where the layer-by-layer accumulation of contextual information in DLM embeddings matches the temporal dynamics of neural activity in high-order language areas.

Then “my model of you” would reply that GPT-3 is much smaller / simpler than the brain, and that this difference is the very important secret sauce of human intelligence, and the “thinking per FLOP” comparison should not be brain-vs-GPT-3 but brain-vs-super-scaled-up-GPT-N, and in that case the brain would crush it.

Scaling up GPT-3 by itself is like scaling up linguistic cortex by itself, and doesn't lead to AGI any more/less than that would (pretty straightforward consequence of the LLM <-> linguistic_cortex (mostly) functional equivalence).

In the OP (Section 3.3.1) I talk about why I don’t buy that—I don’t think it’s the case that the brain gets dramatically more “bang for its buck” / “thinking per FLOP” than GPT-3. In fact, it seems to me to be the other way around.

The comparison should be between GPT-3 and linguistic cortex, not the whole brain. For inference the linguistic cortex uses many orders of magnitude less energy to perform the same task. For training it uses many orders of magnitude less energy to reach the same capability, and several OOM less data. In terms of flops-equivalent it's perhaps 1e22 sparse flops for training linguistic cortex (1e13 flops/s * 1e9 seconds) vs 3e23 flops for training GPT-3. So fairly close, but the brain is probably trading some compute efficiency for data efficiency.
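As a rough sanity check, the arithmetic behind that comparison can be spelled out in a few lines of Python. The inputs are just the order-of-magnitude estimates quoted above, and the variable names are illustrative.

```python
# Back-of-envelope check of the training-compute comparison.
# All inputs are the rough order-of-magnitude estimates from the comment.
cortex_flops_per_s = 1e13       # sparse flops-equivalent per second, linguistic cortex
training_seconds = 1e9          # training time in seconds
gpt3_training_flops = 3e23      # total training flops quoted for GPT-3

cortex_training_flops = cortex_flops_per_s * training_seconds
ratio = gpt3_training_flops / cortex_training_flops

# Convert the training time to years to see what 1e9 seconds means.
seconds_per_year = 365.25 * 24 * 3600
training_years = training_seconds / seconds_per_year

print(f"cortex training: {cortex_training_flops:.1e} sparse flops")
print(f"GPT-3 / cortex ratio: {ratio:.0f}x")
print(f"1e9 seconds is about {training_years:.1f} years")
```

On these inputs the two totals differ by a bit more than one order of magnitude, which is the sense in which they are "fairly close" given the uncertainty in each estimate.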

He writes that the human brain has “1e13-1e15 spikes through synapses per second (1e14-1e15 synapses × 0.1-1 spikes per second)”. I think Joe was being overly conservative, and I feel comfortable editing this to “1e13-1e14 spikes through synapses per second”, for reasons in this footnote→[9].


I agree that 1e14 synaptic spikes/second is the better median estimate, but those are highly sparse ops. 

So when you say:

So I feel like 1e14 FLOP/s is a very conservative upper bound on compute requirements for AGI. And conveniently for my narrative, that number is about the same as the 8.3e13 FLOP/s that one can perform on the RTX 4090 retail gaming GPU that I mentioned in the intro.

You are missing some foundational differences in how von neumann arch machines (GPUs) run neural circuits vs how neuromorphic hardware (like the brain) runs neural circuits.

The 4090 can hit around 1e14 - even up to 1e15 - flops/s, but only for dense matrix multiplication. The flops required to run a brain model using that dense matrix hardware are more like 1e17 flops/s, not 1e14 flops/s. The 1e14 synapses are at least 10x locally sparse in the cortex, so dense emulation requires 1e15 synapses (mostly zeroes) running at 100 Hz. The cerebellum is actually even more expensive to simulate because of the more extreme connection sparsity there.

But that isn't the only performance issue. The GPU only reaches that throughput on matrix-matrix multiplication, not the more general vector-matrix multiplication. So in that sense the dense flop perf is useless, and the perf would instead be RAM bandwidth limited and require 100 4090s to run a single 1e14 synapse model - as it requires about 1 byte of bandwidth per flop - so 1e14 bytes/s vs the 4090's 1e12 bytes/s.
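The two estimates above (dense-emulation flops and memory bandwidth) can be reproduced with a short calculation. This is only a sketch using the rough figures quoted in the comment; the variable names and the derived GPU counts are illustrative, not measurements.

```python
# Rough figures taken from the discussion above.
synapses = 1e14              # synapse count of the model
local_sparsity = 10          # cortex connectivity is at least ~10x locally sparse
update_rate_hz = 100         # rate at which the dense emulation is stepped
gpu_dense_flops = 1e14       # approximate dense matmul throughput of one 4090
gpu_bandwidth = 1e12         # approximate memory bandwidth of one 4090, bytes/s

# Dense emulation: pad the sparse connectivity out to a dense matrix.
dense_synapses = synapses * local_sparsity
dense_flops_per_s = dense_synapses * update_rate_hz
gpus_for_dense_compute = dense_flops_per_s / gpu_dense_flops

# Vector-matrix regime: roughly 1 byte of memory traffic per flop.
sparse_ops_per_s = 1e14      # synaptic ops per second
bytes_per_flop = 1
required_bandwidth = sparse_ops_per_s * bytes_per_flop
gpus_for_bandwidth = required_bandwidth / gpu_bandwidth

print(f"dense emulation: {dense_flops_per_s:.0e} flops/s")
print(f"GPUs needed at dense peak: {gpus_for_dense_compute:.0f}")
print(f"GPUs needed for bandwidth: {gpus_for_bandwidth:.0f}")
```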

Your reply seems to be "but the brain isn't storing 1e14 bytes of information", but as other comments point out that has little to do with the neural circuit size.

The true fundamental information capacity of the brain is probably much smaller than 1e14 bytes, but that has nothing to do with the size of an actually *efficient* circuit, because efficient circuits (efficient for runtime compute, energy etc) are never also efficient in terms of information compression.

This is a general computational principle, with many specific examples: compressed neural frequency encodings of 3D scenes (NERFs) which access/use all network parameters to decode a single point O(N) are enormously less computationally efficient (runtime throughput, latency, etc) than maximally sparse representations (using trees, hashtables etc) which approach O(log(N)) or O(C), but the sparse representations are enormously less compressed/compact.  These tradeoffs are foundational and unavoidable.
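A toy illustration of that tradeoff, as a sketch only: a signal is stored either as a handful of frequency coefficients (compact, but every parameter is read on each decode) or as an explicit hash table (large, but one lookup per decode). The sizes `M` and `K` and all function names are hypothetical, and this is far simpler than an actual NeRF.

```python
import math

M = 1024   # number of sample points in the signal (hypothetical)
K = 8      # number of frequency coefficients in the compact encoding (hypothetical)

# Compact "frequency" encoding: K parameters describe the whole signal.
coeffs = [1.0 / (k + 1) for k in range(K)]

def decode_compact(i):
    """Decode sample i; every one of the K parameters is read."""
    x = i / M
    return sum(c * math.cos(2 * math.pi * k * x) for k, c in enumerate(coeffs))

# Explicit encoding: one stored value per point, held in a hash table.
table = {i: decode_compact(i) for i in range(M)}

def decode_table(i):
    """Decode sample i with a single hash lookup."""
    return table[i]

# Storage (number of stored values) and parameters read per decode.
storage_compact, storage_table = len(coeffs), len(table)
reads_compact, reads_table = K, 1

print("storage:", storage_compact, "vs", storage_table)
print("reads per decode:", reads_compact, "vs", reads_table)
```

Both decoders return the same values; the compact one wins on storage and the table wins on work per query, and the gap in each direction grows with the size of the model.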

We also know that in many cases the brain and some ANN are actually computing basically the same thing in the same way (LLMs and linguistic cortex), and it's now obvious and uncontroversial that the brain is using the sparser but larger version of the same circuit, whereas the LLM ANN is using the dense version which is more compact but less energy/compute efficient (as it uses/accesses all params all the time).

One of my disagreements with your U,V,P,W,A model is that I think V & W are randomly-initialized in animals. Or maybe I’m misunderstanding what you mean by “brains also can import varying degrees of prior knowledge into other components”.

I think we agree the cortex/cerebellum are randomly initialized, along with probably most of the hippocampus, BG, perhaps amygdala? and a few others. But those don't map cleanly to U, W/P, and V/A.

For example, I think most newborn behaviors are purely driven by the brainstem, which is doing things of its own accord without any learning and without any cortex involvement.

Of course - and that is just innate unlearned knowledge in V/A. V/A (value and action) generally go together, because any motor/action skills need pairing with value estimates so the BG can arbitrate (de-conflict) action selection.

The moral is: I claim that figuring out what’s empowering is not a “local” / “generic” / “universal” calculation. If I do X in the morning, it is unknowable whether that was an empowering or disempowering action, in the absence of information about where I’m likely to find myself in the afternoon. And maybe I can make an intelligent guess at those, but I’m not omniscient. If I were a newborn, I wouldn’t even be able to guess.

Empowerment and value-of-information (curiosity) estimates are always relative to current knowledge (contextual to the current wiring and state of W/P and V/A). Doing X in the morning generally will have variable optionality value depending on the contextual state, goals/plans, location, etc. I'm not sure why you seem to think that I think of optionality-empowerment estimates as requiring anything resembling omniscience.

The newborn's VoI and optionality value estimates will be completely different and focused on things like controlling flailing limbs and making sounds, moving the head, etc.

But I don’t know how the baby cats, bats, and humans are supposed to figure that out, via some “generic” empowerment calculation. Arm-flapping is equally immediately useless for both newborn bats and newborn humans, but newborn humans never flap their arms and newborn bats do constantly.

There's nothing to 'figure out' - it just works. If you're familiar with the approximate optionality-empowerment literature, it should be fairly obvious that a generic agent maximizing optionality will end up flapping its wing-arms when controlling a bat body, flailing limbs around in a newborn human body, balancing pendulums, learning to walk, etc. I've already linked all this - but maximizing optionality automatically learns all motor skills - even up to bipedal walking.

So yeah, it would be simple and elegant to say “the baby brain is presented with a bunch of knobs and levers and gradually discovers all the affordances of a human body”. But I don’t think that fits the data, e.g. the lack of human newborn arm-flapping experiments in comparison to bats.

Human babies absolutely do the equivalent experiments - most of the difference is simply due to large differences in the arm structure. The bat's long extensible arms are built to flap, the human infants' short stubby arms are built to flail.

Also keep in mind that efficient optionality is approximated/estimated from a sampling of likely actions in the current V/A set, so it naturally and automatically takes advantage of any prior knowledge there. Perhaps the bat does have prior wiring in V/A that proposes & generates simple flapping that can be improved.

Instead, I think baby humans have an innate drive to stand up, an innate drive to walk, an innate drive to grasp, and probably a few other things like that. I think they already want to do those things even before they have evidence (or other rational basis to believe) that doing so is empowering.

This just doesn't fit the data at all. Humans clearly learn to stand and walk. They may have some innate bias in V/U which makes that subgoal more attractive, but that is an intrinsically more complex addition to the basic generic underlying optionality control drive.

I claim that this also fits better into a theory where (1) the layout of motor cortex is relatively consistent between different people (in the absence of brain damage),

We've already been over that - consistent layout is not strong evidence of innate wiring. A generic learning system will learn similar solutions given similar inputs & objectives.

(2) decorticate rats can move around in more-or-less species-typical ways,

The general lesson from the decortication experiments is that smaller brain mammals rely on (their relatively smaller) cortex less. Rats/rabbits can do much without the cortex and have many motor skills available at birth. Cats/dogs need to learn a bit more, and then primates - especially larger ones - need to learn much more and rely on the cortex heavily. This is extreme in humans, to the point where there is very little innate motor ability left, and the cortex does almost everything.

(3) there’s strong evolutionary pressure to learn motor control fast and we know that reward-shaping is certainly helpful for that,

It takes humans longer than an entire rat lifespan just to learn to walk. Hardly fast.

(4) and that there’s stuff in the brainstem that can do this kind of reward-shaping,

Sure, but there is hardly room in the brainstem to reward-shape for the different things humans can learn to do.

Universal capability requires universal learning.

(5) lots of animals can get around reasonably well within a remarkably short time after birth,

Not humans.

(6) stimulating a certain part of the brain can create “an urge to move your arm” etc. which is independent from executing the actual motion,

Unless that is true for infants, it's just learned V components. I doubt infants have an urge to move the arm in a coordinated way, vs lower level muscle 'urges', but even if they did that's just some prior knowledge in V.

(If you put a novel and useful motor affordance on a baby human—some funny grasper on their hand or something—I’m not denying that they would eventually figure out how to start using it, thanks to more generic things like curiosity,

We know that humans can learn to see through their tongue - and this does not take much longer than an infant learning to see through its eyes.

I think we both agree that sensory cortex uses a pretty generic universal learning algorithm (driven by self supervised predictive learning). I just also happen to believe the same applies to motor and higher cortex (driven by some mix of VoI, optionality control, etc).

I think we’re giving baby animals too much credit if we expect them to be thinking to themselves “gee when I grow up I might need to be good at fighting so I should practice right now instead of sitting on the comfy couch”. I claim that there isn’t any learning signal or local generic empowerment calculation that would form the basis for that

Comments like these suggest you don't have the same model of optionality-empowerment as I do. When the cat was pinned down by the dog in the past, its planning subsystem computed low value for that state - mostly based on lack of optionality - and subsequently the V system internalizes this as low value for that state and states leading towards it. Afterwards when entering a room and seeing the dog on the other side, the W/P planning system quickly evaluates a few options like: (run into the center and jump up onto the table), (run into the center and jump onto the couch), (run to the right and hide behind the couch), etc - and subplan/action (run into the center ..) gets selected in part because of higher optionality. It's just an intrinsic component of how the planning system chooses options on even short timescales, and chains recursively through training V/A.
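The optionality intuition here is easy to sketch in code. For deterministic dynamics, n-step empowerment reduces to the log of the number of distinct states reachable in n steps, so a cornered state scores lower than an open one. The toy room, move set, and function names below are hypothetical illustrations, not a model of an actual planner.

```python
import math

SIZE = 5                                             # hypothetical 5x5 room
MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]   # stay, plus four directions

def step(state, move):
    """Deterministic dynamics: a move that would leave the room does nothing."""
    x, y = state[0] + move[0], state[1] + move[1]
    return (x, y) if 0 <= x < SIZE and 0 <= y < SIZE else state

def reachable(state, horizon):
    """Set of states that can be reached within `horizon` steps."""
    frontier = {state}
    for _ in range(horizon):
        frontier = {step(s, m) for s in frontier for m in MOVES}
    return frontier

def empowerment(state, horizon=2):
    """n-step empowerment in bits (log2 of the reachable-state count)."""
    return math.log2(len(reachable(state, horizon)))

corner, center = (0, 0), (2, 2)
print("corner:", len(reachable(corner, 2)), "states,", round(empowerment(corner), 2), "bits")
print("center:", len(reachable(center, 2)), "states,", round(empowerment(center), 2), "bits")
```

A planner that adds a term like this to its utility estimate will prefer the center of the room over the corner without any situation-specific drive.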

I'll start with a basic model of intelligence which is hopefully general enough to cover animals, humans, AGI, etc. You have a model-based agent with a predictive world model W learned primarily through self-supervised predictive learning (i.e. learning to predict the next 'token' for a variety of tokens), a planning/navigation subsystem P which uses W to approximately predict/sample important trajectories according to some utility function U, a value function V which computes the immediate net expected discounted future utility of actions from current state (including internal actions), and then some action function A which just samples high value actions based on V. The function of the planning subsystem P is then to train/update V.
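To make the roles of the components concrete, here is a minimal sketch of the W/P/U/V/A loop in a toy 1-D world. Everything specific is an assumption made for illustration: the five-state world, the hand-written (rather than learned) world model, the utility that rewards the rightmost state, and the use of a Q-learning style backup as a stand-in for how P trains V.

```python
import random

# Toy 1-D world: states 0..4, actions -1/+1. All specifics are illustrative.
STATES = range(5)
ACTIONS = (-1, +1)
GAMMA = 0.9

def W(state, action):
    """World model: predicts the next state (hand-written here, not learned)."""
    return min(max(state + action, 0), 4)

def U(state):
    """Utility: hypothetical reward of 1 for being in the rightmost state."""
    return 1.0 if state == 4 else 0.0

# V[(state, action)]: estimated discounted future utility of an action in a state.
V = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def A(state, epsilon=0.0, rng=random):
    """Action function: pick the highest-value action, exploring with prob. epsilon."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: V[(state, a)])

def P(start, depth=8, lr=0.5, rng=random):
    """Planner: imagines a trajectory through W, scores it with U, and updates V."""
    s = start
    for _ in range(depth):
        a = A(s, epsilon=0.3, rng=rng)
        s_next = W(s, a)
        target = U(s_next) + GAMMA * max(V[(s_next, b)] for b in ACTIONS)
        V[(s, a)] += lr * (target - V[(s, a)])
        s = s_next

rng = random.Random(0)
for _ in range(2000):
    P(rng.choice(list(STATES)), rng=rng)

# After planning, A acts directly from V with no further lookahead.
policy = [A(s) for s in STATES]
print(policy)
```

The point of the sketch is the division of labor: P does the slow imagined rollouts through W, and its only output is an updated V, which A then uses to act immediately.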

The utility function U obviously needs some innate bootstrapping, but brains also can import varying degrees of prior knowledge into other components - and most obviously into V, the value function. Many animals need key functionality 'out of the box', which you can get by starting with a useful prior on V/A. The benefit for innate prior knowledge in V/A diminishes as brains scale up in net training compute (size * training time), so that humans - with net training compute ~1e25 ops vs ~1e21 ops for a cat - rely far more on learned knowledge for V/A rather than prior/innate knowledge.

So now to translate into your 3 levels:

A.): Innate drives: Innate prior knowledge in U and in V/A.

B.): Learned from experience and subsumed into system 1: using W/P to train V/A.

C.): System 2 style reasoning: zero shot reasoning from W/P.

(1) Evidence from cases where we can rule out (C), e.g. sufficiently simple and/or young humans/animals

So your A.) - innate drives - corresponds to U or the initial state of V/A at birth. I agree the example of newborn rodents avoiding birdlike shadows is probably mostly innate V/A - value/action function prior knowledge.

(2) Evidence from sufficiently distant consequences that we can rule out (B) Example: Many animals will play-fight as children. This has a benefit (presumably) of eventually making the animals better at actual fighting as adults. But the animal can’t learn about that benefit via trial-and-error—the benefit won’t happen until perhaps years in the future.

Sufficiently distant consequences are exactly what empowerment is for, as the universal approximator of long term consequences. Indeed the animals can't learn about that long term benefit through trial-and-error, but that isn't how most learning operates. Learning is mostly driven by the planning system - W/P - which drives updates to V/A based on both current learned V and U - and U by default is primarily estimating empowerment and value of information as universal proxies.

The animals play-fighting is something I have witnessed and studied recently. We have a young dog and a young cat who organically have learned to play several 'games'. The main game is a simple chase where the larger dog tries to tackle the cat. The cat tries to run/jump to safety. If the dog succeeds in catching the cat, the dog will tackle and constrain it on the ground, teasing it for a while. We - the human parents - often will interrupt the game at this point and occasionally punish the dog if it plays too rough and the cat complains. In the earliest phases the cat was about as likely to chase and attack the dog as the other way around, but over time learned it would nearly always lose wrestling matches and end up in a disempowered state.

There is another type of ambush game the cat will play in situations where it can 'attack' the dog from safety or in range to escape to safety, and then other types of less rough play fighting they do close to us.

So I suspect that some amount of play fighting skill knowledge is prior instinctual, but much of it is also learned. The dog and cat both separately enjoy catching/chasing balls or small objects, the cat play fights and 'attacks' other toys, etc. So early on in their interactions they had these skills available, but those alone are not sufficient to explain the game(s) they play together.

The chase game is well explained by empowerment drive: the cat has learned that allowing the dog to chase it down leads to an intrinsically undesirable disempowered state. This is a much better fit for the data, and a general empowerment drive also has much lower intrinsic complexity than a bunch of innate drives for every specific disempowered situation. It's also empowering for the dog to control and disempower the cat to some extent. So many of the innate hunting skill drives seem like just variations and/or mild tweaks to empowerment.

The only part of this that requires a more specific explanation is perhaps the safety aspect of play fighting: each animal is always pulling punches to varying degrees, the cat isn't using fully extended claws, neither is biting with full force, etc. That is probably the animal equivalent of empathy/altruism.

Status—I’m not sure whether Jacob is suggesting that human social status related behaviors are explained by (B) or (C) or both. But anyway I think 1,2,3,4 all push towards an (A)-type explanation for human social status behaviors. I think I would especially start with 3 (heritability)—if having high social status is generally useful for achieving a wide variety of goals, and that were the entire explanation for why people care about it, then it wouldn’t really make sense that some people care much more about status than others do, particularly in a way that (I’m pretty sure) statistically depends on their genes

Status is almost all learned B: system 2 W/P planning driving system 1 V/A updates.

Earlier I said - and I don't see your reply yet, so I'll repeat it here:

Infants don't even know how to control their own limbs, but they automatically learn through a powerful general empowerment learning mechanism. That same general learning signal absolutely does not - and can not - discriminate between hidden variables representing limb poses (which it seeks to control) and hidden variables representing beliefs in other humans minds (which determine constraints on the child's behavior). It simply seeks to control all such important hidden variables.

Social status drive emerges naturally from empowerment, which children acquire by learning cultural theory of mind and folk game theory through learning to communicate with and through their parents. Children quickly learn that hidden variables in their parents have huge effect on their environment and thus try to learn how to control those variables.

It's important to emphasize that this is all subconscious and subsumed into the value function, it's not something you are consciously aware of.

I don't see how heritability tells us much about how innate social status is. Genes can control many hyperparams which can directly or indirectly influence the later learned social status drive. One obvious example is just the relevant weightings of value-of-information (curiosity) vs optionality-empowerment and other innate components of U at different points in time (development periods). I think this is part of the explanation for children who are highly curious about the world and less concerned about social status vs the converse.

Fun—Jacob writes “Fun is also probably an emergent consequence of value-of-information and optionality” which I take to be a claim that “fun” is (B) or (C), not (A). But I think it’s (A).

Fun is complex and general/vague - it can be used to describe almost anything we derive pleasure from in your A.) or B.) categories.

Not if exploration is on-policy, or if the agent reflectively models and affects its training process. In either case, the agent can zero out its exploration probability of the maze, so as to avoid predictable value drift towards blueberries. The agent would correctly model that if it attained the blueberry, that experience would enter its data distribution and the agent would be updated so as to navigate towards blueberries instead of raspberries, which leads to fewer raspberries, which means the agent doesn't navigate to that future.

If this agent is smart/reflective enough to model/predict the future effects of its RL updates, then you already are assuming a model-based agent which will then predict higher future reward by going for the blueberry. You seem to be assuming the bizarre combination of model-based predictive capability for future reward gradient updates but not future reward itself. Any sensible model-based agent would go for the blueberry absent some other considerations.

This is not pure speculation: you can run EfficientZero in scenarios like this, and I bet it goes for the blueberry.

Your mental model seems to assume pure model-free RL trained to the point that it gains some specific model-based predictive planning capabilities without using those same capabilities to get greater reward.

Humans often intentionally avoid some high reward 'blueberry' analogs like drugs using something like the process you describe here, but hedonic reward is only one component of the human utility function, and our long term planning instead optimizes more for empowerment - which is usually in conflict with short term hedonic reward.

This has been discussed before. Your example of not being a verbal thinker is not directly relevant because 1.) inner monologue need not be strictly verbal, 2.) we need only a few examples of strong human thinkers with verbal inner monologues to show that isn't an efficiency disadvantage - so even if your brain type is less monitorable we are not confined to that design.

I also do not believe your central claim: based on my knowledge of neuroscience, disabling the brain modules responsible for your inner monologue will not only disable your capacity for speech, it will also seriously impede your cognition and render you largely incapable of executing complex long term plans.

Starting with a brain-like AGI, there are several obvious low-cost routes to dramatically improve automated cognitive inspectability. A key insight is that there are clear levels of abstraction in the brain (as predicted by the need to compress sensory streams for efficient bayesian prediction) and the inner monologue is at the top of the abstraction hierarchy, which maximizes information utility per bit. At the bottom of the abstraction hierarchy would be something like V1, which would be mostly useless to monitor (minimal value per bit).

Roughly speaking, I think that cognitive interpretability approaches are doomed, at least in the modern paradigm, because we're not building minds but rather training minds, and we have very little grasp of their internal thinking,

A brain-like AGI - modeled after our one working example of efficient general intelligence - would naturally have an interpretable inner monologue we could monitor. There are good reasons to suspect that DL based general intelligence will end up with something similar simply due to the convergent optimization pressure to communicate complex thought vectors to/from human brains through a low-bitrate channel.

"Well, it never killed all humans in the toy environments we trained it in (at least, not after the first few sandboxed incidents, after which we figured out how to train blatantly adversarial-looking behavior out of it)" doesn't give me much confidence. If you're smart enough to design nanotech that can melt all GPUs or whatever (disclaimer: this is a toy example of a pivotal act, and I think better pivotal-act options than this exist) then you're probably smart enough to figure out when you're playing for keeps, and all AGIs have an incentive not to kill all "operators" in the toy games once they start to realize they're in toy games.

Intelligence potential of architecture != intelligence of trained system

The intelligence of a trained system depends on the architectural prior, the training data, and the compute/capacity. Take even an optimally powerful architectural prior - one that would develop into a superintelligence if trained on the internet with reasonable compute - and it would still be nearly as dumb as a rock if trained solely in atari pong. Somewhere in between the complexity of pong and our reality exists a multi-agent historical sim capable of safely confining a superintelligent architecture and iterating on altruism/alignment safely. So by the time that results in a system that is "smart enough to design nanotech", it should already be at least as safe as humans. There are of course ways that strategy fails, but they don't fail because 'smartness' strictly entails unconfineability - which becomes more clear when you taboo 'smartness' and replace it with a slightly more detailed model of intelligence.
