jacob_cannell

I have a compute-market startup called vast.ai; I also do research for Orchid (crypto), and I'm working towards a larger plan to save the world. Currently seeking networking, collaborators, and hires - especially top-notch CUDA/GPU programmers.

My personal blog: https://entersingularity.wordpress.com/

Comments

Reply to Eliezer on Biological Anchors

BioAnchors is poorly named; the part you are critiquing should be called GPT-3_Anchors.

A better actual BioAnchor would be based on trying to model/predict how key params like data efficiency and energy efficiency are improving over time, and when they will match/surpass the brain.

GPT-3 could also obviously be improved, for example, by multi-modal training, active learning, curriculum learning, etc. It's not like it even represents the best of what's possible for a serious AGI attempt today.

Reply to Eliezer on Biological Anchors

It displeases me that this is currently the most upvoted response: I believe you are focusing on EY's weakest rather than strongest points.

My interpretation is that he is saying that Evolution (as the generator of most biological anchors) explores the solution space along a fundamentally different path than human research does. So what you have is two paths through a space. The burden of proof for biological anchors thus lies in arguing that there are enough connections/correlations between the two paths to use one in order to predict the other.

It's hardly surprising there are 'two paths through a space' - if you reran either (biological or cultural/technological) evolution with slightly different initial conditions you'd get a different path. However, technological evolution is aware of biological evolution and is thus strongly correlated with and influenced by it - i.e. deep learning is in part brain reverse engineering (explicitly so in the case of DeepMind, but there are many other examples). The burden of proof is thus arguably the opposite of what you claim (EY claims).

In his piece, Yudkowsky is giving arguments that the human research path should lead to more efficient AGIs than evolution, in part due to the ability of humans to have and leverage insights, which the naive optimization process of evolution can't do. He also points to the inefficiency of biology in implementing new (on geological timescales) complex solutions.

To the extent EY makes specific testable claims about the inefficiency of biology, those claims are in error - or at least easily contestable.

EY's strongest point is that the Bio Anchors framework puts far too much weight on scaling of existing models (i.e. transformers) to AGI, rather than modeling improvement in asymptotic scaling itself. GPT-3 and similar model scaling is so obviously inferior to what is probably possible today - let alone what is possible in the near future - that it should be given very little consideration/weight, just as it would be unwise to model AGI based on scaling up 2005 DL tech.

Are minimal circuits deceptive?

This is perhaps not directly related to your argument here, but how is inner alignment failure distinct from generalization failure? If you train network N on dataset D and optimization pressure causes N to internally develop a planning system (mesa-optimizer) M, aren't all questions of whether M is aligned with N's optimization objective just generalization questions?

More specifically, if N is sufficiently overcomplete and well regularized, and D is large enough, then N can fully grok the dataset D, resulting in perfect generalization. It's also straightforward why this can happen: when N is large enough to contain enough individually regularized sub-model solutions (lottery tickets), it is approximating a Solomonoff-style ensemble.

Anyway, if N has a measurably low generalization gap on D, then it doesn't seem to matter whether M exists or what it's doing with regard to generalization on D. So does the risk of 'inner alignment failure' involve out-of-distribution generalization?
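To make 'measurably low generalization gap' concrete, here is a minimal Python sketch; the `model.predict` interface and the dataset tuples are hypothetical placeholders, not any particular library's API.

```python
# Minimal sketch of the 'generalization gap' notion referenced above.
# `model`, `train_set`, and `heldout_set` are hypothetical placeholders.

def accuracy(model, inputs, labels) -> float:
    """Fraction of examples the model classifies correctly."""
    correct = sum(1 for x, y in zip(inputs, labels) if model.predict(x) == y)
    return correct / len(labels)

def generalization_gap(model, train_set, heldout_set) -> float:
    """Train accuracy minus held-out accuracy on the same distribution D.

    If this gap is measurably small, the network's behavior on D is captured
    by its training performance - regardless of whether an internal
    mesa-optimizer M exists - so on this view an inner alignment failure
    would have to show up as out-of-distribution generalization failure.
    """
    train_acc = accuracy(model, *train_set)
    heldout_acc = accuracy(model, *heldout_set)
    return train_acc - heldout_acc
```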

Biology-Inspired AGI Timelines: The Trick That Never Works

Which brings me to the second line of very obvious-seeming reasoning that converges upon the same conclusion - that it is in principle possible to build an AGI much more computationally efficient than a human brain - namely that biology is simply not that efficient, and especially when it comes to huge complicated things that it has started doing relatively recently.

Biological cells are computers which must copy bits to copy DNA. So we can ask biology - how much energy do cells use to copy each base pair? It seems they use just 4 ATP per base pair, or 1 ATP/bit, and thus within an OOM of the 'Landauer bound'. Which is more impressive if you consider that the typically quoted 'Landauer bound' of kT ln 2 is overly optimistic, as it only applies when the error probability is 50% or the computation takes infinite time. Useful computation requires speed at least somewhat better than infinitely slow and reliability at least somewhat better than none.
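A rough back-of-the-envelope check of those figures (a sketch, not from the original comment; the ~20 kT per ATP hydrolysis is an assumed textbook-ish value, and the 4 ATP per base pair comes from the text above):

```python
import math

# Rough sanity check of the DNA-copying energy figures above (illustrative
# values only; ATP free energy varies with cellular conditions).
k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 310.0            # body temperature, K
kT = k_B * T

landauer_bit = kT * math.log(2)   # idealized kT ln 2 per bit
atp = 20 * kT                     # ~20 kT free energy per ATP hydrolysis (assumption)
per_base_pair = 4 * atp           # ~4 ATP per base pair copied (figure from the text)

print(f"kT ln 2             ~ {landauer_bit:.2e} J")
print(f"1 ATP               ~ {atp:.2e} J (~{atp / landauer_bit:.0f}x kT ln 2)")
print(f"4 ATP per base pair ~ {per_base_pair:.2e} J")
# The idealized kT ln 2 figure only holds in the zero-speed / 50%-error limit;
# the practical bound for fast, reliable copying is substantially higher, which
# is the sense in which the copying cost sits within an OOM of the real
# (speed- and reliability-adjusted) Landauer bound.
```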

Brains have to pump thousands of ions in and out of each stretch of axon and dendrite, in order to restore their ability to fire another fast neural spike.  The result is that the brain's computation is something like half a million times less efficient than the thermodynamic limit for its temperature - so around two millionths as efficient as ATP synthase. 

The fact that cell replication operates at the Landauer bound already suggests a prior that neurons should be efficient.

The Landauer bound at room temp is ~ 0.03 eV.  Given that an electron is something of an obvious minimal unit for an electrical computer, the Landauer bound can be thought of as a 30 mV thermal noise barrier. Digital computers operate roughly 30x that for speed and reliability, but if you look at neuron swing voltages it's clear they are operating only ~3x or so above the noise voltage (optimizing hard for energy efficiency at the expense of speed).
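A quick numerical sketch of that voltage comparison; the ~1 V digital logic supply and ~100 mV neural swing are assumed ballpark values rather than figures from the text:

```python
# Thermal noise scale vs. digital and neural signaling voltages (illustrative).
k_B = 1.380649e-23       # Boltzmann constant, J/K
q_e = 1.602176634e-19    # electron charge, C
T = 300.0                # room temperature, K

thermal_noise_v = k_B * T / q_e   # ~0.026 V, i.e. the ~30 mV scale above
digital_swing_v = 1.0             # ~1 V logic supply (assumption)
neuron_swing_v = 0.1              # ~100 mV action-potential swing (assumption)

print(f"thermal noise scale   ~ {thermal_noise_v * 1000:.0f} mV")
print(f"digital swing / noise ~ {digital_swing_v / thermal_noise_v:.0f}x")  # roughly 30-40x
print(f"neuron swing / noise  ~ {neuron_swing_v / thermal_noise_v:.0f}x")   # a few x
```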

Assuming 1 Hz * 10^14 synapses / 10 watts = 10^13 synops/watt, or about 10^7 electron charges at the Landauer voltage. A synaptic op is at least doing analog signal multiplication, which requires far more energy/charges than a simple binary op - IIRC you need roughly 2^(2K) carriers and thus erasures to have precision equivalent to K-bit digital, so an 8-bit synaptic op (which IIRC is near where digital/analog mult energy intersects) would be 10^4 or 10^5 carriers. I had a relevant ref for this, can't find it now (but I think you can derive it from the binomial distribution when the std dev/precision is equivalent to 2^-8).
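Spelling out that arithmetic as a small sketch (the synapse count, firing rate, and power budget are the figures assumed in the paragraph above; the 2^(2K) carrier estimate is the shot-noise/binomial argument mentioned):

```python
# Spelling out the synaptic-op energy estimate above (illustrative figures).
q_e = 1.602176634e-19    # electron charge, C

synapses = 1e14          # assumed synapse count
rate_hz = 1.0            # assumed mean rate of synaptic events
power_w = 10.0           # assumed power budget for brain computation

ops_per_joule = synapses * rate_hz / power_w   # ~1e13 synaptic ops per joule
energy_per_op = 1.0 / ops_per_joule            # ~1e-13 J per synaptic op

landauer_voltage = 0.026                       # ~kT/e at 300 K, in volts
charges_per_op = energy_per_op / (q_e * landauer_voltage)

print(f"energy per synaptic op ~ {energy_per_op:.0e} J")
print(f"electron charges at the Landauer voltage ~ {charges_per_op:.1e}")   # ~1e7

# Analog precision cost: roughly 2^(2K) carriers (and erasures) for K-bit-
# equivalent precision, from the shot-noise/binomial argument: relative
# std dev ~ 1/sqrt(N), so matching 2^-K precision needs N ~ 2^(2K).
for k in (4, 6, 8):
    print(f"{k}-bit-equivalent analog op ~ {2 ** (2 * k):.0e} carriers")
```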

Now most synapses are probably smaller/cheaper than 8-bit equivalent, but most of the energy cost involved is in pushing data down irreversible dissipative wires (just as true in the brain as it is in a GPU). Add in the additional costs of synaptic adjustment machinery for learning, cell maintenance tax, dendritic computation, etc. and it's suddenly not clear at all that the brain is really far from energy efficient.

As further and final Bayesian evidence, Moore's Law is running out of steam as we run up against the limits of physics (for irreversible computation using irreversible wires) - and at best it is just catching up to brain energy efficiency.