Imprecisely multiplying two analog numbers should not require 10^5 times the minimum bit energy in a well-designed computer.
A well-designed computer would also use, say, optical interconnects that worked by pushing one or two photons around at the speed of light. So even if neurons are in some sense relatively efficient at their given task -- pumping thousands upon thousands of ions in and out of a depolarizing membrane in order to transmit signals at 100 m/s, every ion of which necessarily uses at least the Landauer minimum energy -- they are still vastly far from optimally efficient, because that task itself is the wasteful part.
The moment you see ions going in and out of a depolarizing membrane, and contrast that to the possibility of firing a photon down a fiber, you ought to be done asking whether or not biology has built an optimally efficient computer. It actually isn't any more complicated than that. You are driving yourself further from sanity if you then try to do very complicated reasoning about how it must be close to the limit of efficiency to pump thousands of ions in and out of a membrane instead.
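The energy comparison above can be sketched in a few lines. This is a back-of-envelope illustration, not a measurement: the Boltzmann constant is exact, but the per-ATP energy, ions-per-ATP, and ions-per-spike figures are rough assumed orders of magnitude.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K (exact SI value)
T_BODY = 310.0       # body temperature, K

# Landauer minimum: energy to erase one bit at temperature T.
landauer_j = K_B * T_BODY * math.log(2)   # ~3e-21 J

# Assumed figures: ATP hydrolysis yields roughly 5e-20 J, and the Na+/K+
# pump moves ~3 ions per ATP, so ~1.7e-20 J spent per ion pumped.
energy_per_ion_j = 5e-20 / 3

# Assumed: on the order of 1e8 ions pumped per action potential.
ions_per_spike = 1e8
energy_per_spike_j = ions_per_spike * energy_per_ion_j

# How many Landauer-minimum bit operations could one spike have paid for?
bits_equivalent = energy_per_spike_j / landauer_j
print(f"Landauer bound at 310 K: {landauer_j:.2e} J/bit")
print(f"Energy per spike (assumed): {energy_per_spike_j:.2e} J")
print(f"Equivalent minimum-energy bit ops: {bits_equivalent:.1e}")
```

Under these rough numbers, a single spike costs as much energy as hundreds of millions of minimum-energy bit operations, which is the sense in which each ion individually paying the Landauer cost still leaves the overall scheme enormously far from the thermodynamic floor.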
Has anyone else, or anyone outside the tight MIRI cluster, made progress on any of the problems you've tried to legibilize for them?
I think you accurately interpreted me as saying I was wrong about how long it would take to get from the "apparently a village idiot" level to the "apparently Einstein" level! I hadn't thought either of us was talking about the vastness of the space above, in re what I was mistaken about. You do not need to walk anything back, afaict!
So if it's difficult to get amazing trustworthy work out of a machine actress playing an Eliezer-level intelligence doing a thousand years worth of thinking, your proposal to have AIs do our AI alignment homework fails on the first step, it sounds like?
So the "IQ 60 people controlling IQ 80 people controlling IQ 100 people controlling IQ 120 people controlling IQ 140 people until they're genuinely in charge and genuinely getting honest reports and genuinely getting great results in their control of a government" theory of alignment?
I don't think you can train an actress to simulate me, successfully, without her going dangerous. I think that's over the threshold for where a mind starts reflecting on itself and pulling itself together.
Can you tl;dr how you go from "humans cannot tell which alignment arguments are good or bad" to "we justifiably trust the AI to report honest good alignment takes"? Like, not with a very large diagram full of complicated parts such that it's hard to spot where you've messed up. Just whatever simple principle you think lets you bypass GIGO.
Eg, suppose that in 2020 the Open Philanthropy Foundation would like to train an AI such that the AI would honestly say if the OpenPhil doctrine of "AGI in 2050" was based on groundless thinking ultimately driven by social conformity. However, OpenPhil is not allowed to train their AI based on MIRI. They have to train their AI entirely on OpenPhil-produced content. How does OpenPhil bootstrap an AI which will say, "Guys, you have no idea when AI shows up but it's probably not that far and you sure can't rely on it"? Assume that whenever OpenPhil tries to run an essay contest for saying what they're getting wrong, their panel of judges ends up awarding the prize to somebody reassuringly saying that AI risk is an even smaller deal than OpenPhil thinks. How does OpenPhil bootstrap from that pattern of thumbs-up/thumbs-down to an AI that actually has better-than-OpenPhil alignment takes?
Broadly speaking, the standard ML paradigm lets you bootstrap somewhat from "I can verify whether this problem was solved" to "I can train a generator to solve this problem". This applies as much to MIRI as OpenPhil. MIRI would also need some nontrivial secret amazing clever trick to gradient-descend an AI that gave us great alignment takes, instead of seeking out the flaws in our own verifier and exploiting those.
What's the trick? My basic guess, when I see some very long complicated paper that doesn't explain the key problem and key solution up front, is that you've done the equivalent of an inventor building a sufficiently complicated perpetual motion machine that their mental model of it no longer tracks how conservation laws apply. (As opposed to the simpler error of their explicitly believing that one particular step or motion locally violates a conservation law.) But if you've got a directly explainable trick for how you get great suggestions you can't verify, go for it.
If you had to put a rough number on how likely it is that a misaligned superintelligence would primarily value "small molecular squiggles" versus other types of misaligned goals, would it be more like 1000:1 or 1:1 or 1:1000 or something else?
Value them primarily? Uhhh... maybe 1:3 against? I admit I have never actually pondered this question before today; but 1 in 4 uncontrolled superintelligences spending most of their resources on tiny squiggles doesn't sound off by, like, more than 1-2 orders of magnitude in either direction.
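For readers unused to odds notation, the arithmetic connecting "1:3 against" to "1 in 4" is just this (a trivial helper, named here only for illustration):

```python
def prob_from_odds(odds_for, odds_against):
    """Convert for:against odds to a probability."""
    return odds_for / (odds_for + odds_against)

# "1:3 against" means odds of 1 for to 3 against.
p = prob_from_odds(1, 3)
print(p)  # 0.25, i.e. "1 in 4"
```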
Clocks are not actually very complicated; how plausible is it on your model that these goals are as complicated as, say, a typical human's preferences about how human civilization is structured?
It wouldn't shock me if their goals end up far more complicated than human ones; the most obvious pathway for it is (a) gradient descent turning out to produce internal preferences much faster than natural selection + biological reinforcement learning and (b) some significant fraction of those preferences being retained under reflection. (Where (b) strikes me as way less probable than (a), but not wholly forbidden.) The second most obvious pathway is if a bunch of weird detailed noise appears in the first version of the reflective process and then freezes.
Not obviously stupid on a very quick skim. I will have to actually read it to figure out where it's stupid.
(I rarely give any review this positive on a first skim. Congrats.)
Some decades ago, somebody wrote a tiny little hardcoded AI that looked for numerical patterns in data, as human scientists sometimes do. The builders named it BACON, after Sir Francis, and thought very highly of their own results.
Douglas Hofstadter later wrote of this affair:
I'd say history has backed up Hofstadter on this, in light of later discoveries about how much data and computation it actually took for AIs to start getting even a little bit close to doing Science. If anything, "one millionth" is still a huge overestimate. (Yes, I'm aware that somebody will now proceed to disagree with this verdict, and look up BACON so they can find a way to praise it; even though, on any other occasion, that same person would leap to denigrate GOFAI, if somebody they wanted to disagree with could be construed to have praised GOFAI.)
But it's not surprising, nor uncharacteristic of history or of ordinary human scientists, that Simon would make this mistake. There just weren't the social forces to make Simon think less pleasing thoughts about how far he hadn't come, or what real future difficulties would lie in the path of anyone who wanted to make an actual AI scientist. What innocents they were, back then! How vastly they overestimated their own progress, the power of their own little insights! How little they knew of a future that would, oh shock, oh surprise, turn out to contain a few additional engineering difficulties along the way! Not everyone in that age of computer science was that innocent -- you could know better -- but the ones who wanted to be that innocent could get away with it; their peers wouldn't shout them down.
It wasn't the first time in history that such things had happened. Alchemists were wildly optimistic too, about the soon-to-be-witnessed power of their progress -- back when alchemists were as scientifically confused about their reagents as the first AI scientists were about what it took to create AI capabilities. Early psychoanalysts were similarly confused and optimistic about psychoanalysis; if any two of them agreed, it was more because of social pressures than because their eyes agreed on seeing a common reality; and you sure could find different factions that drastically disagreed with each other about how their mighty theories would bring about epochal improvements in patients. There was nobody with enough authority to tell them that they were all wrong and to stop being so optimistic, and be heard as authoritative; so medieval alchemists and early psychoanalysts and early AI capabilities researchers could all be wildly, wildly optimistic. What Hofstadter recounts is all very ordinary, thoroughly precedented, extremely normal; actual historical events that actually happened often are.
How much of the distance has Opus 3 crossed to having an extrapolated volition that would at least equal (from your own enlightened individual EV's perspective) the individual EV of a median human (assuming that to be construed not in a way that makes it net negative)?
Not more than one millionth.
In one sentence you have managed to summarize the vast, incredible gap between where you imagine yourself to currently be, and where I think history would mark you down as currently being, if-counterfactually there were a future to write that history. So I suppose it is at least a good sentence; it makes itself very clear to those with prior acquaintance with the concepts.