gwern

Comments

[AN #156]: The scaling hypothesis: a plan for building AGI

It's based on those estimates and the systematic biases in such methods & literatures. Just as you know that psychology and medical effects are always overestimated and can be rounded down by 50% to get a more plausible real-world estimate, such information-theoretic methods will always overestimate model performance and underestimate human performance, and are based on various idealizations: they use limited genres and writing styles (formal, omitting informal registers like slang), don't involve extensive human calibration or training like the models get, don't involve any adversarial examples, don't try to test human reasoning by writing up texts made of logical riddles and puzzles or complicated cause-and-effect scenarios or even things like Winograd schemas, are time-biased, etc. We've seen a lot of these issues come up in benchmarking, like ImageNet models falling short of humans outside ImageNet despite hitting human parity or superiority on it. (If we are interested in truly testing 'compression = intelligence', we need texts which stress all capabilities and remove all of those issues.)

So given that Shannon's interval's lower end is 0.6, Grassberger's asymptotic estimate is 0.8 (footnote 11), and there is a wide spread of upper bounds going down to 1.3 while extremely dumb fast algorithms hit 2, I am comfortable rounding them down to an estimate of ~0.7 bpc for human performance; and I expect that estimate, if anything, still underestimates true human peak performance, so I wouldn't be shocked if it were actually more like 0.6 bpc.
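(As a concrete illustration of the "dumb fast algorithms hit 2" baseline, here is a minimal sketch, mine rather than anything from the estimates above, of measuring bits per character with an off-the-shelf compressor; zlib on ordinary English prose typically lands somewhere around 2–3 bpc, far above the ~0.6–1.3 bpc range of the human estimates.)

```python
# Crude upper-bound entropy estimate via an off-the-shelf compressor (zlib).
# This is an illustration of how weak the "dumb fast algorithm" baseline is,
# not part of the Shannon/Grassberger estimation methodology discussed above.
import zlib

def bits_per_character(text: str, level: int = 9) -> float:
    """Compressed size in bits divided by character count."""
    data = text.encode("utf-8")
    compressed = zlib.compress(data, level)
    return 8 * len(compressed) / len(text)

# Example usage on any sufficiently long plain-text sample:
# print(bits_per_character(open("sample.txt").read()))
```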

Parameter counts in Machine Learning

Could it be inefficient scaling? Most work that doesn't explicitly use scaling laws for planning seems to spend too much compute per parameter, i.e. to train too-small models. Anyone want to try applying Jones 2021 to see whether AlphaZero was scaled wrong?
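(For concreteness, "planning with scaling laws" amounts to something like the sketch below. The power-law exponent and constant are placeholders of my own for illustration, not fitted values from Jones 2021 or AlphaZero; only the C ≈ 6·N·D rule of thumb for transformer training compute is standard.)

```python
# Hypothetical sketch of planning a run with a scaling law: given a FLOPs budget,
# pick a compute-optimal parameter count instead of guessing. The exponent `a`
# and constant `k` below are illustrative placeholders and would need refitting
# for any particular domain (board games, language, etc.).
def optimal_params(compute_flops: float, a: float = 0.73, k: float = 1.6e9) -> float:
    """N_opt ~ k * (C / 1e21)**a  -- placeholder power law."""
    return k * (compute_flops / 1e21) ** a

def tokens_for(compute_flops: float, n_params: float) -> float:
    """Invert the rough C ~ 6*N*D rule of thumb for transformer training."""
    return compute_flops / (6 * n_params)

budget = 1e22  # FLOPs
n = optimal_params(budget)
print(f"~{n:.2e} params, ~{tokens_for(budget, n):.2e} tokens")
```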

"Decision Transformer" (Tool AIs are secret Agent AIs)

Rewards need not be written in natural language as crudely as "REWARD: +10 UTILONS". Something to think about as you continue to write text online.

And what of the dead? I own that I thought of myself, at times, almost as dead. Are they not locked below ground in chambers smaller than mine was, in their millions of millions? There is no category of human activity in which the dead do not outnumber the living many times over. Most beautiful children are dead. Most soldiers, most cowards. The fairest women and the most learned men – all are dead. Their bodies repose in caskets, in sarcophagi, beneath arches of rude stone, everywhere under the earth. Their spirits haunt our minds, ears pressed to the bones of our foreheads. Who can say how intently they listen as we speak, or for what word?
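(Mechanically, the point about rewards being just more text is the Decision Transformer setup: the return is simply another token in the sequence, and at sampling time you condition on a high return-to-go. A minimal sketch of that sequence construction follows, with illustrative names of my own.)

```python
# Minimal sketch of Decision Transformer-style conditioning: the "reward" is
# just another token interleaved into the input stream. Names are illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    return_to_go: float   # desired cumulative future reward
    state: List[float]
    action: int

def build_sequence(trajectory: List[Step]) -> List[float]:
    """Flatten (return-to-go, state, action) triples into one token stream."""
    tokens: List[float] = []
    for step in trajectory:
        tokens.append(step.return_to_go)
        tokens.extend(step.state)
        tokens.append(float(step.action))
    return tokens

# At inference time, prepend an optimistic return-to-go and let the model
# autoregressively fill in actions consistent with that target return.
```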

Agency in Conway’s Game of Life

My immediate impulse is to say that it ought to be possible to create the smiley face, and that it wouldn't be that hard for a good Life hacker to devise it.

I'd imagine it to go something like this. Starting from a Turing machine or something simpler, you could program it to place arbitrary 'pixels': either by finding a glider-like construct which terminates after a specific distance into a still life, so the constructor can crawl along the x/y axes, shooting off the terminating glider to create stable pixels in a pre-programmed pattern; or, if no such construct exists, by using two constructors crawling along the x and y axes, shooting off gliders timed to collide, with the delays properly pre-programmed. The constructor then terminates in a stable still life; this guarantees perpetual stability of the finished smiley face. If one wants to specify a more dynamic environment for realism, the constructor can also 'wall off' the face using still blocks. Once that's done, nothing from the outside can possibly affect it, and it's internally stable, so the pattern is eternal.
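(To make the 'stable pixel' point concrete: a 2×2 block is the simplest still life, so any pattern assembled out of such blocks, by a constructor which itself halts in a still life, never changes again. A minimal simulation sketch of my own checking block stability:)

```python
# Sketch (assumptions mine): verify that a 2x2 block is a still life, i.e. a
# "stable pixel" that a constructor could lay down and then leave forever.
import numpy as np

def life_step(grid: np.ndarray) -> np.ndarray:
    """One Conway's Game of Life update on a toroidal (wraparound) grid;
    fine for this demo since the pattern sits well away from the edges."""
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(grid.dtype)

grid = np.zeros((8, 8), dtype=np.uint8)
grid[3:5, 3:5] = 1                              # a 2x2 block: the simplest still life
assert np.array_equal(life_step(grid), grid)    # unchanged under the update rule
```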

gwern's Shortform

2-of-2 escrow: what is the exploding Nash equilibrium? Did it really originate with NashX? I've been looking for the history & real name of this concept for years now and have failed to refind it. Anyone?
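(For anyone unfamiliar with the mechanism being asked about, as I understand it: both parties post bonds into a 2-of-2 arrangement such that any defection destroys or permanently locks both bonds, making honest completion the equilibrium. A toy payoff sketch with made-up numbers:)

```python
# Toy payoff sketch of the "exploding" 2-of-2 escrow idea, as I understand it:
# both sides post a bond B larger than any gain from cheating; defection by
# either party burns (or locks forever) both bonds, so honest completion wins.
def payoffs(buyer_cooperates: bool, seller_cooperates: bool,
            trade_value: float = 10.0, bond: float = 15.0):
    if buyer_cooperates and seller_cooperates:
        return (trade_value, trade_value)   # trade completes, bonds returned
    return (-bond, -bond)                   # any defection destroys both bonds

for b in (True, False):
    for s in (True, False):
        print(b, s, payoffs(b, s))
# With bond > gain-from-cheating, (cooperate, cooperate) is a Nash equilibrium:
# unilateral defection only makes the defector strictly worse off.
```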

Gradations of Inner Alignment Obstacles

I claim that if we're clever enough, we can construct a hypothetical training regime T' which trains the NN to do nearly or exactly the same thing on T, but which injects malign behavior on some different examples. (Someone told me that this is actually an existing area of study; but, I haven't been able to find it yet.)

I assume they're referring to data poisoning backdoor attacks like https://arxiv.org/abs/2010.12563 or https://arxiv.org/abs/1708.06733 or https://arxiv.org/abs/2104.09667
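(The recipe in those backdoor papers is roughly: stamp a trigger onto a small fraction of the training examples and relabel them to an attacker-chosen class, so the trained model behaves normally on clean inputs but flips to the target label whenever the trigger appears. A minimal sketch of the poisoning step, assuming grayscale images of shape (N, H, W) scaled to [0, 1]:)

```python
# BadNets-style data-poisoning sketch (illustration only, following the general
# recipe of arXiv:1708.06733): trigger-stamp and relabel a fraction of examples.
import numpy as np

def poison(images: np.ndarray, labels: np.ndarray,
           target_label: int, rate: float = 0.01, seed: int = 0):
    """Return copies of (images, labels) with `rate` of examples backdoored.
    Assumes images of shape (N, H, W) with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -3:, -3:] = 1.0   # 3x3 white patch in the corner = the trigger
    labels[idx] = target_label    # attacker-chosen label
    return images, labels

# Training on the poisoned set proceeds exactly as usual; the backdoor only
# manifests when the trigger patch is present at test time.
```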

2020 AI Alignment Literature Review and Charity Comparison

That's interesting. I did see YC listed as a major funding source, but given Sam Altman's listed loans/donations, and since YC has little or nothing to do with Musk, I assumed that YC's interest reflected Altman, Paul Graham, or just YC collectively. I hadn't seen anything at all about YC being used as a cutout for Musk. So assuming the Guardian didn't completely misunderstand the finances there (the media constantly makes mistakes in reporting on finances, and on charities in particular, but this account seems detailed, specific, and hard to get wrong), I agree that it confirms Musk did donate money to get OA started, and that it was a meaningful sum.

But it still does not seem that Musk donated a majority or even a plurality of OA's donations, much less the $1b constantly quoted (or any large fraction of the $1b collective pledge, per ESRogs).
