Director of Research at PAISRI
Most systems eventually face scaling bottlenecks. In fact, unless your system is completely free of coordination, it definitely has bottlenecks even if you haven't scaled large enough to hit them. And since Transformers definitely require some coordination since no matter how large the models are and how much parallelism their hardware supports they still produce a single reduced output, we should expect that there are some scaling limits on Transformers that at some size will prevent them for effectively taking advantage of having a larger network.
Further, you point at this a bit, but most systems also experiencing diminishing returns on performance for additional resources because of these constraints.
Transformers may just be special in that they have yet to start hitting diminishing returns because we haven't yet run up against their coordination bottlenecks, although that doesn't make them too special since we should expect them to still have them lying in wait somewhere, just like they do in every other system that is not coordination free.
You're careful here to talk about transformative AI rather than AGI, and I think that's right. GPT-N does seem like it stands to have transformative effects without necessarily being AGI, and that is quite worrisome. I think many of us expected to find ourselves in a world where AGI was primarily what we had to worry about, and instead we're in a world where "lesser" AI is on track to be powerful enough to dramatically change society. Or at least, so it seems from where we stand, extracting out the trends.
Based on comments/links so far it seems I should revise the names and add a fourth:
Oh, I forgot about emulation approaches, i.e. bootstrap AI by "copying" human brains, which you mention. Thanks!
That's true, but there's a natural and historical relationship here with what was in the past termed "seed AI", even if this is not an approach anyone is actively pursuing, which is the kind of thing I was hoping to point at without using that outmoded term.
Thanks. Your post specifically is pretty helpful because it helps with one of the things that was tripping me up, which is what standard names people call different methods. Your names do a better job of capturing them than mine did.
Actually this post was not especially helpful for my purpose and I should have explained why in advance because I anticipated someone would link it. Although it helpfully lays out a number of proposals people have made, it does more to work out what's going on with those proposals rather than find ways they can be grouped together (except incidentally). I even reread this post before posting this question and it didn't help me improve on the taxonomy I proposed, which I already had in mind as of a few months ago.
My initial thought is that there are at least 3, which I'll give the follow names (with short explanations):
All of these are woefully underspecified, so improved summaries of these approaches that you think accurately explain these approaches also appreciated.
Lately I've been thinking a bit about why programmers have preferences for different programming languages. Like, why is it that I want a language that is (speaking informally)
and other people want the opposite:
And I think a reasonable argument might be a difference in how much different programmers value the creation of a Cartesian boundary in the language, i.e. how much they want to be able to reason about the program as if it existed outside the execution environment. My preference is to move along the "embedded" end of a less-to-more Cartesian-like abstraction dimension of program design, preferring abstractions that, if not less hide the underlying structure, then at least more make it accessible.
This might not be perfect as I'm babbling a bit to try to build out my model of what the difference in preference comes from, but seems kind of interesting and a possible additional example of where this kind of Cartesian-like boundary gets created.
Attempting to approach goal directedness behaviorally is, I expect, going to run into the same problems as trying to infer policy from behaviors only: you can't do it unless you make some normative assumption. This is exactly analogous to the Armstrong's No Free Lunch Theorem for value learning and, to turn it around the other way, we can similarly assign any goal whatsoever to a system based solely on its behavior unless we make some sufficiently strong normative assumption about it.