This story was originally posted as a response to this thread.
It might help to imagine a hard takeoff scenario using only known sorts of NN & scaling effects...
In A.D. 20XX. Work was beginning. "How are you gentlemen !!"... (Work. Work never changes; work is always hell.)
Specifically, a MoogleBook researcher has gotten a pull request from Reviewer #2 on his new paper in evolutionary search in auto-ML, for error bars on the auto-ML hyperparameter sensitivity like larger batch sizes, because more can be different and there's high variance in the old runs with a few anomalously high performance values. ("Really? Really? That's what you're worried about?") He can't see why worry, and wonders what sins he committed to deserve this asshole Chinese (given the Engrish) reviewer, as he wearily kicks off yet another HQU experiment...
I found this story tough to follow on a technical level, despite being familiar with most of the ideas it cites (and having read many of the papers before).
Like, I've read and re-read the first few sections a number of times, and I still can't come up with a mental model of HXU's structure that fits all of the described facts. By "HXU's structure" I mean things like:

- The thing in "1 Day" that has a world model -- is this a single rollout of the Meta-RNN-ish thing, which developed its world model as it chewed its way along a task sequence? In which case, the world model(s) are being continually discarded (!) at the end of every such rollout and then built anew from scratch in the next one? Are we doing the search problem of finding-a-world-model inside of a second search problem?
- Is the outer search (maybe?) happening through ES, which is stupid and needs gajillions of inner rollouts to get anywhere, even on trivial problems?
- If the smart-thing-that-copies-itself called "HXU" is a single such rollout, and the 20XX computers can afford gajillions of such rollouts, then what are the slightly less meta 20XX models like, and why haven't they already eaten the world?

Since I can't answer these questions in a way that makes sense, I also don't know how to read the various lines that describe "HXU" doing something, or attribute mental states to "HXU."
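To make the two-level structure those questions are gesturing at concrete, here is a minimal, entirely hypothetical sketch (all names, sizes, and the toy task are invented, not taken from the story): an outer evolutionary-strategies (ES) loop searching over the weights of a tiny meta-RNN, where each inner rollout builds up a hidden state -- the rollout's only "world model" -- and that state is thrown away when the rollout returns.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, OBS = 8, 4
N_PARAMS = HIDDEN * HIDDEN + HIDDEN * OBS

def unpack(theta):
    """Split a flat parameter vector into the RNN's two weight matrices."""
    W_h = theta[:HIDDEN * HIDDEN].reshape(HIDDEN, HIDDEN)
    W_x = theta[HIDDEN * HIDDEN:].reshape(HIDDEN, OBS)
    return W_h, W_x

def rollout(theta, steps=20):
    """One inner rollout: the hidden state h accumulates across the task
    sequence (a crude 'world model'), then is discarded on return."""
    W_h, W_x = unpack(theta)
    h = np.zeros(HIDDEN)  # fresh state: nothing carries over from past rollouts
    reward = 0.0
    for t in range(steps):
        obs = np.sin(0.3 * t + np.arange(OBS))       # toy observation stream
        h = np.tanh(W_h @ h + W_x @ obs)
        pred = h[:OBS]                               # next-observation guess
        target = np.sin(0.3 * (t + 1) + np.arange(OBS))
        reward -= np.sum((pred - target) ** 2)
    return reward  # h dies here; the next rollout rebuilds it from scratch

def es_step(theta, pop=64, sigma=0.1, lr=0.05):
    """One outer ES update: score pop perturbations, each costing a full
    inner rollout, and move along the score-weighted average direction."""
    eps = rng.normal(size=(pop, theta.size))
    scores = np.array([rollout(theta + sigma * e) for e in eps])
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    return theta + lr / (pop * sigma) * eps.T @ scores

theta = 0.1 * rng.normal(size=N_PARAMS)
for _ in range(50):          # 50 outer steps = 50 * 64 inner rollouts
    theta = es_step(theta)
```

Note the cost structure the bullet points complain about: every single outer-loop gradient estimate burns `pop` complete inner rollouts, and whatever "world model" each rollout learned is gone the moment it returns.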
(Less important, but still jumped out at me: in "1 Day," why is HXU doing "grokking" [i.e. overfitting before the phase transition], as opposed to some other kind of discontinuous capability gain that doesn't involve overfitting? Like, sure, I suppose it could be grokking here, but this is another one of those paper references that doesn't seem to be "doing work" to motivate story events.)
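The distinction being drawn there -- grokking versus a generic discontinuous capability gain -- has a specific observable signature. A hedged illustration with synthetic accuracy curves (not a real training run; all thresholds and shapes are made up): in grokking, train accuracy saturates early via memorization while test accuracy sits near chance for a long plateau before jumping; in a generic sharp transition, train and test accuracy jump together, with no overfitting phase.

```python
import numpy as np

steps = np.arange(1000)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Grokking-shaped curves: train saturates by ~step 80, test jumps near step 800.
grok_train = sigmoid((steps - 50) / 10.0)
grok_test = 0.1 + 0.9 * sigmoid((steps - 800) / 10.0)

# Generic sharp transition: train and test both jump near step 800.
sharp_train = sigmoid((steps - 800) / 10.0)
sharp_test = 0.1 + 0.9 * sigmoid((steps - 800) / 10.0)

def looks_like_grokking(train_acc, test_acc,
                        memo_thresh=0.95, chance=0.3, min_gap=200):
    """True if train accuracy hit memo_thresh (memorization) at least
    min_gap steps before test accuracy first exceeded chance level."""
    memorized = np.flatnonzero(train_acc >= memo_thresh)
    generalized = np.flatnonzero(test_acc > chance)
    if len(memorized) == 0 or len(generalized) == 0:
        return False
    return bool(generalized[0] - memorized[0] >= min_gap)
```

On these synthetic curves the detector fires for the first pair but not the second: both exhibit a discontinuous test-accuracy jump, but only the first shows the long memorize-first gap that makes the jump "grokking" rather than some other phase transition.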
I dunno, maybe I'm reading the whole thing more closely or literally than it's intended to be read? But I imagine you intend the ML references to be taken somewhat more "closely" than the namedrops in your average SF novel, given the prefatory material:
And I'm not alleging that it is "just namedropping like your average SF novel." I'm taking the references seriously. But, when I try to view the references as load-bearing pieces in a structure, I can't make out what that structure is supposed to be.
Curated. I like fiction. I like that this story is fiction. I hope that all stories even at all vaguely like this one remain fiction.