Thank you for this excellent post. Here are some thoughts I had while reading.
I think there's another side to the hard paths hypothesis. We are clearly the first technology-using species to evolve on Earth. However, it's entirely possible that we're not the first species with human-level intelligence. If a species with human-level intelligence but no opposable thumbs evolved millions of years ago, it could have died out without leaving any artifacts we'd recognize as signs of intelligence.
Besides our intelligence, humans seem odd in many ways that could plausibly contribute to developing a technological civilization.
Given how well-tuned our biology seems for developing civilization, I think it's plausible that multiple human-level intelligent species arose in Earth's history, but additional bottlenecks prevented them from developing technological civilization. However, most of these bottlenecks wouldn't be an issue for an intelligence generated by simulated evolution. E.g., we could intervene in such a simulation to give low-dexterity species other means of manipulating their environment. Perhaps Earth's evolutionary history actually contains n human-level intelligent species, only one of which developed technology. That would imply the true compute required to evolve human-level intelligence is far lower than estimates that assume it arose only once.
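To make the arithmetic behind that last step concrete, here is a rough sketch. It assumes, very crudely, that the n emergences are spread evenly across evolutionary history, so the expected compute per emergence is just the total divided by n. The function name and the normalized total are my own placeholders, not figures from the post.

```python
def compute_per_emergence(total_compute: float, n_species: int) -> float:
    """Naive estimate: split the total evolutionary compute evenly across
    n independent emergences of human-level intelligence.

    Assumption (mine, not the post's): emergences are evenly spread, and
    no anthropic or selection-effect correction is applied.
    """
    if n_species < 1:
        raise ValueError("n_species must be at least 1")
    return total_compute / n_species


# Hypothetical placeholder: normalize the compute of all of Earth's
# evolutionary history to 1.0 rather than guessing a FLOP count.
TOTAL_EVOLUTION_COMPUTE = 1.0

for n in (1, 2, 5, 10):
    estimate = compute_per_emergence(TOTAL_EVOLUTION_COMPUTE, n)
    print(f"n = {n:2d}: compute per emergence = {estimate:.2f} of the total")
```

Under these assumptions the estimate only shrinks by a factor of n, so how much lower "far lower" turns out to be depends entirely on how many such species there actually were.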
I also think the discussion of neuromorphic AI and whole brain emulation misses an important possibility that Gwern calls "brain imitation learning". In essence, you record a bunch of data about human brain activity (using EEG, implanted electrodes, etc.), then you train a deep neural network to model the recorded data (similar to how GPT-3 or BERT model text). The idea is that modeling brain activity will cause the deep network to learn some of the brain's neurological algorithms. Then, you train the deep network on some downstream task and hope its learned brain algorithms generalize to the task in question.
I think brain imitation learning is pretty likely to work. We've repeatedly seen in deep learning that knowledge distillation (training a smaller student model to imitate a larger teacher model) is FAR more computationally efficient than trying to train the student model from scratch, while also giving superior performance (Wikipedia, distilling BERT, distilling CLIP). Admittedly, brain activity data is pretty expensive. However, the project that finally builds human-level AI will plausibly cost billions of dollars in compute for training. If brain imitation learning can cut the price by even 10%, it will be worth hundreds of millions in terms of saved compute costs.
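For readers who haven't seen distillation before, here is a minimal sketch of the usual student-imitates-teacher loss on classifier logits: KL divergence between temperature-softened distributions, scaled by T^2 as is common convention. This is only an analogy for brain imitation learning, where the "teacher" signal would be recorded brain activity rather than another network's logits. The function names and example logits are made up for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor keeps gradient magnitudes comparable across
    temperatures; it is a convention, not a requirement.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student's current predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Made-up logits for a 3-class problem.
teacher = [2.0, 1.0, 0.1]
good_student = [1.9, 1.1, 0.0]
bad_student = [0.0, 0.0, 3.0]

print("close student:", distillation_loss(good_student, teacher))
print("far student:  ", distillation_loss(bad_student, teacher))
```

The loss is zero when the student matches the teacher's softened distribution exactly and grows as the two diverge, which is what gives the student a much denser training signal than hard labels alone.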
What really impressed me were the generalized strategies the agent applied to multiple situations/goals. E.g., "randomly move things around until something works" sounds simple, but contextually applying that strategy is fairly difficult for deep agents to learn. I think of this work as giving the RL agents a toolbox of strategies that can be flexibly applied to different scenarios.
I suspect that fine-tuning XLand-trained agents in other physical environments will give good results because those agents already know how to use relatively advanced strategies. Learning to apply the XLand strategies to a new physical environment will probably be easier than starting from scratch in that environment.