It has become common on LW to refer to "giant inscrutable matrices" as a problem with modern deep-learning systems. To clarify: deep learning models are trained by creating giant blocks of random numbers -- blocks with dimensions like 4096 x 512 x 1024 -- and incrementally adjusting the values of...
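The mechanism described above can be made concrete with a minimal sketch: initialize a block of random numbers, then incrementally adjust its values to reduce a loss. The tiny shapes, toy loss, and learning rate here are illustrative choices, not from any real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# A "giant block of random numbers" -- real models use shapes like
# 4096 x 512 x 1024; we use a tiny block so this runs instantly.
W = rng.normal(size=(4, 3))

# Toy objective: adjust W so that W @ x matches a target vector y.
x = rng.normal(size=3)
y = rng.normal(size=4)

def loss(W):
    err = W @ x - y
    return float(err @ err)

loss0 = loss(W)

# "Incrementally adjusting the values": repeated small gradient steps.
# For this squared-error loss, the gradient w.r.t. W is 2 * outer(err, x).
lr = 0.01
for _ in range(500):
    err = W @ x - y
    W -= lr * 2.0 * np.outer(err, x)

# After many tiny adjustments, the once-random block fits the data,
# yet nothing in W's raw numbers "explains" what it now does --
# which is the inscrutability the post is pointing at.
```

Real training differs mainly in scale (billions of values, millions of steps) and in using backpropagation through many such blocks rather than a hand-derived gradient.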
On March 29th, DeepMind published a paper, "Training Compute-Optimal Large Language Models", that shows that essentially everyone -- OpenAI, DeepMind, Microsoft, etc. -- has been training large language models with a deeply suboptimal use of compute. Following the new scaling laws that they propose for the optimal use of compute,...
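The paper's headline correction can be sketched with the commonly cited rule of thumb it implies: roughly 20 training tokens per parameter, with training compute approximated as C ≈ 6·N·D (N parameters, D tokens). These constants are my reading of the widely quoted takeaway, not figures stated in this excerpt:

```python
# Chinchilla-style heuristic (assumed constants, not quoted from the
# excerpt): compute-optimal training uses ~20 tokens per parameter,
# and training FLOPs are approximated as C = 6 * N * D.

def optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Rule-of-thumb compute-optimal token count for a model of n_params."""
    return tokens_per_param * n_params

def train_flops(n_params: float, n_tokens: float) -> float:
    """Standard approximation of total training FLOPs."""
    return 6.0 * n_params * n_tokens

# Example: a 70e9-parameter model (Chinchilla's size) would want
# about 1.4e12 training tokens under this heuristic -- far more data
# per parameter than the earlier scaling practice used.
n = 70e9
d = optimal_tokens(n)
c = train_flops(n, d)
```

The point of the heuristic is the direction of the correction: for a fixed compute budget, prior models were too large and trained on too little data.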
The goal of this essay is to help you understand EfficientZero, a reinforcement learning agent that obtains better-than-median-human performance on a set of 26 Atari games after just two hours of real-time experience playing each game. Specifically, it reaches 116% of the median human score on the data-limited Atari 100k...
Epistemic status: Pretty sure about core, not about edges

A while ago, I noticed a possible bias in how I evaluated reinforcement learning agents. It tended to cause me to revise my estimation of their intelligence downwards after I viewed a video of them in action. I've seen other people...
Intro

One of DeepMind's latest papers, Open-Ended Learning Leads to Generally Capable Agents, explains how DeepMind produced agents that can successfully play games as complex as hide-and-seek or capture-the-flag without even having trained on or seen these games before. As far as I know, this is an entirely unprecedented level...