Each agent was trained on 8 TPUv3s, which cost about $5,000/mo according to a quick Google search and seem to produce 90 TOPS, or about 10^14 operations per second. They say each agent does about 50,000 steps per second, so that means about 2 billion operations per step. Each little game they play lasts 900 steps if I recall correctly, which they say is about 2 minutes of subjective time (I imagine they extrapolated from running the game at a speed where the physics simulation looks normal-speed to us). That means about 7.5 steps per subjective second, so each agent requires about 15 billion operations per subjective second.
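
For anyone who wants to check the arithmetic, here is the same back-of-envelope estimate as a few lines of Python. It is only a sketch: the constant names are made up for illustration, and the inputs are the rough figures quoted above (rounded 10^14 ops/s, recalled 900-step games), not official DeepMind numbers.

```python
# Back-of-envelope compute estimate; all inputs are rough guesses from the comment.
OPS_PER_SECOND = 1e14              # ~90 TOPS for 8 TPUv3s, rounded to 10^14
STEPS_PER_SECOND = 50_000          # reported agent steps per second
STEPS_PER_GAME = 900               # steps per game (recalled, not verified)
SUBJECTIVE_SECONDS_PER_GAME = 120  # "about 2 minutes of subjective time"

ops_per_step = OPS_PER_SECOND / STEPS_PER_SECOND
steps_per_subjective_second = STEPS_PER_GAME / SUBJECTIVE_SECONDS_PER_GAME
ops_per_subjective_second = ops_per_step * steps_per_subjective_second

print(f"ops per step: {ops_per_step:.1e}")
print(f"steps per subjective second: {steps_per_subjective_second}")
print(f"ops per subjective second: {ops_per_subjective_second:.1e}")
```

Changing any single input (say, a different TOPS figure) propagates linearly through both results, so the estimate is only as good as the rounded hardware number.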

So... 2 billion operations per step suggests that these things are about the size of GPT-2, i.e. about the size of a rat brain? If we care about subjective time, then it seems the human brain maybe uses 10^15 FLOP per subjective second, which is about 5 OOMs more than these agents.
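
A quick way to see where "about 5 OOMs" comes from is to take the log10 of the ratio. This sketch treats FLOP and generic operations as interchangeable (a simplifying assumption) and uses the ~1.5 × 10^10 ops per subjective second estimated for these agents; the variable names are hypothetical.

```python
import math

# Assumed inputs: agent estimate from the comment, rough human-brain guess.
AGENT_OPS_PER_SUBJECTIVE_SECOND = 1.5e10
BRAIN_FLOP_PER_SUBJECTIVE_SECOND = 1e15

# Orders of magnitude between the two figures.
gap_ooms = math.log10(BRAIN_FLOP_PER_SUBJECTIVE_SECOND / AGENT_OPS_PER_SUBJECTIVE_SECOND)
print(f"gap: {gap_ooms:.1f} OOMs")
```

This comes out a little under 5, which rounds to the "about 5 OOMs" in the comment.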

Jsevillamol

Do you mind sharing your guesstimate of the number of parameters?

Also, do you by chance have guesstimates of the number of parameters / compute of other systems?


Daniel Kokotajlo

I did, sorry -- I guesstimated FLOP/step and then figured the parameter count is probably a bit less than 1 OOM below that. But since this is recurrent, maybe it's even less? IDK. My guesstimate is shitty and I'd love to see someone do a better one!
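
The parameter heuristic is simple enough to write out. This is a sketch assuming "a bit less than 1 OOM" means dividing the ~2 × 10^9 FLOP/step guesstimate by somewhere between 5 and 10; that divisor range is an assumption for illustration, not a figure from the thread.

```python
FLOP_PER_STEP = 2e9  # guesstimate from the top-level comment

# Hypothetical divisor range standing in for "a bit less than 1 OOM".
estimates = {divisor: FLOP_PER_STEP / divisor for divisor in (5, 10)}

for divisor, params in estimates.items():
    print(f"divisor {divisor}: ~{params:.0e} parameters")
```

Since the agents are recurrent, the true ratio of FLOP/step to parameters could differ from either end of this range.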

Daniel Kokotajlo

Michael Dennis tells me that population-based training typically sees strong diminishing returns to population size, such that he doubts there were more than one or two dozen agents in each population/generation. This is consistent with AlphaStar, I believe, where IIRC the number of agents was something like that...

Anyhow, suppose 30 agents per generation. Then that's a cost of $5,000/mo x 1.3 months x 30 agents = $195,000 to train the fifth generation of agents. The previous two generations were probably quicker and cheaper. In total the price is prob...
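
Under the comment's assumptions (30 agents per generation, $5,000 per agent-month for 8 TPUv3s, 1.3 months of training), the per-generation cost works out as below. The population size in particular is a guess, and the truncated total above is deliberately not reconstructed here.

```python
COST_PER_AGENT_MONTH_USD = 5_000  # 8 TPUv3s per agent, from a quick lookup
MONTHS_PER_GENERATION = 1.3       # training time assumed in the comment
AGENTS_PER_GENERATION = 30        # assumed population size

generation_cost = COST_PER_AGENT_MONTH_USD * MONTHS_PER_GENERATION * AGENTS_PER_GENERATION
print(f"${generation_cost:,.0f}")  # prints $195,000
```

The cost scales linearly with the assumed population size, so "one or two dozen" agents would put it somewhere around $78,000 to $156,000.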


gwern

Makes sense given the spinning-top topology of games. These tasks are probably not complex enough to need a lot of distinct agents/populations to traverse the wide part and reach the top, where you then need little diversity to converge on value-equivalent models. One observation: you can't run SC2 environments on a TPU, and when you can pack the environment and agents together onto a TPU and batch everything with no copying, you use the hardware closer to its full potential (see the Podracer numbers).

Daniel Kokotajlo

Also, for comparison, I think this means these models were about twice as big as AlphaStar. That's interesting.

AI ALIGNMENT FORUM

[ Question ]

How much compute was used to train DeepMind's generally capable agents?

Jul 29, 2021