All of Peter Jin's Comments + Replies

Measuring hardware overhang

Thanks for writing this post. I have a handful of quick questions: (a) What was the reference MIPS (or the corresponding CPU) you used for the c. 2019-2020 data point? (b) What was the constant amount of RAM you used to run Stockfish? (c) Do I correctly understand that the Stockfish-to-MIPS comparison is based on the equation

$$\text{nodes/sec}_{\text{old}} \approx \text{nodes/sec}_{\text{modern}} \times \frac{\text{MIPS}_{\text{old}}}{\text{MIPS}_{\text{modern}}},$$

i.e., that Stockfish's search speed is assumed to scale linearly with MIPS?

So, your post piqued my interest to investigate the Intel 80486 a bit more with the question in mind...

hippke · 2y · 3 points
(a) The most recent data points are from CCRL [http://www.computerchess.org.uk/ccrl/4040/rating_list_all.html]. They use an i7-4770k and the listed tournament conditions. With this setup, SF11 is rated about 3500 Elo. That's what I used as the baseline to calibrate my own machine (an i7-7700k).

(b) I used the SF8 default, which is 1 GB.

(c) Yes. However, the hardware details (RAM, memory bandwidth) are not all that important. You can use these SF9 benchmarks [https://sites.google.com/site/computerschess/stockfish9-benchmarks] on various CPUs. For example, the AMD Ryzen 1800 is listed with 304,510 MIPS and gets 14,377,000 nodes/sec on Stockfish (i.e., 47.2 nodes per MIPS). The oldest CPU in the list, the Pentium-150, has 282 MIPS and reaches 5,626 nodes/sec (i.e., 19.9 nodes per MIPS). That's about a factor of two difference, due to the memory and related advantages of the modern CPU. As we gain that much every 18 months due to Moore's law, it's a small (but relevant) detail, and it decreases the hardware overhang slightly. Thanks for bringing that up!

Giving Stockfish more memory also helps, but not a lot. Also, you can't give 128 GB of RAM to a 486 CPU; the 1 GB default is probably already stretching it. That's another small detail which reduces the overhang, likely by less than one year.

There are a few more subtle details, like endgame tablebases. Back then, these were small, constrained by disk-space limitations. Today, we have 7-piece endgame tablebases available through the cloud (they weigh in at 140 TB). That seems to be worth about 50 Elo [https://digitalcommons.calpoly.edu/cgi/viewcontent.cgi?article=1276&context=cpesp].
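For concreteness, here is a minimal sketch (mine, not part of the original reply) that redoes the nodes-per-MIPS arithmetic quoted above and converts the resulting efficiency gap into an equivalent amount of Moore's-law progress. The 18-month doubling time is the reply's own assumption; the conversion into months is an illustration, not part of the original analysis.

```python
import math

# (MIPS, Stockfish nodes/sec), as quoted from the SF9 benchmark list above
ryzen_1800 = (304_510, 14_377_000)   # modern CPU (AMD Ryzen 1800)
pentium_150 = (282, 5_626)           # oldest CPU in the list (Pentium-150)

npm_modern = ryzen_1800[1] / ryzen_1800[0]    # ~47.2 nodes per MIPS
npm_old = pentium_150[1] / pentium_150[0]     # ~19.9 nodes per MIPS

# Efficiency gap at equal MIPS (~2.4x), converted into months of
# Moore's-law progress at the assumed 18-month doubling time:
# log2(gap) * 18 months.
gap = npm_modern / npm_old
months = math.log2(gap) * 18

print(f"{npm_modern:.1f} vs {npm_old:.1f} nodes/MIPS -> "
      f"{gap:.2f}x gap, ~{months:.0f} months of Moore's law")
```

At these figures the gap works out to roughly 22 months of progress, which is consistent with the reply's framing of it as a small but relevant downward correction to the overhang estimate.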
What are the most important papers/posts/resources to read to understand more of GPT-3?

nostalgebraist's blog is a must-read regarding GPT-x, including GPT-3. Perhaps start here ("the transformer... 'explained'?"), which helps contextualize GPT-x within the history of machine learning.

(Though I should note that nostalgebraist holds a contrarian "bearish" position on GPT-3 in particular; for the "bullish" case, read Gwern instead.)

Adam Shimi · 2y · 2 points
Thanks for the answer! I knew about the "transformer explained" post, but I was not aware of its author's position on GPT-3.