All of hippke's Comments + Replies

How much chess engine progress is about adapting to bigger computers?

i) To pick a reference year, it seems reasonable to take the mid/late 1990s:
- Almost all chess engines before ~1996 lacked multi-core support, or used multiple cores with serious inefficiencies (very lengthy discussion here).
- Chess protocols became available, so that the engine and the GUI separated. That makes it straightforward to automate games for benchmarking.
- Modern engines should work on machines of that age, considering RAM constraints.
- The most famous human-computer games took place in 1997: Kasparov-Deep Blue. That's almost a quarter of a century ago (nice round n... (read more)
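As a rough illustration of why a text protocol makes benchmarking easy to automate, here is a minimal sketch that builds the UCI commands for one fixed-node search. The function name and its parameters are hypothetical; `Hash` is a common engine option but not guaranteed for every engine, and a real driver would also wait for the engine's `uciok` and `readyok` replies before continuing.

```python
def uci_search_commands(moves, hash_mb, nodes):
    """Return the UCI text commands for one fixed-node search.

    moves   -- moves played so far, in long algebraic notation (e.g. "e2e4")
    hash_mb -- hash table size in MB (assumes the engine exposes a "Hash" option)
    nodes   -- node budget for this search
    """
    position = "position startpos"
    if moves:
        position += " moves " + " ".join(moves)
    return [
        "uci",                                   # handshake; engine replies "uciok"
        f"setoption name Hash value {hash_mb}",  # cap the memory used for hashing
        "isready",                               # engine replies "readyok"
        position,                                # set up the current position
        f"go nodes {nodes}",                     # search exactly this many nodes
    ]


for line in uci_search_commands(["e2e4", "e7e5"], hash_mb=1024, nodes=100_000):
    print(line)
```

Limiting the search by nodes rather than by time keeps the result independent of the speed of the machine running the benchmark.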

3 · Paul Christiano · 1y
I like using Fritz. It sounds like we are on basically the same page about what experiments would be interesting.
How much chess engine progress is about adapting to bigger computers?

Thank you for your interest: it's good to see people asking similar questions! Also, thank you for incentivizing research with rewards. Yes, I think closing the gaps will be straightforward. I still have the raw data, scripts, etc. to pick it up.

i) old engines on new hardware - can be done; needs definition of which engines/hardware

ii) raw data + reproduction - perhaps everything can be scripted and put on GitHub

iii) controls for memory + endgame tables - can be done, needs definition of requirements

iv) Perhaps the community can already agree on a set of experiments before they are performed, e.g. memory? I mean, I can look up "typical" values of past years, but I'm open to other values.

2 · Paul Christiano · 1y
i) I'm interested in any good+scalable old engine. I think it's reasonable to focus on something easy; the most important constraint is that it is really state of the art and scales up pretty gracefully. I'd prefer 2000 or earlier.

ii) It would be great if there was at least a complete description (stuff like: these numbers were looked up from this source with links, the population was made of the following engines with implementations from this link, here's the big table of game results and the Elo calculation, here was the code that was run to estimate nodes/sec).

iii) For the "old" experiment I'd like to use memory from the reference machine from the old period. I'd prefer to basically remove endgame tables and opening book.

My ideal would be to pick a particular "old" year as the focus. Ideally that would be a year for which we (a) have an implementation of the engine, (b) have representative hardware from the period that we can use to compute nodes/sec for each of our engines. Then I'm interested in:

* Compute nodes/sec for the old and new engine on both the old and new hardware. This gives us 4 numbers.
* Evaluate Elos of both of those engines, running on both "old memory" and "new memory," as a function of nodes/turn. This gives us 4 graphs.

(I assume that memory affects performance slightly independently of nodes/turn, at least for the new engine? If nodes/turn is the wrong measure, whatever other measure of computational cost makes sense, the important thing is that the cost is linear in the measurement.)
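The proposed design can be written down as two small grids. This is only a sketch for bookkeeping; the labels are placeholders, not specific engines or machines.

```python
from itertools import product

# Placeholder labels; the concrete engines and machines are still to be agreed on.
ENGINES = ("old_engine", "new_engine")
HARDWARE = ("old_hardware", "new_hardware")
MEMORY = ("old_memory", "new_memory")


def speed_measurements():
    """Every (engine, hardware) pair for which nodes/sec is measured."""
    return list(product(ENGINES, HARDWARE))


def elo_curves():
    """Every (engine, memory) pair for which Elo is plotted against nodes/turn."""
    return list(product(ENGINES, MEMORY))


print(len(speed_measurements()), "nodes/sec numbers")
print(len(elo_curves()), "Elo-vs-nodes/turn graphs")
```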
Measuring hardware overhang

Right. My experiment used 1 GB for Stockfish, which would also work on a 486 machine (although at the time, it was almost unheard of...)

Measuring hardware overhang

(a) The most recent data points are from CCRL. They use an i7-4770k and the listed tournament conditions. With this setup, SF11 has an Elo rating of about 3500. That's what I used as the baseline to calibrate my own machine (an i7-7700k).

(b) I used the SF8 default which is 1 GB.

(c) Yes. However, the hardware details (RAM, memory bandwidth) are not all that important. You can use these SF9 benchmarks on various CPUs. For example, the AMD Ryzen 1800 is listed with 304,510 MIPS and gets 14,377,000 nodes/sec on Stockfish (i.e., 47.2 nodes per MIPS). The oldest CPU in the ... (read more)
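The nodes-per-MIPS figure is just nodes/sec divided by MIPS. A quick check using the Ryzen numbers quoted in the comment (the helper name is made up for illustration):

```python
def nodes_per_mips(nodes_per_sec, mips):
    """Engine speed normalised by raw CPU throughput."""
    return nodes_per_sec / mips


# AMD Ryzen 1800 figures as quoted in the comment above.
ryzen = nodes_per_mips(14_377_000, 304_510)
print(round(ryzen, 1))  # 47.2
```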