Custom ML hardware (e.g. Google's TPUs or Baidu's Kunlun) is tricky to fit into these sorts of comparisons. For those, I think the MLPerf benchmarks are super useful. I'd be curious to hear the authors' expectations of how this research changes in the face of more custom ML hardware.

I'd be pretty excited to see more work on this. Jaime already shared our hardware sheet, where we collect information on GPUs, but as you outline, it lists peak performance, which is sometimes misleading.

Indeed, the MLPerf benchmarks are useful. I've already gathered their data in this sheet and would love to see someone play around with it. Besides MLPerf, Lambda Labs also shares some standardized benchmarks.
