Summary

  • Decreasing the cost of a factor of production creates an overhang if the elasticity of substitution between factors is less than 1
  • This corresponds to whether the elements of the production function are substitutes or complements: if you have a good enough algorithm, does it not really matter how much compute and data you have, or does that algorithm get better performance the more compute and data you have?
  • The concept of overhang makes sense if the inputs into the production function are complements - having more of one input makes the returns to increasing the other inputs higher - but not really if they’re substitutes 
  • Number of parameters and data usage are complements for large language models, according to the Chinchilla paper
  • Things it would be useful to do: empirically estimate the elasticities of substitution between different factors of production for different types of AI systems, e.g. LLMs, deep RL on scientific tasks, etc.

Production functions, complementarities and AI 

In economics, technologies are represented by production functions - functions which take inputs and map them to outputs. For instance, to make clothes you might need cotton, sewing machines and workers. In machine learning, we actually have two outputs that we're interested in. We're interested firstly in whatever it is the model is trying to optimise for, like loss or reward, and secondly in the value from a model achieving a certain loss or reward. There are roughly four inputs into the machine learning production function: data, compute, algorithms, and engineering quality. The ML production function maps these four inputs to loss or reward, which is then mapped to the thing we care about. In the case of large language models (LLMs), the inputs are mapped to loss, which is then mapped to how well the LLM does on the task you're ultimately interested in it doing.
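As a toy sketch of that two-stage structure, here is a short Python example. The functional forms and numbers are entirely made up for illustration; the only point is that inputs map to a loss, and the loss then maps to the downstream value we actually care about.

```python
import math

# Toy sketch of the two-stage ML "production function".
# The functional forms below are invented purely for illustration.

def loss(data, compute, algorithms, engineering):
    """Hypothetical: more/better inputs lower the loss, with diminishing returns."""
    effective_scale = (data * compute) ** 0.5 * algorithms * engineering
    return 1.7 + 10.0 / effective_scale ** 0.3

def task_value(l):
    """Hypothetical: downstream value rises as loss falls towards its floor."""
    return math.exp(-3.0 * (l - 1.7))

l = loss(data=1e12, compute=1e23, algorithms=1.0, engineering=1.0)
print(l, task_value(l))
```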

 

One class of policy interventions for trying to make AI go well is to make it more likely that transformative AI (TAI) is developed primarily via advances in one of these inputs. For instance, it seems like there are good reasons to want TAI to be developed primarily via increases in compute rather than increases in algorithmic quality specifically. There are three primary reasons for this: firstly, it seems to make it much more likely that TAI is developed by one of a small number of possible actors; secondly, it makes very fast takeoffs seem less likely; thirdly, it seems more likely that we'll have warning that TAI is about to be developed.

 

If it is correct that some ways of developing TAI are robustly better than others, then naturally one wants to find policy proposals that make it more likely that TAI is developed by pushing forward one of the factors of production. Whether that works depends fundamentally on whether the different factors of production are complements or substitutes.
 

Factors of production are substitutes for one another when decreasing the amount of one input doesn't change the productivity of the other input very much. For instance, if y = a + b, where a and b are inputs, then you can fully substitute a for b with no change in the output y. Beyond Meat burgers and Impossible burgers are substitute goods - if you replace one with the other in my burger I don't value the burger any less.
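As a minimal sketch of what "perfect substitutes" means here:

```python
# Perfect substitutes: y = a + b. Swapping units of a for units of b leaves output unchanged.
def y_substitutes(a, b):
    return a + b

print(y_substitutes(5, 5))   # 10
print(y_substitutes(0, 10))  # 10: fully replacing a with b changes nothing
```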

 

Factors are complements when reducing the amount of one reduces the productivity of the other. The extreme version of this is when total output y = min{a,b}. Sewing machines and workers are roughly perfect complements for one another - if I have 5 workers and 5 sewing machines, getting 1 more sewing machine won’t let me make any more clothes. 
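And the corresponding sketch for perfect complements, using the sewing-machine example:

```python
# Perfect complements: y = min{a, b}. Extra units of one factor alone add nothing.
def y_complements(workers, machines):
    return min(workers, machines)

print(y_complements(5, 5))  # 5 garments
print(y_complements(5, 6))  # still 5: the sixth sewing machine sits idle
```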
 

When we have perfect substitute production functions it's very easy to encourage the use of one factor over another - make the factor you want people to use cheaper or more productive. This will mechanically mean that firms switch to using the relatively cheaper factor more.
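A minimal sketch of that switching behaviour, assuming a firm that picks the cheapest way to hit a fixed output target under y = a + b (the helper name is hypothetical):

```python
# With perfect substitutes, the cost-minimising input mix is a corner solution:
# spend everything on whichever factor is relatively cheaper.
def cheapest_mix(y_target, price_a, price_b):
    return (y_target, 0.0) if price_a <= price_b else (0.0, y_target)

print(cheapest_mix(10, price_a=1.0, price_b=2.0))  # (10, 0.0): use only a
print(cheapest_mix(10, price_a=3.0, price_b=2.0))  # (0.0, 10): use only b
```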
 

The story is much less clear when factors are complements. When factors are complements, making one cheaper or more productive increases the amount of the other factor that gets used as well. This can be seen most clearly when you have perfect complements. We start with y = min{a,b}. If we make a twice as productive the production function now looks like y = min{2a,b}, meaning that to make full use of a the firm now wants twice as much b! You've created an overhang: despite being twice as productive on the margin, the factor you invested in now runs ahead of the factor you didn't invest in, and its extra productivity can't be used until the other factor catches up.
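The same point as a toy worked example (the quantities are arbitrary):

```python
# Perfect complements before and after doubling the productivity of a.
def y_before(a, b):
    return min(a, b)

def y_after(a, b):        # each unit of a is now twice as productive
    return min(2 * a, b)

a, b = 5, 5
print(y_before(a, b))     # 5
print(y_after(a, b))      # still 5: output is capped by b, the boosted a sits idle
print(y_after(a, 2 * b))  # 10: only with twice as much b is a's extra productivity used
```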
 

For instance, if you increased the productivity of compute by increasing the amount of data that LLMs have access to, then, assuming a perfect complements production function, further AI progress can only come through algorithmic progress.


 

What the Chinchilla paper tells us

In the Chinchilla paper, loss as a function of the number of parameters and the amount of data for large language models is given by


L = E + A/N^alpha + B/D^beta, where N is the number of parameters and D is the amount of data, with alpha slightly larger than beta. I started working out what the elasticity of substitution was exactly for this production function and then got bored and looked at the graph. The graph says that this is approximately a constant elasticity of substitution production function with a slight bias towards substituting parameters for data. This production function is closer to a perfect complements production function than a perfect substitutes one because it does asymptote in each of the arguments.
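As a rough check on that, here is a sketch that numerically estimates the elasticity of substitution between N and D along an isoloss curve of this parametric form. The constants are approximately the fitted values reported in the Chinchilla paper; the 70B-parameter / 1.4T-token point is just a convenient place to anchor the isoloss curve, and the helper names are mine.

```python
import math

# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta
# Constants are approximately the fitted values reported in the paper.
E, A, B = 1.69, 406.4, 410.7
alpha, beta = 0.34, 0.28

def loss(N, D):
    return E + A / N**alpha + B / D**beta

def D_on_isoloss(N, L_target):
    """Solve L(N, D) = L_target for D, i.e. move along an isoloss curve."""
    resid = L_target - E - A / N**alpha  # the part of the loss D has to account for
    return (B / resid) ** (1 / beta)

def elasticity_of_substitution(N, L_target, h=1e-4):
    """sigma = d ln(D/N) / d ln(MRTS) along the isoloss curve,
    where MRTS = (dL/dN) / (dL/dD). Values below 1 indicate complements."""
    def log_ratio_and_log_mrts(n):
        d = D_on_isoloss(n, L_target)
        dL_dN = -alpha * A * n ** (-alpha - 1)
        dL_dD = -beta * B * d ** (-beta - 1)
        return math.log(d / n), math.log(dL_dN / dL_dD)
    r_lo, m_lo = log_ratio_and_log_mrts(N * (1 - h))
    r_hi, m_hi = log_ratio_and_log_mrts(N * (1 + h))
    return (r_hi - r_lo) / (m_hi - m_lo)

# Anchor the isoloss curve at a Chinchilla-70B-like point: 70e9 params, 1.4e12 tokens.
L_target = loss(70e9, 1.4e12)
for N in [20e9, 70e9, 200e9]:
    print(f"N = {N:.0e}: sigma ~ {elasticity_of_substitution(N, L_target):.2f}")
```

If the constants above are right, the estimate comes out below 1 (roughly 0.75) at these scales, which is consistent with parameters and data being complements rather than substitutes.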
