Train a large language model unsupervised in the standard way
Repeat:
1. Take some supervised learning problem, throw away the supervised outputs, and repeat:
  1. Pick a supervised input and use the language model to sample N different outputs
  2. Select a "consensus" output among the N sampled outputs
  3. Store this "consensus" output as the new "target output" for the selected input
2. Fine-tune the language model in a supervised way using the input/output pairs generated above
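
The steps above can be sketched roughly as follows. This is a minimal sketch, not the paper's code: it assumes "consensus" means a plain majority vote over the sampled outputs, and `sample`, `fine_tune`, `build_pseudo_labels`, and `self_improve` are hypothetical names standing in for whatever sampling and training machinery is actually used.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical interface: sample(x, n) returns n sampled outputs for input x.
Sampler = Callable[[str, int], list[str]]


def consensus(outputs: list[str]) -> str:
    """Pick the most frequent output (assumption: consensus = majority vote)."""
    return Counter(outputs).most_common(1)[0][0]


def build_pseudo_labels(inputs: Iterable[str], sample: Sampler, n: int = 8) -> list[tuple[str, str]]:
    """Inner loop: the original supervised outputs are never looked at."""
    return [(x, consensus(sample(x, n))) for x in inputs]


def self_improve(model, inputs, sample_with, fine_tune, rounds: int = 1, n: int = 8):
    """Outer loop: pseudo-label with the current model, then fine-tune on the result.

    sample_with(model, x, n) and fine_tune(model, pairs) are hypothetical callables.
    """
    for _ in range(rounds):
        pairs = build_pseudo_labels(inputs, lambda x, k: sample_with(model, x, k), n)
        model = fine_tune(model, pairs)
    return model
```

Setting `rounds` above 1 corresponds to the outer "Repeat"; the sketch makes no claim about whether further rounds help.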

It's quite reasonable for them to call this "self-improvement", but it's only loosely related to the kind of self-improvement that was much discussed in 2010s-era AI safety circles. That kind was about an AI that can improve its own architecture. The kind discussed in this paper leaves the architecture untouched: the model is only fine-tuned on outputs it generated itself.

Still, it's bizarre that the thing being done in this paper works at all. It's as if the mere act of narrowing the distribution of outputs for a new supervised learning task is innately homing in on the truth... whatever that means. Strange.
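
One possible piece of the puzzle, offered as a toy calculation rather than an explanation of the paper's results: if each sampled output is independently correct with probability p > 0.5 and there are only two candidate answers, the majority of N samples is correct more often than a single sample. Both assumptions are strong (samples from one model are surely correlated), and `majority_correct_prob` is just an illustrative name.

```python
from math import comb


def majority_correct_prob(p: float, n: int) -> float:
    """Probability that more than half of n independent samples are correct.

    Assumes n is odd, answers are binary, and each sample is correct
    with probability p independently of the others.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 5, 11, 31):
        print(n, round(majority_correct_prob(0.6, n), 3))
```

The same formula cuts the other way: when p < 0.5, voting makes the consensus worse than a single sample, so narrowing the distribution only homes in on the truth if the model is already more right than wrong.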

Evan R. Murphy (3y)

The paper describes a method for self-improvement in LLMs. But does it work for recursive self-improvement? I haven't found any mention of recursion or multiple iterations in the paper.

The most relevant section seems to be 5.2 PUSHING THE LIMIT OF SELF-IMPROVEMENTS. Here the authors talk about their attempts to have the model use self-generated questions and self-generated few-shot Chain-of-Thought prompting. They did measure self-improvement when using self-generated questions, but the self-improvement wasn't as great as when they used training-set questions. But who cares if it's lesser self-improvement if it's an unsupervised process and you can just iterate on it to get more and more self-improvement?

I'm assuming their numbers show the self-improvement they measured for one iteration. Or is it actually the maximum self-improvement possible for any number of iterations?

Evan R. Murphy (3y)

This paper is now on arXiv (in addition to OpenReview) and published non-anonymously there by Jiaxin Huang et al. from University of Illinois Urbana-Champaign and Google.

Evan R. Murphy (3y)

It seems like this could benefit the smaller labs working on LLMs and toward AGI.

Chinchilla basically made it seem like only the big-data companies would have the means to produce competitive models going forward. But if generative models can produce their own data for reliable self-improvement, that shows a way forward for companies like Anthropic who don't have massive private data sources to train on (e.g. data from YouTube or Facebook Messenger).

Paper: Large Language Models Can Self-improve [Linkpost]