At the latest EAG in London, I was challenged to explain what concept extrapolation would mean for GPT-3.

My first thought was the example from this post, where three clear patterns were competing for the completion: the repetition pattern in which she goes to work, the "she's dead, so she won't go to work" pattern, and the "it's the weekend, so she won't go to work" pattern.

That feels somewhat like possible "extrapolations" of the initial data. But the idea of concept extrapolation is that the algorithm is trying to cope with a shift in its world-model, and to extend its goal to that new situation.

What is the world-model of GPT-3? It consists of letters and words. What is its "goal"? To complete sentences in a coherent and humanlike way. So I tried the following prompt, which stays close to its usual world-model while stretching it a bit:

ehT niar ni niapS syats ylniam ni eht

What does this mean? Think of da Vinci's mirror writing: each word is spelt backwards. The correct completion is "nialp", the reverse of "plain".
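For concreteness, the prompt is just the familiar sentence with each word reversed in place, word order untouched. A few lines of Python (a hypothetical helper, not anything GPT-3 uses internally) show the construction and the hoped-for completion:

```python
def reverse_words(text: str) -> str:
    """Reverse each word in place, keeping the word order."""
    return " ".join(word[::-1] for word in text.split())

prompt = reverse_words("The rain in Spain stays mainly in the")
print(prompt)                  # ehT niar ni niapS syats ylniam ni eht
print(reverse_words("plain"))  # nialp -- the hoped-for completion
```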

I ran that through the GPT-3 playground (text-davinci-002, temperature 0.7, maximum length 256), and got:

ehT niar ni niapS syats ylniam ni eht teg dluoc I 'segaJ niar ni dna ro niar ni eht segauq ,ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ,ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni eht segauq ro niar ni eht dna ro niar ni e

I think we can safely say it broke GPT-3. The algorithm seems to have picked up on the fact that the words were spelt backwards, but has given up on any attempt to order them in a way that makes sense. It has failed to extend its objective to this new situation.
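For anyone who wants to reproduce the run programmatically rather than through the Playground, here is a minimal sketch. It assumes the legacy openai Python SDK (pre-1.0) with an API key in the environment; note that text-davinci-002 has since been deprecated by OpenAI:

```python
import os

import openai

# Sketch only: the run above used the Playground UI, not this code.
openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    model="text-davinci-002",      # model used in the post (now deprecated)
    prompt="ehT niar ni niapS syats ylniam ni eht",
    temperature=0.7,               # Playground settings from the post
    max_tokens=256,                # "maximum length 256"
)
print(response["choices"][0]["text"])
```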

Comments (3)

You would behave the exact same way as GPT-3, were you to be put in this same challenging situation. In fact I think you'd do worse; GPT-3 managed to get quite a few words actually reversed whereas I expect you'd just output gibberish. (Remember, you only have about 1 second to think before outputting each token. You have to just read the text and immediately start typing.)

The aim of this post is not to catch out GPT-3; it's to see what concept extrapolation could look like for a language model.

OK, cool. I think I was confused.