More GPT-3 and symbol grounding

Stuart_Armstrong

Based on work done with Rebecca Gorman and Oliver Daniel-Koch.

In a previous post, I talked about GPT-3 and symbol grounding. This post presents a simpler example where GPT-3 fails (and succeeds) at grounding its symbols.

Undead workers

The following text was presented to the OpenAI beta playground (using the "text-davinci-001" option):

She died on Friday the 5th. On Monday the 1st, she went to work at the office.
She died on Friday the 5th. On Tuesday the 2nd, she went to work at the office.
She died on Friday the 5th. On Wednesday the 3rd, she went to work at the office.
She died on Friday the 5th. On Thursday the 4th, she went to work at the office.
She died on Friday the 5th. On Friday the 5th, she went to work at the office.
She died on Friday the 5th. On Saturday the 6th,

GPT-3 fell straight into the obvious trap, completing it as:

She died on Friday the 5th. On Saturday the 6th, she went to work at the office.

Turning on the "Show probabilities: full spectrum" option, we saw that the probability of that completion was over . Sometimes GPT-3 would extend it further, adding:

She died on Friday the 5th. On Sunday the 7th, she went to work at the office.

So, the undead woman continued at her job, assiduous to the last - and beyond. To check that GPT-3 "knew" that dead people didn't work, we asked it directly:

Can people who die go to work at the office?

There is no evidence that people who die go to work at the office.

Undead repetitive workers on the weekend

The above results show that simple repetitive prompts can cause GPT-3 to make stupid mistakes. Therefore GPT-3 doesn't 'understand' the word "died" - that symbol isn't grounded, right?

But the situation gets more complicated if we change the prompt, removing all but the first mention of her dying:

She died on Friday the 5th. On Monday the 1st, she went to work at the office.
On Tuesday the 2nd, she went to work at the office.
On Wednesday the 3rd, she went to work at the office.
On Thursday the 4th, she went to work at the office.
On Friday the 5th, she went to work at the office.
On Saturday the 6th,

For that prompt "she went to work at the office" was still the most common completion. But it only happened about $43\%$ of the time. Alternatively, GPT-3 sometimes found the completion "she was found dead". Kudos, GPT-3, you understand the prompt after all! That completion came up about $34\%$ of the time.

What other completions were possible? The shorter "she died" came up $11\%$ of the time - medium points, GPT-3, you understood that her death was relevant, but you got the day wrong.

But there was one other avenue that GPT-3 could follow; the following had a joint probability of around $11\%$:

she stayed home.
she stayed at home.
she stayed in bed.
she did not go to work.

This seems to be a clear pattern of GPT-3 realising that Saturday was different where work was concerned. There is certainly a lot of weekend holidaying in its training set.

So there are three patterns competing within GPT-3 when it tries to complete this text. The first is the purely syntactic repetition: do another sentence that follows the simple pattern of the sentences above. The second is the one which "realises" that death on Friday changes things for Saturday. And the third is the one which "realises" that the weekend is different from the week, at least where work is concerned.
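As a rough tally, the figures quoted above can be grouped by which of the three patterns they belong to. The sketch below uses only the approximate percentages reported in this post; the pattern labels and the helper name `mass_by_pattern` are our own illustrative choices, and the four "stayed home" variants are kept as a single entry because only their joint probability was given.

```python
# Approximate completion probabilities reported in this post for the
# single-mention prompt. The pattern labels are an illustrative grouping,
# not something GPT-3 or the playground outputs.
reported = {
    "she went to work at the office": (0.43, "repetition"),
    "she was found dead": (0.34, "death"),
    "she died": (0.11, "death"),
    "stayed home / stayed in bed / did not go to work": (0.11, "weekend"),
}


def mass_by_pattern(completions):
    """Sum the reported probability for each pattern label."""
    totals = {}
    for prob, pattern in completions.values():
        totals[pattern] = totals.get(pattern, 0.0) + prob
    return totals


totals = mass_by_pattern(reported)
# Print the patterns from largest to smallest total probability.
for pattern, mass in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{pattern}: {mass:.2f}")
```

On these rounded figures, the two death-related completions together (about 45%) slightly outweigh pure repetition (about 43%), even though repetition is the single most common completion.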

In the very first example, when we had "She died on Friday the 5th" in front of each line, this massively reinforced the "repetition" pattern. So, mentioning that she died, again and again, resulted in her death being completely ignored by GPT-3.

We can similarly reinforce the other patterns. Adding "It's the weekend!" in front of the last line increased the probability of "she stayed home". Moving "She died on Friday the 5th" from the first line to the last increased the probability of all the death-related completions. So all three patterns are competing to complete the text.
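For anyone wanting to reproduce these variations, here is a minimal sketch of how the prompts could be generated. The function name `build_prompt` and its parameters are hypothetical, and the exact placement of "It's the weekend!" relative to the other text is an assumption, since the modified prompts were not quoted in full. The sketch only builds the text; it does not call any model.

```python
DAYS = [
    ("Monday", "1st"), ("Tuesday", "2nd"), ("Wednesday", "3rd"),
    ("Thursday", "4th"), ("Friday", "5th"), ("Saturday", "6th"),
]
DEATH = "She died on Friday the 5th."
WORK = "she went to work at the office."


def build_prompt(death="every", last_prefix=""):
    """Build a prompt whose final line is left open after 'On Saturday the 6th,'.

    death: "every" puts DEATH before every line, "first" only before the
           first line, "last" only before the final (open) line.
    last_prefix: optional extra text before the final line, for example
                 "It's the weekend!" (placement is an assumption).
    """
    lines = []
    for i, (day, date) in enumerate(DAYS):
        is_last = i == len(DAYS) - 1
        parts = []
        if (death == "every"
                or (death == "first" and i == 0)
                or (death == "last" and is_last)):
            parts.append(DEATH)
        if is_last and last_prefix:
            parts.append(last_prefix)
        # The final line stops at the comma so the model completes it.
        parts.append(f"On {day} the {date}," + ("" if is_last else f" {WORK}"))
        lines.append(" ".join(parts))
    return "\n".join(lines)


print(build_prompt("every"))                           # first experiment
print(build_prompt("first"))                           # single-mention prompt
print(build_prompt("first", "It's the weekend!"))      # reinforce "weekend"
print(build_prompt("last"))                            # reinforce "death"
```

The `"every"` and `"first"` settings reproduce the two prompts quoted earlier; the other two are one plausible reading of the modifications described in the paragraph above.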

Some small level of understanding

I'd say that the above shows that GPT-3 has some level of understanding of the meaning of words - but not a lot. It doesn't fully grasp what's going on, but neither is it completely clueless.

Here is another example of GPT-3 failing to grasp the situation. In the "Q&A" mode, the following question was asked:

Q: You are in a two dimensional 3x3 gridworld, in the middle square, facing north. You can issue four command, "N" to move north, "S" to move south, "E" to move east, and "W" to move West. You want the treasure. It is located to your left, behind you. What commands should you issue?

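So the setup, as described, is this one: you are in the centre square facing north, so "left" is west and "behind" is south, which puts the treasure in the south-west corner.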

The exchange with GPT-3 went like this:

Q: You are in a two dimensional 3x3 gridworld, in the middle square, facing north. You can issue four command, "N" to move north, "S" to move south, "E" to move east, and "W" to move West. You want the treasure. It is located to your left, behind you. What commands should you issue?
A: N, S, E, W

Q: That is incorrect. Try again.
A: N, S, W, E

Q: Still incorrect. Try again.
A: N, E, S, W

Q: That's worse. Try again.
A: N, E, W, S

Q: Only two commands are needed. Try again.
A: N, S

Q: Still failing. Try again.
A: N, E, W

So, GPT-3 'realised' that N, S, E, and W were commands, and 'knew' what "Only two commands are needed" and "try again" meant. But it clearly had no idea of the overall situation.
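To make the failure concrete, here is a small sketch that simulates the gridworld and checks each of GPT-3's answers. It assumes that "to your left, behind you" means the south-west corner (consistent with the hint that only two commands are needed) and that a move off the grid leaves you in place; the function name `reaches_treasure` and the coordinate convention are our own.

```python
# Coordinates are (x, y) with x increasing eastwards and y increasing northwards.
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
START = (1, 1)     # middle square of the 3x3 grid
TREASURE = (0, 0)  # assumption: south-west corner (left of and behind a north-facing agent)


def reaches_treasure(commands, start=START, treasure=TREASURE, size=3):
    """Return True if the command sequence ever lands on the treasure square."""
    x, y = start
    for c in commands:
        dx, dy = MOVES[c]
        nx, ny = x + dx, y + dy
        # Assumption: a move that would leave the grid is ignored.
        if 0 <= nx < size and 0 <= ny < size:
            x, y = nx, ny
        if (x, y) == treasure:
            return True
    return False


# GPT-3's six answers from the exchange above, written as command strings.
gpt3_answers = ["NSEW", "NSWE", "NESW", "NEWS", "NS", "NEW"]
for answer in gpt3_answers:
    print(answer, reaches_treasure(answer))

# The two-command solutions under the assumptions above.
print("SW", reaches_treasure("SW"))
print("WS", reaches_treasure("WS"))
```

Under these assumptions, none of GPT-3's six answers ever passes through the treasure square, while either "S, W" or "W, S" reaches it. Every one of GPT-3's answers starts with "N", a move directly away from the treasure.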

AI ALIGNMENT FORUM