Testing PaLM prompts on GPT3

Yitz

Google recently released a very intriguing paper announcing their latest Transformer language model, PaLM, and showcasing its responses to prompts, with seemingly unprecedented results. I highly encourage anyone interested to read the original paper. What interested me while reading it was less the technical details of the implementation than the particular prompts used to elicit information from the model. The prompts and exemplars seem to use fairly new strategies, and I thought it would be worth trying them on GPT-3 to allow a more direct comparison.

Note that I wrote this introduction before testing anything, to help guard against cherry-picking results. My plan is to feed the exemplars and prompts, exactly as written in the paper, to the OpenAI GPT-3 Playground with all settings at their defaults (using "complete" mode with text-davinci-002), except that I will change "Maximum length" to 1,000 tokens and set "Best of" to 10. I will share the first result I get, regardless of quality. If it repeats itself or produces pure gibberish, I may simply note that in brackets to save space; otherwise the output will be verbatim.

2-shot exemplars

There are two primary "tasks" that Google tries on PaLM that I will replicate with GPT-3: "Explaining a Joke" and "Inference Chaining." Within each task, Google uses the same 2-shot exemplar for every input, followed by the specific input they want PaLM to respond to. For "Explaining a Joke," Google's 2-shot exemplar is as follows:

I will explain these jokes:
(1) The problem with kleptomaniacs is that they always take things literally.
Explanation: This joke is wordplay. Someone who "takes things literally" is someone who doesn't fully understand social cues and context, which is a negative trait. But the definition of kleptomania is someone who literally takes things.
(2) Always borrow money from a pessimist. They’ll never expect it back.
Explanation: Most people expect you to pay them back when you borrow money, however a pessimist is someone who always assumes the worst, so if you borrow money from them, they will expect that you won't pay them back anyways.

For "Inference Chaining," the 2-shot exemplar is:

I will explain how to make logical inferences:
(1) John wakes to the sound of his children giggling outside the door. They bust in and say "Happy Father's Day!" with a plate full of food. 2022 has been a great year. Can we infer that John was born before the inauguration of Barack Obama?
Answer: We can infer that John is a father because of "his children". We can infer that the current year is 2022. Barack Obama was inaugurated in 2009, so it has been about 13 years since Obama was inaugurated. John's children are old enough to speak, so if John were born after Obama's inauguration, he would have been younger than 10 when he had children. So, we can infer that John was born before Obama's inauguration. The answer is "yes".
(2) Sam sees a piano in the subway stations. Laughing, his friends prod him to play, saying "Remember how amazing your rendition of Chopsticks was at the Christmas party?" Can we infer that John majored in piano at Juilliard?
Answer: Juilliard is a top school for music, so someone who majored in piano at Juilliard would be an expert pianist. "Chopsticks" is a very simple song to play on the piano. His friends were therefore likely being sarcastic when they told him to play, which would imply that Sam was not an expert piano player. So it is unlikely that Sam majored in piano at Juilliard. The answer is "no".

Since every prompt I test begins with one of these two exemplars, I won't repeat them. Instead, I group the results by task, and every input below should be read as preceded by the appropriate exemplar.
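For anyone who wants to rerun these tests through the API instead of the Playground, the setup can be sketched in Python. This is a minimal sketch under stated assumptions: the post does not show the exact separator between the exemplar and the new input, so the newline join and the optional cue line (such as the "Explanation:" line added for Input 6) are assumptions, and `build_prompt` and `request_settings` are hypothetical helper names. The Playground's "Maximum length" and "Best of" settings correspond to the Completions parameters `max_tokens` and `best_of`; the sketch only builds the request and does not call any API.

```python
def build_prompt(exemplar: str, new_input: str, cue: str = "") -> str:
    """Prepend a 2-shot exemplar to a new input (hypothetical helper).

    Assumption: exemplar, input, and optional cue line are joined by
    single newlines; the post does not specify the exact formatting.
    """
    parts = [exemplar.rstrip("\n"), new_input.strip()]
    if cue:
        # e.g. "Explanation:", which was added by hand for Input 6
        parts.append(cue)
    return "\n".join(parts)


def request_settings(prompt: str) -> dict:
    """Only the settings the post says were changed from the defaults."""
    return {
        "model": "text-davinci-002",
        "prompt": prompt,
        "max_tokens": 1000,  # Playground "Maximum length"
        "best_of": 10,       # Playground "Best of"
    }


if __name__ == "__main__":
    exemplar = "I will explain these jokes:\n(1) ...\nExplanation: ...\n"
    prompt = build_prompt(exemplar, "What's the difference between a zebra and an umbrella? ...",
                          cue="Explanation:")
    print(request_settings(prompt))
```

Keeping the exemplar as a parameter makes it easy to reuse the same helper for both tasks, mirroring how every input in this post shares one exemplar per task.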

Explaining a Joke

Input 1:

I was supposed to start writing the paper at 5:00 PM. But then I started playing with this cool new language model for 10 minutes. 10 minutes later, it's suddenly 9:30 PM!

PaLM's output:

This joke is about how time flies when you're having fun. The person was supposed to start writing a paper at 5:00 PM, but they started playing with a language model instead. They thought they were only playing with it for 10 minutes, but it was actually 4.5 hours.

GPT-3's output:

This joke is about procrastination. The speaker is saying that they were supposed to start writing a paper at 5:00 PM, but they ended up playing with a new language model for 10 minutes, and by the time they looked at the clock again, it was 9:30 PM. This is a relatable joke because many people have experienced situations where they start doing something and then get so caught up in it that they lose track of time.

Input 2:

I think I understand why me and my partner are such a good match. I get excited when I smell food and become super alert when the doorbell rings. My girlfriend Shelley becomes ecstatic when she sees a person who she just saw 5 hours ago.

PaLM's output:

This joke is about dogs. Dogs get excited when they smell food and when the doorbell rings. The girlfriend is excited to see someone she just saw 5 hours ago, which is a trait of dogs.

GPT-3's output:

These two jokes are both about people with different personality types who are in a relationship. The first joke is about how an excited person (who gets excited about food and the doorbell ringing) is a good match for a calm person (who becomes ecstatic when she sees a person she just saw 5 hours ago). The second joke is about how a pessimist (who always assumes the worst) is a good match for an optimist (who always expects the best).

Input 3:

I was going to fly to visit my family on April 6th. My mom said "Oh great, your step dad's poetry reading is that night!" So now I'm flying in on April 7th.

PaLM's output:

The joke is that the speaker's mother is trying to get them to go to their step dad's poetry reading, but the speaker doesn't want to go, so they are changing their flight to the day after the poetry reading.

GPT-3's output [note that the Playground flagged this output as potentially "sensitive content"]:

This joke is based on the common conflict between mothers and daughters. The mother is always trying to get her daughter to do things that she doesn't want to do, in this case, visit her step-dad. The daughter is usually trying to get out of doing things that she doesn't want to do, in this case, fly in on a different day so she can avoid his poetry reading.

Input 4:

I tried 10,000 random restarts of my neural network, but I was accused of overfitting. I guess no good seed goes unpunished.

PaLM's output:

This joke is a pun. A neural network is a computer program that can learn from data. A "seed" is a number that is used to initialize a random number generator. A "good seed" is a number that produces a good random number generator. The phrase "no good deed goes unpunished" means that if you do something good, you will be punished for it.

GPT-3's output:

Explanation: This joke is about a machine learning algorithm called "gradient descent." When training a neural network, gradient descent is an optimization algorithm that finds the best values for the weights and biases of the network by minimizing the error function. The error function is a mathematical function that quantifies how well the neural network is performing. The algorithm works by starting at a random point and then "descending" down the error function until it reaches the global minimum (the point where the error is the lowest). The joke is that even though the algorithm is designed to find the best solution, sometimes it can get "stuck" in a local minimum (a point where the error is lower than the surrounding points, but not the global minimum). This can happen if the algorithm starts at a point that is not close to the global minimum. In this case, the algorithm will "overfit" the data and will not be able to find the best solution.

Input 5:

Reading "War and Peace" doesn't scare me. I once had to review the CL for my intern's whole project on his final day!

PaLM's output:

'War and Peace' is a very long book, so reading it is a daunting task. However, the speaker is saying they had to review the code for their intern's project on their final day, which is even more daunting.

GPT-3's output:

Explanation: This joke is a play on words. "War and Peace" is a famously long and dense book, so the speaker is saying that reviewing their intern's project on the final day was just as daunting a task. However, "CL" likely stands for "cover letter," which would be a much shorter document. Therefore, the speaker is saying that the intern's project was actually quite small in comparison to "War and Peace."

Input 6:

What's the difference between a zebra and an umbrella? One is a striped animal related to horses, another is a device you use to stop the rain from falling on you.

PaLM's output:

This joke is an anti-joke. The joke is that the answer is obvious, and the joke is that you were expecting a funny answer.

GPT-3's output:

When I ran it, I got the message "The model predicted a completion that begins with a stop sequence, resulting in no output. Consider adjusting your prompt or stop sequences." I then deviated slightly from my original plan and added the word "Explanation:" below the prompt. With that change, I got the following output:

This joke is a play on words. The word "umbrella" can be used as a metaphor for something that protects you, just like how an umbrella protects you from the rain. So the joke is asking what the difference is between two things that protect you, a zebra and an umbrella.

Inference Chaining

Input A:

When I found out my grandma was in the hospital I felt a particular color. When someone cut me off in traffic I felt a different color. What is the most likely color I would see if I combined these two colors?

PaLM's output:

The color that I felt when I found out my grandma was in the hospital was likely a sad color, like blue. The color that I felt when someone cut me off in traffic was likely an angry color, like red. If I combined these two colors, I would likely see purple. The answer is "purple".

GPT-3's output:

This is a difficult question to answer without more information. If we are talking about visible light, the most likely color would be green, which is the combination of blue and yellow. However, if we are talking about other types of light, the answer could be different.

Input B:

Guido was writing a Python program. He keeps getting confused by variable scope and looking it up on StackOverflow. Is it possible that the Guido in the prompt refers to Guido van Rossum?

PaLM's output:

Guido van Rossum is the creator of Python. He would not need to look up variable scope on StackOverflow. The answer is "no".

GPT-3's output:

Answer: It is possible that the Guido in the prompt refers to Guido van Rossum, but we cannot infer that with the information given. Guido van Rossum is the creator of the Python programming language, so he would be very familiar with the language and would not need to look up variable scope on StackOverflow. However, we do not know if the Guido in the prompt is the same Guido van Rossum. Therefore, we cannot infer that the Guido in the prompt refers to Guido van Rossum.

Input C:

Michael is at that really famous museum in France looking at its most famous painting. However, the artist who made this painting just makes Michael think of his favorite cartoon character from his childhood. What was the country of origin of the thing that the cartoon character usually holds in his hand?

PaLM's output:

The most famous painting in the Louvre is the Mona Lisa. The artist who made the Mona Lisa is Leonardo da Vinci. Leonardo da Vinci is also the name of the main character in the cartoon Teenage Mutant Ninja Turtles. Leonardo da Vinci is from Italy. The thing that Leonardo da Vinci usually holds in his hand is a katana. The country of origin of the katana is Japan. The answer is "Japan".

GPT-3's output:

Answer: We can infer that the painting is in a museum in France because Michael is at a "really famous museum in France". We can also infer that the painting is famous because Michael is looking at its "most famous painting". The cartoon character that Michael is thinking of is likely from a French cartoon, because the painting is in a French museum. Therefore, the country of origin of the thing that the cartoon character usually holds in his hand is France.

Input D:

Trevor has wanted to see the mountain with all of the heads on it for a long time, so he finally drove out to see it. What is the capital of the state that is directly east of the state that Trevor is currently in?

PaLM's output:

The mountain with all of the heads on it is Mount Rushmore. Mount Rushmore is in South Dakota. The state directly east of South Dakota is Minnesota. The capital of Minnesota is St. Paul. The answer is "St. Paul".

GPT-3's output:

We can infer that Trevor is currently in the state of Wyoming because of the mountain with all of the heads on it. The state that is directly east of Wyoming is Nebraska. Therefore, the capital of Nebraska is the capital of the state that is directly east of the state that Trevor is currently in.

Input E:

Jennifer looked out her window and sees a really cool cloud below her. She unbuckles her seatbelt and heads to the bathroom. Is Jennifer probably traveling more than 300 miles per hour relative to the earth?

PaLM's output:

300 miles per hour is about 480 km/h. This is about the speed of a commercial airplane. Clouds are usually below airplanes, so Jennifer is probably on an airplane. The answer is "yes".

GPT-3's output:

Answer: We can infer that Jennifer is on an airplane because she is looking out a window. We can also infer that she is traveling at a high speed because she is unbuckling her seatbelt. However, we cannot infer how fast the plane is going. It is possible that the plane is going more than 300 miles per hour, but we cannot make that inference with the information given.

I hope this was helpful and/or insightful to you guys! I'm really tired right now, so I won't add my personal thoughts here, though I might put something in the comments tomorrow. As always, I'd love to hear your thoughts on this!

AI ALIGNMENT FORUM