OpenAI had generated poems in the New Yorker, which suggests they might have had some internal project related to poetry. With GPT-3.5, I think there's also "mode collapse" for style in writing prose (e.g. plays or stories). Claude does not have this mode collapse in poetry or prose. (It may have a much more subtle version of it.) This suggests to me it'd be relatively easy to fix ChatGPT's issues (as Gwern suggests). Does anyone know how much poetry and literary prose is in the pre-training sets aside from stuff in Common Crawl?
(I haven't yet read the paper carefully.) The main question of interest is: "How well can transformers do RL in-context after being trained to do so?" This paper only considers quite narrow and limited tasks, but future work will extend this and iterate on various parts of the setup. How do these results update your belief on the main question of interest? It's possible the result can be explained away (as you suggest) but also that there is some algorithm distillation going on.
This is very valuable. I suggest putting this content on arXiv (even if it's less formal than the typical paper).
This is a distribution of math problems GPT-3 wasn't finetuned on. Yet it's able to few-shot generalize and perform well. This is an amazing level of robustness relative to 2018 deep learning systems. I don't see why scaling and access to external tools (e.g. to perform long calculations) wouldn't produce the kind of robustness you have in mind.
I'm somewhat skeptical that models will actually be able to robustly learn these kinds of abstractions with a reasonable amount of scaling
GPT-3 (without external calculators) can do very well on math word problems (https://arxiv.org/abs/2206.02336) that combine basic facts about the world with abstract math reasoning. Why think that the kind of causal reasoning humans do is out of reach of scaling (especially if you allow external calculators)? It doesn't seem different in kind from these math word problems.
when can/do foundation models internalize explicitly stated knowledge
Some human causal reasoning is explicit. Humans can't do complex and exact calculations using System 1 intuition, and neither can we do causal reasoning of any sophistication using System 1. The prior over causal relations (e.g. that without looking at any data 'smoking causes cancer' is way more likely than the reverse) is more about general world-model building, and maybe there's more uncertainty about how well scaling learns that.
In the pre-training set, there are lots of places where humans talk about causality (both informally and more formally in myriad academic papers). So a model would ultimately need to learn abstract stuff about causality (e.g. correlation is not causation, arrow of time, causes are local, etc) and concrete causal facts (the moon causes tides, tiny organisms cause mold, etc). Given this knowledge, it's plausible a model M could make reasonable guesses for questions like, "What happens when a model with [properties of model M] starts interacting with the world?" These guesses would be improved by finetuning by RL on actual interaction between M and the world.
(It seems that most of my ability to make OOD predictions or causal inferences is based on passive/offline learning. I know science from books/papers and not from running my own rigorous controlled experiments or RCTs.)
Cool post! Did you try seeing whether GPT-3 can regenerate parts of the Iris dataset (or any other datasets that may appear in its training data)? I'd also be interested to see finetuning results, results for the latest InstructGPT, and to see analysis of the GPT-3 Embeddings for integers and floats.
This is a fantastic resource and seems like a great project for a research assistant. As with Rohin Shah's alignment newsletter, I'm excited to see this project continue and (potentially) expand.
I agree with most of this -- and my original comment should have been clearer. I'm wondering if the past five years of direct observations lead you to update the geography-based prior (which has been included in your alignment review since 2018). How much do you expect the quality of alignment work to differ from a new organization based in the Bay vs somewhere else? (No need to answer: I realize this is probably a small consideration and I don't want to start an unproductive thread on this topic.)
Evans et al.'s Truthful AI: Developing and governing AI that does not lie is a detailed and length piece discussing a lot of issues around truthfulness for AI agents. This includes conceptual, practical and governance issues, especially with regard conversation bots. They argue for truthfulness (or at least, non-negligently-false)
The link should include "that does not lie". length --> lengthy
Lin et al.'s TruthfulQA: Measuring How Models Mimic Human Falsehoods provides a series of test questions to study how 'honest' various text models are. Of course, these models are trying to copy human responses, not be honest, so because many of the questions allude to common misconceptions, the more advanced models 'lie' more often. Interestingly they also used GPT-3 to evaluate the truth of these answers. See also the discussion here. Researchers from OpenPhil were also named authors on the paper. #Other
"OpenPhil" --> OpenAI. As a minor clarification, all the results in the paper are based on human evaluation of truth. But we show that GPT-3 can be used as a fairly reliable substitute for human evaluation under certain conditions.