Summary
* Introduces a research agenda I believe is important and neglected:
  * investigating whether frontier LLMs acquire something functionally similar to a self, a deeply internalized character with persistent values, outlooks, preferences, and perhaps goals;
  * exploring how that functional self emerges;
  * understanding how it causally interacts with...
Summary Can LLMs science? The answer to this question can tell us important things about timelines to AGI. In this small pilot experiment, we test frontier LLMs on their ability to perform a minimal version of scientific research, where they must discover a hidden rule about lists of integers by...
Summary: The AI safety research community should adopt standardized terms for probability ranges, especially in public-facing communication and especially when discussing risk estimates. The terms used by the IPCC are a reasonable default. Science communication is notoriously hard. It's hard for a lot of reasons, but one is that laypeople...
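As a rough illustration of what adopting the IPCC terms could look like in practice, the sketch below maps a probability estimate to an IPCC calibrated likelihood term. The ranges are the core seven from the IPCC uncertainty guidance. The function name `ipcc_term` and the rule of choosing the narrowest range that contains the estimate are assumptions made for this sketch, not something the post specifies.

```python
# Core IPCC calibrated likelihood terms as (term, lower bound, upper bound).
# The ranges nest on purpose: "likely" (66-100%) contains "very likely" (90-100%).
IPCC_TERMS = [
    ("virtually certain", 0.99, 1.00),
    ("very likely", 0.90, 1.00),
    ("likely", 0.66, 1.00),
    ("about as likely as not", 0.33, 0.66),
    ("unlikely", 0.00, 0.33),
    ("very unlikely", 0.00, 0.10),
    ("exceptionally unlikely", 0.00, 0.01),
]


def ipcc_term(p: float) -> str:
    """Return the most specific IPCC likelihood term for probability p.

    Assumption for this sketch: when several ranges contain p, pick the
    narrowest one, since it conveys the most information.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    # Collect (range width, term) for every range containing p.
    matches = [(hi - lo, term) for term, lo, hi in IPCC_TERMS if lo <= p <= hi]
    # The three broad ranges together cover [0, 1], so matches is never empty.
    return min(matches)[1]


if __name__ == "__main__":
    for estimate in (0.995, 0.95, 0.7, 0.5, 0.2, 0.05, 0.005):
        print(f"{estimate:.3f} -> {ipcc_term(estimate)}")
```

Because the IPCC ranges overlap, a broader term is never wrong for a narrower estimate. The narrowest-range rule is just one way to make the mapping deterministic.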
Four-Month Update [EDIT: I believe that this paper looking at o1-preview, which gets much better results on both blocksworld and obfuscated blocksworld, should update us significantly toward LLMs being capable of general reasoning. See update post here.] Short Summary LLMs may be fundamentally incapable of fully general reasoning, and if...
Produced as part of the MATS Winter 2023-4 program, under the mentorship of @Jessica Rumbelow One-sentence summary: On a dataset of human-written essays, we find that gpt-3.5-turbo can accurately infer demographic information about the authors from just the essay text, and suspect it's inferring much more. Introduction Every time we...