eggsyntax — AI Alignment Forum

Personality Self-Replicators

One-sentence summary: I describe the risk of personality self-replicators, the threat of OpenClaw-like agents spreading in hard-to-control ways. Summary: LLM agents like OpenClaw are defined by a small set of text files and are run by an open-source framework which leverages LLMs for cognition. It is quite difficult for...

Mar 5 · 173

On the functional self of LLMs

Summary:
* Introduces a research agenda I believe is important and neglected:
  * investigating whether frontier LLMs acquire something functionally similar to a self, a deeply internalized character with persistent values, outlooks, preferences, and perhaps goals;
  * exploring how that functional self emerges;
  * understanding how it causally interacts with...

Jul 7, 2025 · 124

Numberwang: LLMs Doing Autonomous Research, and a Call for Input

Summary: Can LLMs science? The answer to this question can tell us important things about timelines to AGI. In this small pilot experiment, we test frontier LLMs on their ability to perform a minimal version of scientific research, where they must discover a hidden rule about lists of integers by...

Jan 16, 2025 · 73

AIS terminology proposal: standardize terms for probability ranges

Summary: The AI safety research community should adopt standardized terms for probability ranges, especially in public-facing communication and especially when discussing risk estimates. The terms used by the IPCC are a reasonable default. Science communication is notoriously hard. It's hard for a lot of reasons, but one is that laypeople...

Aug 30, 2024 · 30

LLM Generality is a Timeline Crux

Four-Month Update: [EDIT: I believe that this paper looking at o1-preview, which gets much better results on both blocksworld and obfuscated blocksworld, should update us significantly toward LLMs being capable of general reasoning. See update post here.] Short Summary: LLMs may be fundamentally incapable of fully general reasoning, and if...

Jun 24, 2024 · 219

Language Models Model Us

Produced as part of the MATS Winter 2023-4 program, under the mentorship of @Jessica Rumbelow. One-sentence summary: On a dataset of human-written essays, we find that gpt-3.5-turbo can accurately infer demographic information about the authors from just the essay text, and suspect it's inferring much more. Introduction: Every time we...

May 17, 2024 · 159

Egg Syntax
