Chain-of-Thought Alignment

Edited by Roger Dearnaley, et al., last updated 1st Dec 2023

"Chain-of-thought" autonomous agentic wrappers such as AutoGPT around an LLM such as GPT-4, and similar Language Model Cognitive Architectures (LMCAs) (other commonly used terms are Language Model Autonomous Agents (LMAAs), or Scaffolded LLMs), are a recent candidate approach to building an AGI.

They create, edit, and maintain a natural-language context by recursively feeding parts of it back into the LLM, along with suitable prompts for activities such as subtask planning, self-criticism, and memory summarization, generating a textual stream of consciousness, memories, and so on. They thus combine LLM neural nets with natural-language symbolic thinking more along the lines of GOFAI.
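
As an illustration of the loop such a wrapper runs, here is a minimal sketch. It is not AutoGPT's actual code: the `llm` callable stands in for whatever text-completion API a real scaffold would use, and the prompts, memory handling, and stopping condition are placeholder assumptions.

```python
# Minimal sketch of a chain-of-thought agentic wrapper (LMCA) loop.
# `llm` is a placeholder text-completion callable; the prompts and data
# structures are illustrative, not taken from AutoGPT or any real scaffold.

from typing import Callable, List

def run_agent(llm: Callable[[str], str], goal: str, max_steps: int = 10) -> List[str]:
    memories: List[str] = []    # long-term memory, kept as natural-language notes
    transcript: List[str] = []  # the agent's textual stream of consciousness

    for _ in range(max_steps):
        context = "\n".join(memories[-5:] + transcript[-5:])

        # Subtask planning: ask the LLM for the next step given goal + context.
        plan = llm(f"Goal: {goal}\nContext:\n{context}\nPropose the next subtask.")
        transcript.append(f"PLAN: {plan}")

        # Self-criticism: feed the plan back in and ask for flaws or risks.
        critique = llm(f"Critique this plan for flaws or risks:\n{plan}")
        transcript.append(f"CRITIQUE: {critique}")

        # Memory summarization: compress the recent transcript into a short note.
        memories.append(llm("Summarize into one short memory:\n"
                            + "\n".join(transcript[-4:])))

        if "DONE" in plan.upper():  # toy stopping condition
            break

    return transcript
```

Everything the agent "thinks" in this sketch ends up in `transcript` and `memories` as ordinary text, which is what makes the interpretability and monitoring approaches discussed below possible.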

Recent open-source examples are quite simple and not particularly capable, but it seems plausible that they could progress rapidly. They could make interpretability much easier than in pure neural-net systems, since their 'chain-of-thought'/'stream of consciousness' and 'memories' would be written in human natural language, and therefore readable and editable by a monitoring human or an LLM-based monitoring system (modulo concerns about opaque natural language, or about hidden steganographic side-channels concealed in apparently innocent text). This topic covers the alignment problem for systems combining such agentic wrappers with LLMs, if they do in fact prove capable of approaching or reaching AGI.
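
Because the agent's stream of consciousness and memories are plain text, an external monitor can in principle be as simple as another LLM pass over each entry. The sketch below is illustrative only: it reuses the placeholder `llm` callable from the previous example with a made-up review prompt, and, as noted above, a check like this would not by itself detect steganographic side-channels.

```python
# Hedged sketch of an LLM-based monitor over an agent's natural-language transcript.
# `llm` is the same placeholder text-completion callable as above; the review
# prompt and SAFE/UNSAFE convention are illustrative assumptions, not a real API.

from typing import Callable, List

def monitor_transcript(llm: Callable[[str], str], transcript: List[str]) -> List[str]:
    """Return the transcript entries the monitor flags, with its stated reasons."""
    flags: List[str] = []
    for entry in transcript:
        verdict = llm(
            "You are reviewing an AI agent's written reasoning for safety problems.\n"
            f"Entry:\n{entry}\n"
            "Answer SAFE or UNSAFE, followed by a one-sentence reason."
        )
        if verdict.strip().upper().startswith("UNSAFE"):
            flags.append(f"{entry}\n-> {verdict.strip()}")
    return flags
```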

See Also

  • LLM Powered Autonomous Agents (Lilian Weng, 2023)
Posts tagged Chain-of-Thought Alignment
  • Capabilities and alignment of LLM cognitive architectures (Seth Herd, 2y)
  • the case for CoT unfaithfulness is overstated (nostalgebraist, 10mo)
  • Externalized reasoning oversight: a research direction for language model alignment (tamera, 3y)
  • Language Agents Reduce the Risk of Existential Catastrophe (Cameron Domenico Kirk-Giannini, Simon Goldstein, 2y)
  • Scaffolded LLMs: Less Obvious Concerns (Stephen Fowler, 2y)
  • We should start looking for scheming "in the wild" (Marius Hobbhahn, 4mo)
  • Unfaithful Reasoning Can Fool Chain-of-Thought Monitoring (Benjamin Arnav, Pablo Bernabeu Perez, Tim Kostolansky, Hannes Whittingham, Nathan Helm-Burger, Mary Phuong, 1mo)
  • Steganography in Chain of Thought Reasoning (A Ray, 3y)
  • Internal independent review for language model agent alignment (Seth Herd, 2y)
  • 5 ways to improve CoT faithfulness (Caleb Biddulph, 9mo)
  • We have promising alignment plans with low taxes (Seth Herd, 2y)
  • Unfaithful Explanations in Chain-of-Thought Prompting (Miles Turpin, 2y)
  • [ASoT] Simulators show us behavioural properties by default (Arun Jose, 3y)
  • Language Models are a Potentially Safe Path to Human-Level AGI (Nadav Brandes, 2y)
  • Sleep peacefully: no hidden reasoning detected in LLMs. Well, at least in small ones. (Ilia Shirokov, Ilya Nachevsky, 3mo)
(Showing 15 of 34 tagged posts.)