AI ALIGNMENT FORUM Tags

Goal-Directedness

• Applied to “Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”) by RobertM 19h ago
• Applied to FAQ: What the heck is goal agnosticism? by RobertM 2mo ago
• Applied to A thought experiment to help persuade skeptics that power-seeking AI is plausible by jacobcd52 2mo ago
• Applied to Clarifying how misalignment can arise from scaling LLMs by Util 3mo ago
• Applied to Think carefully before calling RL policies "agents" by Alex Turner 6mo ago
• Applied to Creating a self-referential system prompt for GPT-4 by Ozyrus 6mo ago
• Applied to GPT-4 implicitly values identity preservation: a study of LMCA identity management by Ozyrus 6mo ago
• Applied to Investigating Emergent Goal-Like Behavior in Large Language Models using Experimental Economics by Steve Phelps 7mo ago
• Applied to Capabilities and alignment of LLM cognitive architectures by Seth Herd 7mo ago
• Applied to Agentized LLMs will change the alignment landscape by Seth Herd 8mo ago
• Applied to Imagine a world where Microsoft employees used Bing by Christopher King 8mo ago
• Applied to GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2 by Christopher King 8mo ago
• Applied to 100 Dinners And A Workshop: Information Preservation And Goals by Stephen Fowler 8mo ago
• Applied to Does GPT-4 exhibit agency when summarizing articles? by Christopher King 8mo ago