AI ALIGNMENT FORUM
Goal-Directedness
• Applied to “Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”) by RobertM 19h ago
• Applied to FAQ: What the heck is goal agnosticism? by RobertM 2mo ago
• Applied to A thought experiment to help persuade skeptics that power-seeking AI is plausible by jacobcd52 2mo ago
• Applied to Clarifying how misalignment can arise from scaling LLMs by Util 3mo ago
• Applied to Think carefully before calling RL policies "agents" by Alex Turner 6mo ago
• Applied to Creating a self-referential system prompt for GPT-4 by Ozyrus 6mo ago
• Applied to GPT-4 implicitly values identity preservation: a study of LMCA identity management by Ozyrus 6mo ago
• Applied to Investigating Emergent Goal-Like Behavior in Large Language Models using Experimental Economics by Steve Phelps 7mo ago
• Applied to Capabilities and alignment of LLM cognitive architectures by Seth Herd 7mo ago
• Applied to Agentized LLMs will change the alignment landscape by Seth Herd 8mo ago
• Applied to Imagine a world where Microsoft employees used Bing by Christopher King 8mo ago
• Applied to GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2 by Christopher King 8mo ago
• Applied to 100 Dinners And A Workshop: Information Preservation And Goals by Stephen Fowler 8mo ago
• Applied to Does GPT-4 exhibit agency when summarizing articles? by Christopher King 8mo ago