AI ALIGNMENT FORUM
Goal-Directedness
• Applied to Measuring Coherence and Goal-Directedness in RL Policies by Dylan Xu 6d ago
• Applied to Understanding mesa-optimization using toy models by tilmanr 21d ago
• Applied to Measuring Coherence of Policies in Toy Environments by Dylan Xu 1mo ago
• Applied to Refinement of Active Inference agency ontology by Roman Leventov 4mo ago
• Applied to Quick thoughts on the implications of multi-agent views of mind on AI takeover by Kaj Sotala 4mo ago
• Applied to Towards an Ethics Calculator for Use by an AGI by Sean Sweeney 5mo ago
• Applied to “Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”) by RobertM 5mo ago
• Applied to FAQ: What the heck is goal agnosticism? by RobertM 7mo ago
• Applied to A thought experiment to help persuade skeptics that power-seeking AI is plausible by jacobcd52 7mo ago
• Applied to Clarifying how misalignment can arise from scaling LLMs by Util 8mo ago
• Applied to Think carefully before calling RL policies "agents" by Alex Turner 11mo ago
• Applied to Creating a self-referential system prompt for GPT-4 by Ozyrus 1y ago
• Applied to GPT-4 implicitly values identity preservation: a study of LMCA identity management by Ozyrus 1y ago
• Applied to Investigating Emergent Goal-Like Behavior in Large Language Models using Experimental Economics by Steve Phelps 1y ago
• Applied to Capabilities and alignment of LLM cognitive architectures by Seth Herd 1y ago