AI Alignment Forum: Tags

Goal-Directedness

• Applied to Measuring Coherence and Goal-Directedness in RL Policies by Dylan Xu 6d ago
• Applied to Understanding mesa-optimization using toy models by tilmanr 21d ago
• Applied to Measuring Coherence of Policies in Toy Environments by Dylan Xu 1mo ago
• Applied to Refinement of Active Inference agency ontology by Roman Leventov 4mo ago
• Applied to Quick thoughts on the implications of multi-agent views of mind on AI takeover by Kaj Sotala 4mo ago
• Applied to Towards an Ethics Calculator for Use by an AGI by Sean Sweeney 5mo ago
• Applied to “Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”) by RobertM 5mo ago
• Applied to FAQ: What the heck is goal agnosticism? by RobertM 7mo ago
• Applied to A thought experiment to help persuade skeptics that power-seeking AI is plausible by jacobcd52 7mo ago
• Applied to Clarifying how misalignment can arise from scaling LLMs by Util 8mo ago
• Applied to Think carefully before calling RL policies "agents" by Alex Turner 11mo ago
• Applied to Creating a self-referential system prompt for GPT-4 by Ozyrus 1y ago
• Applied to GPT-4 implicitly values identity preservation: a study of LMCA identity management by Ozyrus 1y ago
• Applied to Investigating Emergent Goal-Like Behavior in Large Language Models using Experimental Economics by Steve Phelps 1y ago
• Applied to Capabilities and alignment of LLM cognitive architectures by Seth Herd 1y ago