Goal-Directedness
• Applied to [Interim research report] Evaluating the Goal-Directedness of Language Models by Rauno Arike 2mo ago
• Applied to A "Bitter Lesson" Approach to Aligning AGI and ASI by Roger Dearnaley 2mo ago
• Applied to Emotional issues often have an immediate payoff by Chipmonk 3mo ago
• Applied to Measuring Coherence and Goal-Directedness in RL Policies by Dylan Xu 5mo ago
• Applied to Understanding mesa-optimization using toy models by tilmanr 5mo ago
• Applied to Measuring Coherence of Policies in Toy Environments by Dylan Xu 6mo ago
• Applied to Refinement of Active Inference agency ontology by Roman Leventov 9mo ago
• Applied to Quick thoughts on the implications of multi-agent views of mind on AI takeover by Kaj Sotala 9mo ago
• Applied to Towards an Ethics Calculator for Use by an AGI by Sean Sweeney 10mo ago
• Applied to “Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”) by RobertM 10mo ago
• Applied to FAQ: What the heck is goal agnosticism? by RobertM 1y ago
• Applied to A thought experiment to help persuade skeptics that power-seeking AI is plausible by jacob_drori 1y ago
• Applied to Clarifying how misalignment can arise from scaling LLMs by Util 1y ago
• Applied to Think carefully before calling RL policies "agents" by Alex Turner 1y ago