Artificial Intelligence is the study of creating intelligence in algorithms. AI Alignment is the task of ensuring that [powerful] AI systems are aligned with human values and interests. The central concern is that a sufficiently powerful AI, if not designed and implemented with sufficient understanding, would optimize for something its creators did not intend and pose an existential threat to the future of humanity. This is known as the AI alignment problem.

Common terms in this space are superintelligence, AI Alignment, AI Safety, Friendly AI, Transformative AI, human-level intelligence, AI Governance, and Beneficial AI. This entry and the associated tag roughly encompass all of these topics: anything that is part of the broad cluster of understanding AI and its future impacts on our civilization deserves this tag.

AI Alignment

There are narrow conceptions of alignment, where you’re trying to get an AI to do something like cure Alzheimer’s disease without destroying the rest of the world. And there are much more ambitious notions of alignment, where you’re trying to get it to do the right thing and achieve a happy intergalactic civilization.

But both the narrow and the ambitious conceptions of alignment have in common that you’re trying to have the AI do that thing rather than making a lot of paperclips.
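As a rough illustration of the gap between what an AI is told to optimize and what its designers wanted, here is a toy sketch. Everything in it is a made-up assumption for the example: the function names (`proxy_reward`, `intended_value`, `outcome`), the budget of 100 resource units, and the idea that only the first 10 paperclips are useful. It is not a model of any real system.

```python
# Toy illustration only: all names and numbers are hypothetical assumptions.

def proxy_reward(paperclips: int) -> int:
    """The objective the designers wrote down: more paperclips is better."""
    return paperclips

def intended_value(paperclips: int, resources_left: int) -> int:
    """What the designers actually wanted: a few paperclips, world left intact."""
    useful = min(paperclips, 10)          # assume only the first 10 are useful
    return 2 * useful + resources_left    # untouched resources still matter

def outcome(plan: int, total_resources: int = 100) -> tuple:
    """A plan turns `plan` resource units into that many paperclips."""
    return plan, total_resources - plan

plans = range(0, 101)

# The optimizer only ever sees the proxy, so it converts everything.
best_for_proxy = max(plans, key=lambda p: proxy_reward(outcome(p)[0]))

# Scoring the same plans by the intended value picks a very different plan.
best_for_humans = max(plans, key=lambda p: intended_value(*outcome(p)))

print(best_for_proxy, intended_value(*outcome(best_for_proxy)))
print(best_for_humans, intended_value(*outcome(best_for_humans)))
```

In this sketch the plan that scores highest on the proxy uses up every resource, while the plan the designers would have preferred stops at 10 paperclips. The two objectives agree on small plans and diverge only under strong optimization.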

See also General Intelligence.

Basic Alignment Theory

Coherent Extrapolated Volition
Complexity of Value
Decision Theory
Embedded Agency
Fixed Point Theorems
Goodhart's Law
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Newcomb's Problem
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Recursive Self-Improvement
Solomonoff Induction
Treacherous Turn
Utility Functions

Engineering Alignment

AI Boxing (Containment)
Conservatism (AI)
Debate (AI safety technique)
Factored Cognition
Humans Consulting HCH
Impact Measures
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
Tool AI
Transparency / Interpretability
Value Learning

Strategy

AI Governance
AI Risk
AI Services (CAIS)
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Transformative AI

Organizations

AI Safety Camp
Centre for Human-Compatible AI
Future of Humanity Institute
Future of Life Institute
Machine Intelligence Research Institute

Other

AI Capabilities
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Reinforcement Learning
Research Agendas
Whole Brain Emulation