AI

Basic Alignment Theory

AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Deceptive Alignment
Decision Theory
Embedded Agency
Fixed Point Theorems
Goodhart's Law
Goal-Directedness
Gradient Hacking
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Mesa-Optimization
Multipolar Scenarios
Myopia
Newcomb's Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Power Seeking (AI)
Recursive Self-Improvement
Simulator Theory
Sharp Left Turn
Solomonoff Induction
Superintelligence
Symbol Grounding
Transformative AI
Treacherous Turn
Utility Functions
Whole Brain Emulation

Engineering Alignment

Agent Foundations
AI-assisted Alignment 
AI Boxing (Containment)
Conservatism (AI)
Debate (AI safety technique)
Eliciting Latent Knowledge (ELK)
Factored Cognition
Humans Consulting HCH
Impact Measures
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
RLHF
Shard Theory
Tool AI
Transparency / Interpretability
Tripwire
Value Learning

Organizations

Full map here

AI Safety Camp
Alignment Research Center
Anthropic
Apart Research
AXRP
CHAI (UC Berkeley)
Conjecture (org)
DeepMind
FHI (Oxford)
Future of Life Institute
MIRI
OpenAI
Ought
SERI MATS

Strategy

AI Alignment Fieldbuilding 
AI Governance
AI Persuasion
AI Risk
AI Risk Concrete Stories
AI Safety Public Materials 
AI Services (CAIS)
AI Success Models 
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Restrain AI Development

Other

AI Alignment Intro Materials 
AI Capabilities
AI Questions Open Thread
Compute 
DALL-E
GPT
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Prompt Engineering
Reinforcement Learning
Research Agendas

Artificial Intelligence is the study of creating intelligence in algorithms. On LessWrong, the primary focus of AI discussion is ensuring that as humanity builds increasingly powerful AI systems, the outcome will be good. AI Alignment is the task of ensuring that powerful AI systems are aligned with human values and interests. The central concern is that a powerful enough AI, if not designed and implemented with sufficient understanding, would optimize something unintended by its creators and pose an existential threat to the future of humanity. This is known as the AI alignment problem.
