AI

Basic Alignment Theory

AIXI
Coherent Extrapolated Volition
Complexity of Value
Corrigibility
Deceptive Alignment
Decision Theory
Embedded Agency
Fixed Point Theorems
Goodhart's Law
Goal-Directedness
Gradient Hacking
Infra-Bayesianism
Inner Alignment
Instrumental Convergence
Intelligence Explosion
Logical Induction
Logical Uncertainty
Mesa-Optimization
Multipolar Scenarios
Myopia
Newcomb's Problem
Optimization
Orthogonality Thesis
Outer Alignment
Paperclip Maximizer
Power Seeking (AI)
Recursive Self-Improvement
Simulator Theory
Sharp Left Turn
Solomonoff Induction
Superintelligence
Symbol Grounding
Transformative AI
Treacherous Turn
Utility Functions
Whole Brain Emulation

Engineering Alignment

Agent Foundations
AI-assisted Alignment 
AI Boxing (Containment)
Conservatism (AI)
Debate (AI safety technique)
Eliciting Latent Knowledge (ELK)
Factored Cognition
Humans Consulting HCH
Impact Measures
Inverse Reinforcement Learning
Iterated Amplification
Mild Optimization
Oracle AI
Reward Functions
RLHF
Shard Theory
Tool AI
Transparency / Interpretability
Tripwire
Value Learning

Organizations

Full map here

AI Safety Camp
Alignment Research Center
Anthropic
Apart Research
AXRP
CHAI (UC Berkeley)
Conjecture (org)
DeepMind
FHI (Oxford)
Future of Life Institute
MIRI
OpenAI
Ought
SERI MATS

Strategy

AI Alignment Fieldbuilding 
AI Governance
AI Persuasion
AI Risk
AI Risk Concrete Stories
AI Safety Public Materials 
AI Services (CAIS)
AI Success Models 
AI Takeoff
AI Timelines
Computing Overhang
Regulation and AI Risk
Restrain AI Development

Other

AI Alignment Intro Materials 
AI Capabilities
AI Questions Open Thread
Compute 
DALL-E
GPT
Language Models
Machine Learning
Narrow AI
Neuromorphic AI
Prompt Engineering
Reinforcement Learning
Research Agendas

Artificial Intelligence is the study of creating intelligence in algorithms. On LessWrong, the primary focus of AI discussion is ensuring that as humanity builds increasingly powerful AI systems, the outcome will be good. AI Alignment is the task of ensuring that powerful AI systems are aligned with human values and interests. The central concern is that a powerful enough AI, if not designed and implemented with sufficient understanding, would optimize something unintended by its creators and pose an existential threat to the future of humanity. This is known as the AI alignment problem.
