AI ALIGNMENT FORUM
Outer Alignment
• Applied to Inverse Scaling Prize: Second Round Winners by Andrew Gritsevskiy at 7d
• Applied to Some of my disagreements with List of Lethalities by Alex Turner at 13d
• Applied to The Alignment Problems by Martín Soto at 18d
• Applied to Categorizing failures as “outer” or “inner” misalignment is often confused by Raymond Arnold at 25d
• Applied to Causal representation learning as a technique to prevent goal misgeneralization by Pablo Antonio Moreno Casares at 1mo
• Applied to On the Importance of Open Sourcing Reward Models by elandgre at 1mo
• Applied to Will research in AI risk jinx it? Consequences of training AI on AI risk arguments by Yann Dubois at 1mo
• Applied to Paper: Constitutional AI: Harmlessness from AI Feedback (Anthropic) by Lawrence Chan at 1mo
• Applied to Disentangling Shard Theory into Atomic Claims by Leon Lang at 2mo
• Applied to Alignment with argument-networks and assessment-predictions by Tor Økland Barstad at 2mo
• Applied to Inner and outer alignment decompose one hard problem into two extremely hard problems by Alex Turner at 2mo
• Applied to Alignment allows "nonrobust" decision-influences and doesn't require robust grading by Alex Turner at 2mo
• Applied to Don't align agents to evaluations of plans by Alex Turner at 2mo
• Applied to [Hebbian Natural Abstractions] Introduction by Samuel Nellessen at 2mo
• Applied to The Disastrously Confident And Inaccurate AI by Sharat Jacob Jacob at 2mo
• Applied to A first success story for Outer Alignment: InstructGPT by Noosphere89 at 3mo
• Applied to Don't you think RLHF solves outer alignment? by Noosphere89 at 3mo
• Applied to If you’re very optimistic about ELK then you should be optimistic about outer alignment by Noosphere89 at 3mo
• Applied to Questions about Value Lock-in, Paternalism, and Empowerment by Sam at 3mo