AI ALIGNMENT FORUM
Anthropic (org)
Contributors: Ruben Bloom, Multicore
Anthropic is an AI organization. Not to be confused with anthropics.
Posts tagged Anthropic (org), sorted by relevance:

- Anthropic's Core Views on AI Safety (Zac Hatfield-Dodds, 1y ago): 67 karma, 13 comments, tag relevance 4
- Why I'm joining Anthropic (Evan Hubinger, 1y ago): 50 karma, 2 comments, tag relevance 1
- Toy Models of Superposition (Evan Hubinger, 2y ago): 33 karma, 2 comments, tag relevance 3
- Concrete Reasons for Hope about AI (Zac Hatfield-Dodds, 1y ago): 34 karma, 0 comments, tag relevance 1
- Transformer Circuits (Evan Hubinger, 2y ago): 67 karma, 3 comments, tag relevance 2
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (Zac Hatfield-Dodds, 6mo ago): 110 karma, 11 comments, tag relevance 1
- Introducing Alignment Stress-Testing at Anthropic (Evan Hubinger, 3mo ago): 92 karma, 19 comments, tag relevance 1
- Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust (Zac Hatfield-Dodds, 7mo ago): 47 karma, 5 comments, tag relevance 1
- Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy (Zac Hatfield-Dodds, 6mo ago): 46 karma, 0 comments, tag relevance 1
- Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic) (Lawrence Chan, 1y ago): 30 karma, 0 comments, tag relevance 1
- How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA? (Owain Evans, 2y ago): 18 karma, 1 comment, tag relevance 1
- Frontier Model Security (Matthew "Vaniver" Gray, 9mo ago): 15 karma, 1 comment, tag relevance 1
- A challenge for AGI organizations, and a challenge for readers (Rob Bensinger and Eliezer Yudkowsky, 1y ago): 78 karma, 17 comments, tag relevance -1
- Comparing Anthropic's Dictionary Learning to Ours (Robert_AIZI, 6mo ago): 56 karma, 1 comment, tag relevance 1
- Measuring and Improving the Faithfulness of Model-Generated Reasoning (Ansh Radhakrishnan, tamera, karinanguyen, Sam Bowman, and Ethan Perez, 9mo ago): 58 karma, 12 comments, tag relevance 0