AI ALIGNMENT FORUM
Anthropic (org)
Contributors
2
Ruben Bloom
0
Multicore
Anthropic is an AI organization. Not to be confused with anthropics.
Posts tagged Anthropic (org)
Most Relevant
4 | 69 | Anthropic's Core Views on AI Safety | Zac Hatfield-Dodds | 9mo | 13
1 | 52 | Why I'm joining Anthropic | Evan Hubinger | 1y | 2
3 | 33 | Toy Models of Superposition | Evan Hubinger | 1y | 2
1 | 37 | Concrete Reasons for Hope about AI | Zac Hatfield-Dodds | 1y | 0
2 | 67 | Transformer Circuits | Evan Hubinger | 2y | 3
1 | 107 | Towards Monosemanticity: Decomposing Language Models With Dictionary Learning | Zac Hatfield-Dodds | 1mo | 10
1 | 48 | Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust | Zac Hatfield-Dodds | 3mo | 4
1 | 44 | Dario Amodei’s prepared remarks from the UK AI Safety Summit, on Anthropic’s Responsible Scaling Policy | Zac Hatfield-Dodds | 1mo | 0
1 | 30 | Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic) | Lawrence Chan | 10mo | 0
1 | 18 | How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA? | Owain Evans | 2y | 1
1 | 15 | Frontier Model Security | Matthew "Vaniver" Gray | 4mo | 1
-1 | 78 | A challenge for AGI organizations, and a challenge for readers | Rob Bensinger, Eliezer Yudkowsky | 1y | 17
1 | 56 | Comparing Anthropic's Dictionary Learning to Ours | Robert_AIZI | 2mo | 1
0 | 58 | Measuring and Improving the Faithfulness of Model-Generated Reasoning | Ansh Radhakrishnan, tamera, karinanguyen, Sam Bowman, Ethan Perez | 5mo | 12
0 | 9 | Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation | Soroush Pour, rusheb, Quentin Feuillade--Montixi, Arush Tagade, Stephen Casper | 1mo | 1