AI ALIGNMENT FORUM
Anthropic (org)

Subscribe
Discussion0
1
Written by Dakara, Multicore, Ruben Bloom, and Ryan Greenblatt. Last updated 31st Dec 2024.

Anthropic is an AI company based in San Francisco, known for developing the Claude family of AI models and for publishing research on AI safety.

Not to be confused with anthropics.

Posts tagged Anthropic (org)
- 63 · Anthropic's Core Views on AI Safety · Zac Hatfield-Dodds · 2y · 13 comments
- 48 · Why I'm joining Anthropic · Evan Hubinger · 2y · 2 comments
- 33 · Toy Models of Superposition · Evan Hubinger · 3y · 2 comments
- 33 · Concrete Reasons for Hope about AI · Zac Hatfield-Dodds · 2y · 0 comments
- 67 · Transformer Circuits · Evan Hubinger · 3y · 3 comments
- 110 · Towards Monosemanticity: Decomposing Language Models With Dictionary Learning · Zac Hatfield-Dodds · 2y · 12 comments
- 95 · Introducing Alignment Stress-Testing at Anthropic · Evan Hubinger · 1y · 19 comments
- 69 · EIS XIII: Reflections on Anthropic's SAE Research Circa May 2024 · Stephen Casper · 1y · 7 comments
- 42 · Anthropic: Three Sketches of ASL-4 Safety Case Components · Zach Stein-Perlman · 6mo · 18 comments
- 46 · Dario Amodei's prepared remarks from the UK AI Safety Summit, on Anthropic's Responsible Scaling Policy · Zac Hatfield-Dodds · 2y · 0 comments
- 44 · Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust · Zac Hatfield-Dodds · 2y · 6 comments
- 30 · Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic) · Lawrence Chan · 2y · 0 comments
- 25 · Anthropic's updated Responsible Scaling Policy · Zac Hatfield-Dodds · 7mo · 0 comments
- 30 · Putting up Bumpers · Sam Bowman · 21d · 6 comments
- 18 · How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA? · Owain Evans · 3y · 1 comment