AI ALIGNMENT FORUM
Anthropic (org)

Written by Dakara, Multicore, Ruben Bloom, and Ryan Greenblatt. Last updated 31st Dec 2024.

Anthropic is an AI company based in San Francisco, known for developing the Claude family of AI models and for publishing research on AI safety.

Not to be confused with anthropics.

Posts tagged Anthropic (org)
63 points · Anthropic's Core Views on AI Safety · Zac Hatfield-Dodds · 2y · 13 comments
49 points · Why I'm joining Anthropic · Evan Hubinger · 2y · 2 comments
33 points · Toy Models of Superposition · Evan Hubinger · 3y · 2 comments
29 points · Concrete Reasons for Hope about AI · Zac Hatfield-Dodds · 2y · 0 comments
67 points · Transformer Circuits · Evan Hubinger · 4y · 3 comments
110 points · Towards Monosemanticity: Decomposing Language Models With Dictionary Learning · Zac Hatfield-Dodds · 2y · 12 comments
95 points · Introducing Alignment Stress-Testing at Anthropic · Evan Hubinger · 1y · 19 comments
69 points · EIS XIII: Reflections on Anthropic's SAE Research Circa May 2024 · Stephen Casper · 1y · 7 comments
42 points · Anthropic: Three Sketches of ASL-4 Safety Case Components · Zach Stein-Perlman · 8mo · 18 comments
45 points · Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust · Zac Hatfield-Dodds · 2y · 6 comments
45 points · Dario Amodei's prepared remarks from the UK AI Safety Summit, on Anthropic's Responsible Scaling Policy · Zac Hatfield-Dodds · 2y · 0 comments
30 points · Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic) · Lawrence Chan · 2y · 0 comments
30 points · Putting up Bumpers · Sam Bowman · 2mo · 6 comments
18 points · How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA? · Owain Evans · 3y · 1 comment
19 points · Anthropic's updated Responsible Scaling Policy · Zac Hatfield-Dodds · 8mo · 0 comments
(Showing 15 of 21 posts)