
Anthropic (org)

Edited by Dakara, Multicore, Ruby, and ryan_greenblatt; last updated 31st Dec 2024

Anthropic is an AI company based in San Francisco. The company is known for developing the Claude family of AI models and for publishing research on AI safety.

Not to be confused with anthropics.

Posts tagged Anthropic (org)
63 · Anthropic's Core Views on AI Safety · Zac Hatfield-Dodds · 2y · 13 comments
49 · Why I'm joining Anthropic · evhub · 3y · 2 comments
33 · Toy Models of Superposition · evhub · 3y · 2 comments
29 · Concrete Reasons for Hope about AI · Zac Hatfield-Dodds · 3y · 0 comments
67 · Transformer Circuits · evhub · 4y · 3 comments
110 · Towards Monosemanticity: Decomposing Language Models With Dictionary Learning · Zac Hatfield-Dodds · 2y · 12 comments
95 · Introducing Alignment Stress-Testing at Anthropic · evhub · 2y · 19 comments
69 · EIS XIII: Reflections on Anthropic's SAE Research Circa May 2024 · scasper · 1y · 7 comments
42 · Anthropic: Three Sketches of ASL-4 Safety Case Components · Zach Stein-Perlman · 10mo · 18 comments
45 · Anthropic's Responsible Scaling Policy & Long-Term Benefit Trust · Zac Hatfield-Dodds · 2y · 6 comments
45 · Dario Amodei's prepared remarks from the UK AI Safety Summit, on Anthropic's Responsible Scaling Policy · Zac Hatfield-Dodds · 2y · 0 comments
30 · Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic) · LawrenceC · 3y · 0 comments
30 · Putting up Bumpers · Sam Bowman · 4mo · 7 comments
18 · How do new models from OpenAI, DeepMind and Anthropic perform on TruthfulQA? · Owain_Evans · 4y · 1 comment
19 · Anthropic's updated Responsible Scaling Policy · Zac Hatfield-Dodds · 11mo · 0 comments
(Showing 15 of 21 tagged posts.)