AI ALIGNMENT FORUM
Alignment Jam
Edited by Esben Kran, last updated 16th May 2023
This lists the posts that have come from the Alignment Jam hackathons.
Posts tagged Alignment Jam
53
We Found An Neuron in GPT-2
Joseph Miller, Clement Neo
3y
0
52
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1
StefanHex, Marius Hobbhahn
2y
0
38
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2
StefanHex, Marius Hobbhahn
2y
0
14
Robustness of Model-Graded Evaluations and Automated Interpretability
Simon Lermen, viluon
2y
2
13
How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!
StefanHex
3y
0
7
Finding Deception in Language Models
Esben Kran, Archana Vaidheeswaran
1y
0