AI ALIGNMENT FORUM
Alignment Jam
Contributors: Esben Kran
This tag lists the posts that came out of the Alignment Jam hackathons.
Posts tagged Alignment Jam:
We Found An Neuron in GPT-2, by Joseph Miller and Clement Neo (2y)
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1, by Stefan Heimersheim and Marius Hobbhahn (1y)
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2, by Stefan Heimersheim and Marius Hobbhahn (1y)
How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!, by Stefan Heimersheim (2y)
Robustness of Model-Graded Evaluations and Automated Interpretability, by Simon Lermen and viluon (1y)
Finding Deception in Language Models, by Esben Kran and Archana Vaidheeswaran (3mo)