AI ALIGNMENT FORUM

Alignment Jam

Edited by Esben Kran last updated 16th May 2023

This tag lists the posts that have come out of the Alignment Jam hackathons.

Posts tagged Alignment Jam
We Found An Neuron in GPT-2, by Joseph Miller, Clement Neo (score 53, 3y, 0 comments)
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1, by StefanHex, Marius Hobbhahn (score 52, 2y, 0 comments)
Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2, by StefanHex, Marius Hobbhahn (score 38, 2y, 0 comments)
Robustness of Model-Graded Evaluations and Automated Interpretability, by Simon Lermen, viluon (score 14, 2y, 2 comments)
How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!, by StefanHex (score 13, 3y, 0 comments)
Finding Deception in Language Models, by Esben Kran, Archana Vaidheeswaran (score 7, 1y, 0 comments)