Apart Research

Edited by Esben Kran, habryka, Jason Hoelscher-Obermaier. Last updated 18th Jul 2024.

Apart Research is an AI safety research lab. They host the Apart Sprints, large-scale international events for research experimentation. This tag includes posts written by Apart researchers and content about Apart Research.

Posts tagged Apart Research
53 We Found An Neuron in GPT-2
Joseph Miller, Clement Neo
3y
0
1 Analysing Adversarial Attacks with Linear Probing
Yoann Poupart, Imene Kerboua, Clement Neo, Jason Hoelscher-Obermaier
1y
0
52 Solving the Mechanistic Interpretability challenges: EIS VII Challenge 1
StefanHex, Marius Hobbhahn
2y
0
38 Solving the Mechanistic Interpretability challenges: EIS VII Challenge 2
StefanHex, Marius Hobbhahn
2y
0
14 Robustness of Model-Graded Evaluations and Automated Interpretability
Simon Lermen, viluon
2y
2
13 How-to Transformer Mechanistic Interpretability—in 50 lines of code or less!
StefanHex
3y
0
7 Finding Deception in Language Models
Esben Kran, Archana Vaidheeswaran
1y
0
4 Approximating Human Preferences Using a Multi-Judge Learned System
JoseFaustino, eitan sprejer, Fernando Avalos, Augusto Bernardi
2mo
0
3 Can startups be impactful in AI safety?
Esben Kran, Archana Vaidheeswaran
1y
0