UPDATE: The application deadline is 11:59pm PT Sat 27 Aug. I'll be selecting candidates following a 2-week (paid) work trial at 10hr/week, starting Sept 12, where participants pick a concrete research idea and try to make progress on it.
Hey! My name is Neel Nanda. I used to work at Anthropic on LLM interpretability under Chris Olah (the Transformer Circuits agenda), and am currently doing some independent mechanistic interpretability work, most recently on grokking (see summary). I have a bunch of concrete project ideas, and am looking to hire an RA/intern to help me work on some of them!
Role details: The role would be remote by default. Full-time (~40 hrs/week). Roughly for the next 2-3 months, but flexible-ish. I can pay you $50/hr (via a grant).
What I can offer: Concrete project ideas, help getting started, and about 1hr/week of ongoing mentorship (ideally more, but that's all I feel confident committing to). I'm much better placed to offer high-level research mentorship/guidance than ML engineering guidance.
Pre-requisites: As such, I'd be looking for someone with existing ML skill, enough to be able to write and run simple experiments, esp with transformers (e.g. writing your own GPT-2-style transformer from scratch in PyTorch is more than enough). Ideally someone who’s fairly independent-minded, could mostly work on their own given a concrete project idea, and mainly needs help with high-level direction. Familiarity with transformer circuits and good linear algebra intuitions are nice-to-haves, but not essential. I expect this is best suited to someone who wants to test fit for doing alignment work, and is particularly interested in mechanistic interpretability.
Project ideas: Two main categories.
Actions: If you're interested in potentially working with me, please reach out at firstname.lastname@example.org. My ideal email would include: