AI ALIGNMENT FORUM

ojorgensen · 866 karma · Ω 1001

AI Safety Researcher, my website is here.

Posts

No posts to display.

Shortform

ojorgensen's Shortform (1 karma) · 2y · 0 comments

Comments
EIS XI: Moving Forward
ojorgensen · 3y · 10

Rando et al. (2022)

This link is broken btw!

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]
ojorgensen · 3y · 41

This seems very similar to recent work from the Stanford AI Lab, linked here.

A Walkthrough of A Mathematical Framework for Transformer Circuits
ojorgensen · 3y · 20

I went through the paper for a reading group the other day, and the video really helped me to understand what is going on in it. The parts I found most useful were the indications of which parts of the paper / maths were most important to understand, and which were not (e.g. tensor products).

I had previously tried to read the paper with little success, but I now feel like I understand its overall results pretty well. I'm very positive about this video, and about similar things being made in the future!

Personal context: I also found the intro-to-IB video series similarly useful. I'm an AI master's student with some pre-existing knowledge of AI alignment and a maths background.

Wikitag Contributions

Distributional Shifts · 3y · (+4/-4)