AI ALIGNMENT FORUM

Yulu Pi
154 · Ω2000

Posts

No posts to display.

Wikitag Contributions

No wikitag contributions to display.

Comments
EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety
Yulu Pi · 3y

I have been wondering whether neural networks (or, more specifically, transformers) will be the ultimate form of AGI. If not, will existing interpretability research become obsolete?

A Barebones Guide to Mechanistic Interpretability Prerequisites
Yulu Pi · 3y

Hey Neel,

Great post!

I am trying to work through the code referenced here:

  • Good (but hard) exercise: Code your own tiny GPT-2 and train it. If you can do this, I’d say that you basically fully understand the transformer architecture.
    • Example of basic training boilerplate and train script
    • The EasyTransformer codebase is probably good to riff off of here

But the links don't work anymore! It would be great if you could update them!

I don't know whether this link still points to the original content: https://colab.research.google.com/github/neelnanda-io/Easy-Transformer/blob/clean-transformer-demo/Clean_Transformer_Demo_Template.ipynb
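
In the meantime, for anyone else attempting the exercise above, here is a minimal sketch of the kind of tiny GPT-2-style model and training loop I have in mind, in plain PyTorch. All names, sizes, and the toy data are my own illustrative choices, not from Neel's notebook, and it leans on PyTorch's built-in transformer layers for brevity, whereas the exercise really wants you to write the attention blocks yourself:

```python
# Minimal sketch: a tiny decoder-only (GPT-2-style) transformer and a
# next-token-prediction training loop. Illustrative only, not from the
# original notebook.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGPT(nn.Module):
    def __init__(self, vocab_size=128, d_model=64, n_heads=4, n_layers=2, max_len=32):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        # An encoder layer plus a causal mask behaves as a decoder-only block.
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)

    def forward(self, idx):
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # Causal mask: -inf above the diagonal blocks attention to the future.
        mask = torch.triu(
            torch.full((t, t), float("-inf"), device=idx.device), diagonal=1
        )
        x = self.blocks(x, mask=mask)
        return self.head(self.ln_f(x))

model = TinyGPT()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

for step in range(200):
    # Toy stand-in for data: random token sequences. Swap in real
    # tokenized text for an actual training run.
    batch = torch.randint(0, 128, (16, 32))
    logits = model(batch[:, :-1])          # predict token t+1 from tokens <= t
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), batch[:, 1:].reshape(-1)
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
```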

 

Thanks a lot!
