Joseph Isaac Bloom

I'm an independently funded AI Alignment Research Engineer focussing on mechanistic interpretability in reinforcement learning. I'm particularly interested in comparing circuits in decision transformers to those generated by other techniques. 

Comments

Thank you for letting me know about your work on procgen with MI. It sounds like you're making progress. I'd be particularly interested in your visualisation techniques (how do they compare to what was done in Understanding RL Vision?) and in the reproduction of the cheese-maze policies (is this tricky? Do you think a DT could be well-calibrated on this problem?).

Some questions that might be useful to discuss more:

  • What are the pros/cons of doing DT vs actor-critic MI? (Are you using some form of actor-critic?) It could also be interesting to study analogous circuits in the DT vs AC scenarios.
  • For simplicity, I haven't done anything with CNNs yet, but I might be able to calibrate my expectations on the value/challenges involved by chatting to the Team Shard MATS stream.

Glad to hear your progress is going well! I'll be in the Bay Area for EAG if anyone from the team would like to chat. 

Hey Adam, thanks for running Refine and writing this up. 

Out of curiosity, do you (or anyone else) know if there are statistics for previous SERI-MATS cohorts/other programs designed to generate conceptual alignment researchers?