Marius Hobbhahn

I'm currently doing a Ph.D. in ML at the International Max Planck Research School in Tübingen. I also do independent research. I'm currently excited about an agenda that is roughly "do the kind of mechanistic interpretability that will get us closer to detecting deceptive behavior in future models". If you think I should work for you, please reach out.

For more see

I subscribe to Crocker's Rules



I don't think there is a general answer here, but here are a couple of considerations:
- Networks can get stuck in local optima, so if you initialize a network to a memorized solution, it might never find a general one.
- Grokking has shown that with strong weight regularization, networks can transition from memorized to general solutions, so it is possible to move from one to the other.
- It probably depends a bit on how exactly you initialize the memorized solution. You can represent lookup tables in different ways, and NNs handle some of them much better than others. For example, I found that networks really don't like it if you set the weights to one-hot vectors such that each input maps to exactly one feature.
- My prediction for empirical experiments here would be something like "it might work in some cases but not be clearly better in the general case. It will also depend on a lot of annoying factors like weight decay, learning rate, and the exact way you build the dictionary".
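
To make the "different ways to represent a lookup table" point concrete, here is a minimal sketch in NumPy. The toy task (addition mod `p`), the layer sizes, and all variable names are my own assumptions for illustration, not the setup from any actual experiment. It builds the same memorized table twice in a one-hidden-layer ReLU network: once with one-hot weights, so each input activates exactly one hidden feature, and once with a dense random hidden layer plus a least-squares readout.

```python
import numpy as np

# Hypothetical toy task: memorize (a + b) mod p for every pair (a, b).
p = 5
pairs = [(a, b) for a in range(p) for b in range(p)]
labels = np.array([(a + b) % p for a, b in pairs])
n = len(pairs)

X = np.eye(n)          # each input pair gets its own one-hot code
Y = np.eye(p)[labels]  # one-hot targets, shape (n, p)


def forward(X, W1, W2):
    """One hidden ReLU layer followed by a linear readout."""
    return np.maximum(X @ W1, 0.0) @ W2


# Variant 1: one-hot lookup table. Input i switches on hidden unit i only,
# and row i of the readout stores the answer for that input.
W1_onehot = np.eye(n)
W2_onehot = Y.copy()

# Variant 2: dense lookup table. Random hidden features are shared across
# inputs, and the readout is fit by least squares to reproduce the table.
rng = np.random.default_rng(0)
W1_dense = rng.normal(size=(n, 4 * n))
H = np.maximum(X @ W1_dense, 0.0)
W2_dense, *_ = np.linalg.lstsq(H, Y, rcond=None)

pred_onehot = forward(X, W1_onehot, W2_onehot).argmax(axis=1)
pred_dense = forward(X, W1_dense, W2_dense).argmax(axis=1)

# Number of active hidden units per input under the one-hot construction.
active_onehot = (np.maximum(X @ W1_onehot, 0.0) > 0).sum(axis=1)
```

Both variants encode the same table, but they are very different starting points for gradient descent. The sketch only constructs the two initializations; whether either one helps or hurts when you continue training from it is exactly the kind of thing that would need to be tested empirically.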