The dataset and model suite TinyStories is widely used as a toy setup for mech interp research. Searching for the term on this forum currently yields 23 results. "A List of 45+ Mech Interp Project Ideas from Apollo Research’s Interpretability Team" lists making an improved version of it as a desideratum, because the dataset is "very formulaic, small, and has unusual unicode characters in it".
The improved version, the SimpleStories dataset and model suite, is now out. We are looking forward to seeing what the interp community does with it, and hope that it is better than TinyStories in all interp use cases. For feedback, we keep a community doc.