To confirm: the weights you share, such as 0.26 and 0.23, are each individual entries in the W matrix for y = Wx?
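To make the question concrete, here is a minimal sketch of the reading I have in mind. Only the values 0.26 and 0.23 come from the post; the matrix shape, the positions of those two values, the other entries, the input `x`, and the helper `matvec` are all made up for illustration.

```python
# Hypothetical 2x2 weight matrix. Only 0.26 and 0.23 are quoted from the post;
# their positions, the remaining entries, and the input x are assumptions.
W = [
    [0.26, 0.10],
    [0.23, 0.05],
]
x = [1.0, 2.0]


def matvec(W, x):
    """Return y = W x, where y[i] = sum over j of W[i][j] * x[j]."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


y = matvec(W, x)
print(y)
```

Under this reading, each shared weight is a single scalar `W[i][j]` that scales input component `x[j]` in its contribution to output component `y[i]`.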
What city/country is PIBBSS based in, and where will the retreats be? (Asking as a Bay Area American without a valid passport.)
This is a surprising and fascinating result. Do you have attention plots of all 144 heads you could share? I'm particularly interested in the patterns for all heads on layers 0 and 1, matching the following caption:
(Left: a 50x50 submatrix of LXHY's attention pattern on a prompt from openwebtext-10k. Right: the same submatrix of LXHY's attention pattern, when positional embeddings are averaged as described above.)
My primary safety concern is what happens if one of these analyses somehow leads to a large improvement over the state of the art. I don't know what form this would take, and it might be unexpected given the Bitter Lesson you cite above, but if it happens, what do we do then? Given that this is hypothetical and the next large improvement in LMs could come from elsewhere, I'm not suggesting we stop sharing now. But I think we should be prepared for a point in time where we need to acknowledge that such sharing leads to significantly stronger models, and thus re-evaluate sharing such eval work.