## AI Alignment Forum

Lucius Bushnaq

AI Safety researcher, physics PhD student

> These manifolds generally extend out to infinity, so it isn't really meaningful to talk about literal "basin volume".[4] We can focus instead on their dimensionality.

Once you take priors over the parameters into account, I would not expect this to continue holding. I'd guess that if you want the volume of the regions in which the loss is close to the perfect loss, the directions that are not flat are going to matter a lot. Whether a given non-flat direction is incredibly steep or has half the width allowed by the prior could make a huge difference.
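To make the point about non-flat directions concrete, here is a minimal sketch under strong assumptions that are mine, not part of the original discussion: the loss near the minimum is taken to be exactly quadratic, the prior is an isotropic Gaussian centered on the minimum, and the eigenvalue lists, the threshold `epsilon`, and the function name `prior_mass_below` are all hypothetical. It estimates, by Monte Carlo, how much prior mass sits in the region where the loss is at most `epsilon`.

```python
import random


def prior_mass_below(eigenvalues, epsilon, prior_std=1.0, n_samples=100_000, seed=0):
    """Estimate P(loss <= epsilon) for theta ~ N(0, prior_std^2 * I).

    Assumes a purely quadratic loss 0.5 * sum(lam_i * theta_i**2), where
    lam_i are Hessian eigenvalues (0.0 marks an exactly flat direction).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        loss = 0.0
        for lam in eigenvalues:
            theta = rng.gauss(0.0, prior_std)
            loss += 0.5 * lam * theta * theta
        if loss <= epsilon:
            hits += 1
    return hits / n_samples


# Hypothetical basin A: two flat directions, two extremely steep ones.
basin_a = [0.0, 0.0, 1e4, 1e4]
# Hypothetical basin B: only one flat direction, three mildly curved ones.
basin_b = [0.0, 0.5, 0.5, 0.5]

mass_a = prior_mass_below(basin_a, epsilon=0.1)
mass_b = prior_mass_below(basin_b, epsilon=0.1)
print(f"basin A prior mass: {mass_a:.5f}")
print(f"basin B prior mass: {mass_b:.5f}")
```

In this toy setup, basin B ends up with far more prior mass than basin A even though its flat manifold has lower dimension, because the steep directions of A confine it to a tiny sliver of the prior. Whether anything like this carries over to real networks is exactly the open question.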

I still think the information loss framework could make sense, however. I'd guess that there should be a more general relation: the less information there is to distinguish different data points, the more things like the principal directions in the Hessian of the loss function will tend to be broad.

I'd also be interested in seeing what happens if you look at cases with non-zero/non-perfect loss. That should give you second-order terms in the network output, but these again look to me like they'd tend to give you broader principal directions if you have less information exchange in the network. For example, a modular network might have low-dimensional off-diagonals, which, as you can show with the Schur complement, is equivalent to having sparse off-diagonals, and I think that would give you less extreme eigenvalues.
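For reference, a sketch of the objects involved, in notation that is my own assumption rather than anything fixed above: split the parameters into two modules and write the Hessian in block form, with $A$ and $B$ the within-module blocks and $C$ the off-diagonal coupling. Assuming $A$ is invertible, the Schur complement of $A$ is

$$
H = \begin{pmatrix} A & C \\ C^{\top} & B \end{pmatrix}, \qquad H/A = B - C^{\top} A^{-1} C, \qquad \det H = \det A \cdot \det(H/A).
$$

In the simplest toy case, with scalar blocks $A = B = 1$ and coupling $C = c$,

$$
H = \begin{pmatrix} 1 & c \\ c & 1 \end{pmatrix}, \qquad \lambda_{\pm} = 1 \pm c, \qquad H/A = 1 - c^{2},
$$

so the eigenvalues spread further apart as $|c|$ grows. More generally, for symmetric $H$, Cauchy interlacing gives $\lambda_{\max}(H) \ge \max(\lambda_{\max}(A), \lambda_{\max}(B))$ and $\lambda_{\min}(H) \le \min(\lambda_{\min}(A), \lambda_{\min}(B))$, so dropping the off-diagonal blocks entirely never makes the extreme eigenvalues more extreme. This only illustrates the direction of the effect; it does not by itself establish the claimed equivalence between low-dimensional and sparse off-diagonals.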

I know we've discussed these points before, but I thought I'd repeat them here where people can see them.

A very good point!

I agree that fix 1. seems bad, and doesn't capture what we care about.

At first glance, fix 2. seems more promising to me, but I'll need to think about it.

Thank you very much for pointing this out.