With LLMs, reasoning is becoming composable, so standard libraries of pen tests and abstraction decompositions (e.g., for type errors) could become usable, testable, and improvable; a toy sketch of what a library entry might look like is below.
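A minimal sketch of the idea, with everything hypothetical (no existing library or API is assumed): a "reasoning decomposition" packaged as a prompt template plus a machine-checkable contract on the answer, which is what would make a library of such checks unit-testable and improvable like ordinary code.

```python
# Hypothetical sketch: a reusable reasoning check is a prompt template
# plus a machine-checkable contract on the model's answer.
from dataclasses import dataclass
from typing import Callable

@dataclass
class ReasoningCheck:
    name: str
    template: str                    # prompt with a {target} slot
    validate: Callable[[str], bool]  # contract the answer must satisfy

    def run(self, llm: Callable[[str], str], target: str) -> bool:
        return self.validate(llm(self.template.format(target=target)))

# One library entry: probe a snippet for type errors, demanding a YES/NO
# verdict so the check can be scored against labeled examples.
type_error_check = ReasoningCheck(
    name="type-error-probe",
    template="Does this snippet contain a type error? Answer YES or NO.\n{target}",
    validate=lambda ans: ans.strip().upper() in {"YES", "NO"},
)

# With a real client, `llm` would call a chat-completion endpoint; a fake
# suffices to unit-test the contract itself.
assert type_error_check.run(lambda _: "YES", 'x: int = "hello"')
```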
Found this interesting and useful. The big update for me is that 'I cut, you choose' is basically the property that most (all?) good self-therapy modalities share, as far as I can tell: the part or part-coalition running the therapy procedure can offer but not force things, since its frames subtly bias the process.
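For anyone who doesn't know the protocol, a toy sketch of the structure being pointed at (just the fair-division analogy, not any therapy procedure): the cutter fixes the split, the chooser keeps the pick, so the cutter's bias can't force an outcome the chooser disvalues.

```python
# Toy cut-and-choose: the cutter proposes a split (possibly biased by its
# own frame), but the chooser picks, so by the chooser's own measure it
# can never end up with less than the piece it declined.
def cut_and_choose(cutter_split, chooser_value):
    """cutter_split: (piece_a, piece_b) summing to 1 by the cutter's measure.
    chooser_value: the chooser's subjective valuation of a piece.
    Returns (chooser_piece, cutter_piece)."""
    a, b = cutter_split
    if chooser_value(a) >= chooser_value(b):
        return a, b  # chooser takes a
    return b, a      # chooser takes b

# Even a lopsided cut can't disadvantage the chooser by the chooser's lights:
chooser_piece, cutter_piece = cut_and_choose((0.7, 0.3), chooser_value=lambda p: p)
assert chooser_piece >= cutter_piece
```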
For people who want weirder takes, I would recommend Egan's "Unstable Orbits in the Space of Lies."
People inexplicably seem to favor extremely bad leaders → people inexplicably seem to favor bad AIs.
You mention 'warp' when talking about cross-ontology mapping, which seems like your best one-word summary of a complicated intuition. I'd be curious to hear more (I recognize this might not be practical). My own intuition surfaced 'introducing degrees of freedom,' à la the indeterminacy of translation.
Is there a short summary of the rejecting-Knightian-uncertainty bit?
Sample complexity reduction is one of our main moment-to-moment activities, but humans seem to apply it across bigger bridges, and this is probably part of transfer learning. One of the things we can apply sample complexity reduction to is the 'self' object: the idea of a coherent agent persisting across differing decision points. The tradeoff between local and global loss seems to regulate this, and humans don't seem uniform on this dimension: foxes care more about local loss, hedgehogs more about global loss. Most moral philosophies seem like appeals to different possible higher-order symmetries. I don't think this is the crux of the issue, since human compressions of these things will probably turn out to be pretty easy to recover with tons of cognitive horsepower; the dimensionality of our value embedding is probably not that high. My guess is the crux is getting a system to care about distress in the first place, and then to balance local and global distress.
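One crude way to formalize the fox/hedgehog tradeoff gestured at above (the mixing weight λ is notation I'm introducing here, not anything from the original): an agent minimizes

L(λ) = λ · L_local + (1 − λ) · L_global,  λ ∈ [0, 1],

with foxes sitting near λ = 1 and hedgehogs near λ = 0. Where λ settles is one way of reading how strongly the 'coherent self across decision points' compression is enforced.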
Also, I wrote this a while back: https://www.lesswrong.com/posts/caSv2sqB2bgMybvr9/exploring-tacit-linked-premises-with-gpt
The bit about layering creating functional fixedness reminds me of organisms (especially humans, but more broadly evolution as a search process) as 'homeostatic envelope extenders,' à la Nozick's take on Quine.