Sample complexity reduction is one of our main moment-to-moment activities, but humans seem to apply it across bigger bridges, and this is probably part of transfer learning. One of the things we can apply sample complexity reduction to is the 'self' object: the idea of a coherent agent persisting across differing decision points. The tradeoff between local and global loss seems to regulate this. Humans don't seem uniform on this dimension: foxes care more about local loss, hedgehogs more about global loss. Most moral philosophies seem like appeals to different possible higher-order symmetries. I don't think this is the crux of the issue, as I think human compressions of these things will turn out to be pretty easy to do with tons of cognitive horsepower; the dimensionality of our value embedding is probably not that high. My guess is the crux is getting a system to care about distress in the first place, and then to balance local and global distress.
Also, I wrote this a while back:
When I look at metaphilosophy, the main places I go looking are places with large confusion deltas: where, for whom, and why did someone become dramatically less philosophically confused about something, turning unfalsifiable questions into technical problems? Kuhn was too caught up in the social dynamics to want to do this from the perspective of pure ideas. A few things to point to.
Question: what happens to people when they gain consciousness of abstraction? My first pass attempt at an answer is that they become a lot less interested in philosophy.
Question: if someone had quietly made progress on metaphilosophy how would we know? First guess is that we would only know if their solution scaled well, or caused something to scale well.
Is there a good primer somewhere on how causal models interact with the standard model of physics?
Tangentially related: a recent discussion raising a seemingly surprising point about LLMs being lossless compression finders.
The first intuition pump that comes to mind for distinguishing mechanisms is examining how my brain generates and assigns credence to the hypothesis that a warning from my car reflects a sensor malfunction vs. a real problem in the world that the sensor exists to alert me to.
One thing that happens is that the broken-sensor hypothesis implies a much larger space of worlds, because a broken sensor can vary arbitrarily instead of only in tight informational coupling with the underlying physical system. So fluctuations outside the historical behavior of the sensor imply either that I'm in some sort of weird environment or that the sensor is varying with something besides what it is supposed to measure: a hidden variable if coherent, noise if random. Detection is thus tied to why it is desirable to Goodhart the sensor in the first place: more option value through consistency with a broader range of worlds. By the same token, the hypothesis "the sensor is broken" should be harder to falsify, since it is consistent with lots of data? The first thing it occurs to me to do is supply a controlled input and see whether I get the expected output (see: calibrating a scale with a known weight). This suggests that complex sensors that couple with the environment along more dimensions are harder to fool, though any data bottleneck the signal passes through reduces this advantage, e.g. a human reviewing things is themselves applying a learnable, simple routine that exhibits low coupling.
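The controlled-input test above can be sketched as a toy Bayesian comparison. All the likelihood numbers here are made up purely for illustration: the point is that a broken sensor spreads probability over many possible readings (consistent with a broader range of worlds), so a known input discriminates sharply between the two hypotheses.

```python
# Toy Bayesian version of the "known weight" calibration test. A broken
# sensor can read almost anything; a working one tracks the input tightly.
# Likelihood values are invented for illustration only.

def likelihood(reading, true_input, broken):
    if broken:
        # Broken sensor: reading varies arbitrarily over a 0-100 range,
        # so probability mass is spread thinly across all readings.
        return 1 / 101
    # Working sensor: reading lands within +/-2 of the true input,
    # and essentially never falls outside that band.
    return 1 / 5 if abs(reading - true_input) <= 2 else 1e-9

def posterior_broken(reading, true_input, prior_broken=0.5):
    # Standard Bayes update over the two hypotheses.
    pb = prior_broken * likelihood(reading, true_input, broken=True)
    pw = (1 - prior_broken) * likelihood(reading, true_input, broken=False)
    return pb / (pb + pw)

# Supply a controlled input (the "known weight") and see what the
# resulting reading implies about the sensor.
print(posterior_broken(reading=50, true_input=50))  # reading matches: broken now unlikely
print(posterior_broken(reading=87, true_input=50))  # reading off: broken nearly certain
```

The asymmetry the text describes shows up directly: because the broken-sensor likelihood is flat, almost no single uncontrolled observation can rule it out, but a matching response to a controlled input concentrates the posterior on "working" quickly.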
The next intuition pump: imagine there are two mechanics. One makes a lot of money from replacing sensors; they're fast at it and get the sensors at a discount by buying in bulk. The second makes a lot of money by doing a lot of really complicated testing and repair work; they work on fewer cars, but the revenue per car is high. Each is unscrupulous and will lie that your problem is the one they're good at fixing. I try to imagine the sorts of things each would tell me to convince me the problem is really the sensor vs. really out in the world. This even suggests a three-player game that might generate additional ideas.
Related background on the philosophical problem: gavagai
Whether or not details (and lots of specific arguments about details) matter hinges in general on the sensitivity argument (which is an argument about basins?), so I'd like to see that addressed directly. What are the arguments for high-sensitivity worlds other than anthropics? What is the detailed anthropic argument?
Rambling/riffing: boundaries typically need holes in order to be useful. Depending on the level of abstraction, different things can be thought of as holes. One way to think of a boundary is as a place where a rule is enforced consistently, and this probably involves pushing what would be a continuous condition into one with a few semi-discrete modes (in the simplest case, enforcing a bimodal distribution of outcomes). In practice, living systems seem to have settled on stacking a bunch of one-dimensional gatekeepers together, presumably because the modularity of such a design was easier to discover in the search space than designs with higher path dependencies due to entangled condition measurement. This highlights the similarity between boolean circuit analysis and a biological boundary: in a boolean circuit, the configurations of 'cheap' energy flows/gradients can be optimized for benefit, while the walls against the vast space of alternative configurations can be artificially steepened/shored up (see: mitigation efforts to prevent electron tunneling in semiconductors).
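The "stacked one-dimensional gatekeepers" picture can be made concrete with a toy sketch: each gate thresholds a single continuous condition into a discrete pass/fail, and the boundary admits only what passes every gate, i.e. a boolean AND. The specific gates and ranges below are arbitrary illustrations, not anything from biology.

```python
# Toy model of a boundary as a stack of one-dimensional gatekeepers.
# Each gate discretizes one continuous condition into pass/fail; the
# boundary is their conjunction. Ranges are arbitrary illustrations.

def make_gate(lo, hi):
    """One-dimensional gatekeeper: pass iff the measured value is in [lo, hi]."""
    return lambda x: lo <= x <= hi

# Each gate measures one dimension of a candidate (e.g. size, charge,
# temperature). Because the gates are independent, each can be tuned or
# replaced without touching the others -- the modularity the text points at.
gates = [make_gate(0.0, 1.0), make_gate(-5.0, 5.0), make_gate(36.0, 38.0)]

def boundary_admits(candidate):
    # The boundary is a boolean AND of independent 1-D checks, so the
    # continuous space of candidates collapses into two discrete modes:
    # admitted or rejected.
    return all(gate(value) for gate, value in zip(gates, candidate))

print(boundary_admits((0.5, 0.0, 37.0)))   # True: passes every gate
print(boundary_admits((0.5, 9.0, 37.0)))   # False: one gate fails, so the whole boundary rejects
```

The contrast with an entangled design is that a gate which measured several dimensions jointly (say, size and charge together) could not be discovered or adjusted independently, which is the higher path dependency the passage describes.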