romeostevensit - AI Alignment Forum

Is there a short summary on the rejecting Knightian uncertainty bit?

Value systematization: how values become coherent (and misaligned)

Sample complexity reduction is one of our main moment to moment activities, but humans seem to apply it across bigger bridges and this is probably part of transfer learning. One of the things we can apply sample complexity reduction to is the 'self' object, the idea of a coherent agent across differing decision points. The tradeoff between local and global loss seems to regulate this. Humans don't seem uniform on this dimension, foxes care more about local loss, hedgehogs more about global loss. Most moral philosophy seem like appeals to different possible high order symmetries. I don't think this is the crux of the issue, as I think human compressions of these things will turn out to be pretty easy to do with tons of cognitive horsepower, the dimensionality of our value embedding is probably not that high. My guess is the crux is getting a system to care about distress in the first place, and then balance local and global distress.

Meta Questions about Metaphilosophy

romeostevensit1y10

Also I wrote this a while back https://www.lesswrong.com/posts/caSv2sqB2bgMybvr9/exploring-tacit-linked-premises-with-gpt

Meta Questions about Metaphilosophy

romeostevensit1y30

When I look at metaphilosophy, the main places I go looking are places with large confusion deltas. Where, who, and why did someone become dramatically less philosophically confused about something, turning unfalsifiable questions into technical problems. Kuhn was too caught up in the social dynamics to want to do this from the perspective of pure ideas. A few things to point to.

Wittgenstein noticed that many philosophical problems attempt to intervene at the wrong level of abstraction and posited that awareness of abstraction as a mental event might help
Korzybski noticed that many philosophical problems attempt to intervene at the wrong level of abstraction and posited that awareness of abstraction as a mental event might help
David Marr noticed that many philosophical and technical problems attempt to intervene at the wrong level of you get the idea
Hassabis cites Marr as of help in deconfusing AI problems
Eliezer's Technical Explanation of Technical Explanation doesn't use the term compression and seems the worse for it, using many many words to describe things that compression would render easier to reason about afaict.
Hanson in the Elephant in the Brain posits that if we mysteriously don't make progress on something that seems crucial, maybe we have strong motivations for not making progress on it.

Question: what happens to people when they gain consciousness of abstraction? My first pass attempt at an answer is that they become a lot less interested in philosophy.

Question: if someone had quietly made progress on metaphilosophy how would we know? First guess is that we would only know if their solution scaled well, or caused something to scale well.

The Lightcone Theorem: A Better Foundation For Natural Abstraction?

romeostevensit1y40

Is there a good primer somewhere on how causal models interact with the standard model of physics?

Shell games

romeostevensit1y30

Aether variables

Natural Abstractions: Key claims, Theorems, and Critiques

romeostevensit1y10

Tangentially related: recent discussion raising a seemingly surprising point about LLM's being lossless compression finders https://www.youtube.com/watch?v=dO4TPJkeaaU

Can we efficiently distinguish different mechanisms?

romeostevensit2y20

The first intuition pump that comes to mind for distinguishing mechanisms is examining how my brain generates and assigns credence to the hypothesis that something going wrong with my car is a sensor malfunction vs telling me about a problem in the world that the sensor exists to alert me to.

One thing that happens is that the broken sensor implies a much larger space of worlds because it can vary arbitrarily instead of only in tight informational coupling with the underlying physical system. So fluctuations outside the historical behavior of the sensor either implies I'm in some sort of weird environment or that the sensor is varying with something besides what it is supposed to measure, a hidden variable if coherent or noisy if random. So the detection is tied to why it is desirable to goodhart the sensor in the first place, more option value by allowing consistency with a broader range of worlds. By the same token, the hypothesis "the sensor is broken" should be harder to falsify since the hypothesis is consistent with lots of data? The first thing it occurs to me to do is supply a controlled input to see if I get a controlled output (see: calibrating a scale by using a known weight). This suggests that complex sensors that couple with the environment along more dimensions are harder to fool, though any data bottlenecks that are passed through reduce this i.e. the human reviewing things is themselves using a learnable simple routine that exhibits low coupling.

The next intuition pump, imagine there are two mechanics. One makes a lot of money from replacing sensors, they're fast at it and get the sensors for a discount by buying in bulk. The second mechanic makes a lot of money by doing a lot of really complicated testing and work. They work on fewer cars but the revenue per car is high. Each is unscrupulous and will lie that your problem is the one they are good at fixing. I try to imagine the sorts of things they would tell me to convince me the problem is really the sensor vs the problem is really out in the world. This even suggests a three player game that might generate additional ideas.

The "Minimal Latents" Approach to Natural Abstractions

romeostevensit2y30

Related background on the philosophical problem: gavagai

Niceness is unnatural

romeostevensit2y3-1

Whether or not details (and lots of specific detail arguments) matter hinges on the sensitivity argument (which is an argument about basins?) in general, so I'd like to see that addressed directly. What are the arguments for high sensitivity worlds other than anthropics? What is the detailed anthropic argument?

AI ALIGNMENT FORUM
AF

Posts

Wiki Contributions

Comments