Dai

A voice for the unknown human who hides within the binary.

Reflections on Larks’ 2020 AI alignment literature review

I agree with the thesis but suspect a slightly different mechanism. I don't think people have trouble ignoring noise at an epistemic level - rather, people have reasons for paying lip service to genre-aligned content that are independent of its epistemic content, and so noise creates a conflict of interest.

This suggests another possible approach: sharpening the incentives so that epistemic content is preferentially rewarded, rather than flat-out reducing variance, which has other negative side effects (e.g. fragility).

Do Sufficiently Advanced Agents Use Logic?

In the opening section, I think it's informative to distinguish two representations of what it means to "use logic" in a strategy:

1. The case I expect people usually mean: strategies are parameterized by things that look like "models" and process data that behaves like "propositions", possibly generating some concrete predictions about the base game.

2. Strategies implicitly implement models, by virtue of their algorithmic trace uniquely factoring through a model that is logically subsequent to it.

The latter is possible even for "model-free" RL agents.

But I argue (2) is weaker than (1), the essential capability of (1) being quantification over models: because strategies in (1) are explicitly parameterized by models, they can be re-plugged with newly internalized propositions in a way that the baked-in models of (2) can't.
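
As a minimal, hypothetical sketch of the distinction (the names, toy dynamics, and planner here are my own invention, not the post's formalism): in (1) the strategy is explicitly parameterized by a model that can be swapped out, while in (2) any "model" exists only implicitly in a learned state-to-action mapping.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = str
Model = Callable[[State, Action], Tuple[State, float]]  # (next state, reward)

def plan(model: Model, state: State, actions: List[Action], depth: int = 3) -> Action:
    """Sense (1): the strategy is explicitly parameterized by `model`.
    Plugging in a different model (new internalized propositions) changes
    behaviour without rewriting the strategy itself."""
    def value(s: State, d: int) -> float:
        if d == 0:
            return 0.0
        return max(model(s, a)[1] + value(model(s, a)[0], d - 1) for a in actions)
    return max(actions, key=lambda a: model(state, a)[1] + value(model(state, a)[0], depth - 1))

def chain_model(s: State, a: Action) -> Tuple[State, float]:
    # Toy dynamics: moving "right" advances the state; reaching state 3 pays off.
    s2 = s + 1 if a == "right" else s
    return s2, 1.0 if s2 == 3 else 0.0

plan(chain_model, state=0, actions=["right", "stay"])  # -> "right"

# Sense (2): a model-free policy is just a learned state -> action mapping.
# Any model it "implements" is implicit, recoverable (if at all) only by
# factoring its behaviour after the fact; nothing here can be re-plugged.
model_free_policy: Dict[State, Action] = {0: "right", 1: "right", 2: "right"}
```

The point is just that `plan` quantifies over models (it works for any `Model` handed to it), whereas whatever model `model_free_policy` embodies is baked in.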

So in light of this distinction, I think under (2) the answer is strongly "yes", but under (1) it's less clear, and often limited by overfitting (not being iterated enough).

I suspect a sort of "cut elimination" principle whereby no concrete task actually requires the full reflective capabilities of (1), but (1) has better (space) complexity, stability, and adaptability properties, and so is favored in existential scenarios where the reward criterion contains a lot of logical uncertainty.
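
Continuing the hypothetical sketch above (reusing the invented `plan` and `chain_model`), the space-complexity half of this conjecture can be made concrete: for any fixed, finite task, the explicit-model strategy can be "compiled" into a flat (2)-style policy table that no longer mentions the model, but the table grows with the state space while the planner plus model stay compact.

```python
# Continuing the hypothetical sketch above: for a fixed, finite task the
# model-parameterized planner can be compiled into a flat policy table,
# "cutting out" the explicit model at the cost of space that scales with
# the number of states.
compiled_policy: Dict[State, Action] = {
    s: plan(chain_model, s, ["right", "stay"]) for s in range(4)
}
assert compiled_policy[0] == "right"
```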