Charlie Steiner

LW1.0 username Manfred. Day job is condensed matter physics, hobby is thinking I know how to assign anthropic probabilities.

Problems Involving Abstraction?

When do we learn abstractions bottom-up (like identifying regularities in sense data) versus top-down (like using a controlled approximation to a theory that you can prove will converge to the right answer)? How similar are the abstractions you end up with in the two cases?

Knowledge, manipulation, and free will

I agree. The important part of cases 5 & 6, where some other agent "manipulates" Petrov, is that suddenly, to us human readers, it seems like the protagonist of the story (and we do model it as a story) is the cook/kidnapper, not Petrov.

I'm fine with the AI choosing actions using a model of the world that includes me. I'm not fine with it supplanting me from my agent-shaped place in the story I tell about my life.

[AN #120]: Tracing the intellectual roots of AI and AI alignment

That Hartikainen et al. paper was really interesting! Unfortunately I don't know enough about the state of the art for unsupervised exploration - they compare DDLUS to a 2018 paper (DIAYN), but I'm not sure how either of these compares to other prominent exploration techniques (e.g. something like NGU).

I also wonder if different techniques do better on Atari vs. MuJoCo environments for "unprincipled" reasons that make apples-to-apples comparisons difficult for techniques developed by different groups.

What to do with imitation humans, other than asking them what the right thing to do is?

In retrospect, I was totally unclear that I wasn't necessarily talking about something with a complicated internal state, one that could behave like a single human over long time scales. I was thinking more about the "minimum human-imitating unit" needed to get things like IDA off the ground.

In fact this post was originally titled "What to do with a GAN of a human?"

Comparing Utilities

One thing I'd also ask about is: what about ecology / iterated games? I'm not at all sure whether there are relevant iterated games here, so I'm curious what you think.

How about an ecology where there are both people and communities - the communities have different aggregation rules, and the people can join different communities. There's some set of options that are chosen by the communities, but it's the people who actually care about what option gets chosen and choose how to move between communities based on what happens with the options - the communities just choose their aggregation rule to get lots of people to join them.

How can we set up this game so that interesting behavior emerges? Well, people shouldn't just seek out the community that most closely matches their own preferences, because then everyone would fracture into communities of size 1. Instead, there must be some benefit to being in a community. I have two ideas about this. One is that people could care to some extent about what happens in all communities, so they will join a community if they think they can shift its preferences on the important things while conceding the unimportant things. Another is that there could be some crude advantage to being in a community, like a scaling term (monotonically increasing with community size) on how effective it is at satisfying its members' preferences.
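A minimal agent-based sketch of this game, under loudly hypothetical assumptions: each person has a numeric utility for each option, a community's "aggregation rule" is a plain function that picks an option from its members' utilities (plurality and utilitarian sum are just two example rules), and the benefit of membership is the second idea above, a bonus of `scale * community_size`. People re-evaluate one at a time and join whichever community would give them the highest payoff if they were in it.

```python
import random
from typing import Callable, List

Prefs = List[float]                  # hypothetical: one person's utility for each option
Rule = Callable[[List[Prefs]], int]  # hypothetical aggregation rule: members -> chosen option


def plurality(members: List[Prefs]) -> int:
    """Choose the option that is the favorite of the most members (ties go to the lowest index)."""
    n_options = len(members[0])
    votes = [0] * n_options
    for prefs in members:
        votes[max(range(n_options), key=lambda o: prefs[o])] += 1
    return max(range(n_options), key=lambda o: votes[o])


def utilitarian(members: List[Prefs]) -> int:
    """Choose the option with the largest summed utility (ties go to the lowest index)."""
    n_options = len(members[0])
    return max(range(n_options), key=lambda o: sum(p[o] for p in members))


def simulate(people: List[Prefs], rules: List[Rule],
             scale: float = 0.1, rounds: int = 20, seed: int = 0) -> List[int]:
    """Return the index of the community each person ends up in."""
    rng = random.Random(seed)
    assignment = [rng.randrange(len(rules)) for _ in people]
    for _ in range(rounds):
        moved = False
        for i, prefs in enumerate(people):
            best_c, best_val = assignment[i], float("-inf")
            for c, rule in enumerate(rules):
                # Evaluate community c as it would be with person i in it.
                members = [people[j] for j, cj in enumerate(assignment) if cj == c and j != i]
                members.append(prefs)
                # Payoff: how much i likes the chosen option, plus the size bonus.
                value = prefs[rule(members)] + scale * len(members)
                if value > best_val:
                    best_c, best_val = c, value
            if best_c != assignment[i]:
                assignment[i] = best_c
                moved = True
        if not moved:  # nobody switched during a full pass: a stable configuration
            break
    return assignment


people = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.4, 0.6]]
print(simulate(people, [plurality, utilitarian], scale=0.2))
```

The obvious knobs are `scale` and the list of rules; with `scale=0` the only reason to stay in a community is liking what it chooses.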

Egan's Theorem?

Right, it's a little tricky to specify exactly what this "relationship" is. Is the notion that you should be able to compress the approximate model, given an oracle for the code of the best one (i.e., that they share pieces)? If so, most Turing machines don't compress well, so it's easy to find counterexamples (the most straightforward class is where the approximate model is already extremely simple).

Anyhow, like I said, hard to capture the spirit of the problem. But when I *do* try to formalize the problem, it tends to not have the property, which is definitely driving my intuition.
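One crude way to make "compress the approximate model, given the code of the best one" concrete is to swap in an off-the-shelf compressor for Kolmogorov complexity and estimate C(x | y) ≈ C(y + x) - C(y), the trick behind compression-based distances. The sketch below uses `zlib` as that stand-in (an assumption: it is nowhere near a universal compressor) and random byte strings as hypothetical "model codes".

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length of zlib's output in bytes: a rough, computable proxy for description length."""
    return len(zlib.compress(data, 9))


def conditional_size(x: bytes, y: bytes) -> int:
    """Approximate extra bytes needed to describe x once y is already known."""
    return compressed_size(y + x) - compressed_size(y)


rng = random.Random(0)
# Hypothetical "code of the best model": 1000 random bytes.
best_model = bytes(rng.getrandbits(8) for _ in range(1000))
# Hypothetical approximate model that reuses a chunk of the best one.
shared = best_model[100:400]
# Hypothetical approximate model of the same length that shares nothing.
unrelated = bytes(rng.getrandbits(8) for _ in range(300))

print(conditional_size(shared, best_model), conditional_size(unrelated, best_model))
```

Random strings like `unrelated` are the typical case: they don't get shorter given the oracle, or otherwise.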

Egan's Theorem?

If by "account for that" you mean not being in direct conflict with earlier sense data, then sure. All tautologies about the data will continue to be true. Suppose some data can be predicted by classical mechanics with 75% accuracy. This is a tautology given the data itself, and no future theory will somehow make classical mechanics stop giving 75% accurate predictions for that past data.

Maybe that's all you meant?

I'd sort of interpreted you as asking questions about properties of the *theory*. E.g. "this data is really well explained by the classical mechanics of point particles, therefore any future theory should have a particularly simple relationship to the point particle ontology." It seems like there shouldn't be a guaranteed relationship that's much simpler than reconstructing the data and recomputing the inferred point particles.

I spent a little while trying to phrase this in terms of Turing machines but I don't think I quite managed to capture the spirit.

Egan's Theorem?

The answer to the question you actually asked is no, there is no ironclad guarantee of properties continuing, nor any guarantee that there will be a simple mapping between theories. With some effort you can construct some perverse Turing machines with bad behavior.

But the answer to the more general question is yes: simple properties can be expected (in a probabilistic sense) to generalize even if the model is incomplete. This is basically Minimum Message Length prediction, which you can put on the theoretical basis of the Solomonoff prior (it's somewhere in Li and Vitanyi - chapter 5?).
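For a concrete feel of minimum-message-length style prediction, here is a toy two-part code for binary data, not the Solomonoff construction itself. A hypothesis costs the bits needed to write down its parameter (a simplification that ignores the cost of saying how many bits were used), the data costs -log2 of its probability under the hypothesis, and the hypothesis with the smallest total is used to predict the next bit. The parameter grid and the `max_param_bits` cap are arbitrary illustrative choices.

```python
import math
from typing import List, Tuple


def data_bits(data: List[int], p: float) -> float:
    """Bits to encode a 0/1 sequence under an i.i.d. model with P(1) = p."""
    ones = sum(data)
    zeros = len(data) - ones
    return -(ones * math.log2(p) + zeros * math.log2(1 - p))


def best_two_part_model(data: List[int], max_param_bits: int = 6) -> Tuple[float, int, float]:
    """Return (total bits, parameter bits, p) for the shortest two-part message."""
    # The fair coin needs no parameter, so its hypothesis costs 0 bits.
    best = (data_bits(data, 0.5), 0, 0.5)
    for bits in range(1, max_param_bits + 1):
        # 2**bits candidate values of p, all strictly inside (0, 1).
        for k in range(2 ** bits):
            p = (k + 1) / (2 ** bits + 1)
            total = bits + data_bits(data, p)
            if total < best[0]:
                best = (total, bits, p)
    return best


total, bits, p = best_two_part_model([1] * 20)
print(f"{total:.2f} bits total, {bits}-bit parameter, predict P(next = 1) = {p:.3f}")
```

On twenty 1s this pays for a 4-bit parameter because it saves far more than 4 bits on the data; on alternating 0s and 1s the parameter never pays for itself and the fair coin wins.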

Introduction To The Infra-Bayesianism Sequence

Could you defend worst-case reasoning a little more? Worst cases can be arbitrarily different from the average case - so maybe having worst-case guarantees can be reassuring, but actually choosing policies by explicit reference to the worst case seems suspicious. (In the human context, we might suppose that worst case, I have a stroke in the next few seconds and die. But I'm not in the business of picking policies by how they do in that case.)

You might say "we don't have an average case," but if there are possible hypotheses outside your considered space you don't have the worst case either - the problem of estimating a property of a non-realizable hypothesis space is simplified, but not gone.
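A toy illustration of the worry, with made-up payoffs: two hypothetical policies scored under two hypotheses ("normal" and "stroke"). Choosing by the worst case (maximin) and choosing by expected value under a prior disagree, and the maximin choice is the one that looks silly. The worst case here is also only taken over the hypotheses actually listed, which is the non-realizability point.

```python
from typing import Dict

# Hypothetical payoff table: policy -> {hypothesis: payoff}.
PAYOFFS: Dict[str, Dict[str, float]] = {
    "live_normally":    {"normal": 10.0, "stroke": 0.0},
    "stay_in_hospital": {"normal": 2.0,  "stroke": 1.0},
}
# Hypothetical prior over the same hypotheses.
PRIOR: Dict[str, float] = {"normal": 0.999, "stroke": 0.001}


def maximin_policy(payoffs: Dict[str, Dict[str, float]]) -> str:
    """Pick the policy whose worst payoff over the listed hypotheses is highest."""
    return max(payoffs, key=lambda pol: min(payoffs[pol].values()))


def expected_value_policy(payoffs: Dict[str, Dict[str, float]],
                          prior: Dict[str, float]) -> str:
    """Pick the policy with the highest prior-weighted average payoff."""
    return max(payoffs, key=lambda pol: sum(prior[h] * v for h, v in payoffs[pol].items()))


print(maximin_policy(PAYOFFS), expected_value_policy(PAYOFFS, PRIOR))
```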

Anyhow, still looking forward to working my way through this series :)

If every pair (a,e) led to a different world-state, this would be the boring case of complete factorizability, right? As in, you couldn't distinguish this from the world having no dynamics at all, just a recording of the choices of a and e. Therefore it seems important that your dynamics send some pairs of choices to identical states.

But that's not necessarily how the micro-scale laws of physics work. You can't squish state space irreversibly like that. And so W can't be the actual microphysical world, it has to be some macro-level abstract model of it, or else it's boring.
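The "boring case" test is just injectivity of the dynamics on the joint choices, which is easy to check for small finite toy models. In the sketch below the choice sets and both dynamics functions are hypothetical: `record` keeps the pair intact (so the world-state is a recording of a and e), while `summed` keeps only a macro-level total and so sends distinct pairs to the same state.

```python
from itertools import product
from typing import Callable, Hashable, Iterable


def is_recording(dynamics: Callable[[int, int], Hashable],
                 agent_choices: Iterable[int], env_choices: Iterable[int]) -> bool:
    """True if every (a, e) pair leads to a distinct world-state, i.e. the dynamics are injective."""
    seen = set()
    for a, e in product(agent_choices, env_choices):
        state = dynamics(a, e)
        if state in seen:
            return False  # two different choice pairs collapsed into one state
        seen.add(state)
    return True


def record(a: int, e: int) -> Hashable:
    """Hypothetical dynamics that keep the full pair of choices."""
    return (a, e)


def summed(a: int, e: int) -> Hashable:
    """Hypothetical coarse-grained dynamics that keep only the total."""
    return a + e


choices = [0, 1]
print(is_recording(record, choices, choices), is_recording(summed, choices, choices))
```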

So I'm a little confused about what you have in mind when you talk about putting different bases A and E onto the same W. What's so great about keeping the same W, if it's an abstraction of the microphysical world, tailor-made to help us model exactly this agent? I suspect that the answer is that you're using this to model an agent that also has subagents, so I'm excited for that post :)