Probabilistic Tiling (Preliminary Attempt)

I suggest stating the result you're proving before giving the proof.

You have some unusual notation that I think makes some of this unnecessarily confusing. Instead of this underlined vs non-underlined thing, you should have different functions $U : A^{ω} \to [0, 1]$ $ and $U_{a_{s t}^{1 : n - 1}} : A \times Π \to [0, 1]$ , where the first maps action sequences to utilities, and the second maps a pair consisting of an action $x$ and a future policy $π$ to the utility of the action sequence beginning with $a_{s t}^{1 : n - 1}$ , followed by $x$ , followed by the action sequence generated by $π$ . Your first assumption would then be stated $U_{a_{s t}^{1 : n - 1}} (x, π) = U (a_{s t}^{1 : n - 1}, x, a_{π}^{1 : \infty})$ . Your second assumption (fairness of the environment) is implicit in the type signature of the utility function $U : A^{ω} \to [0, 1]$ . If your utility depends on something other than the action sequence, then it doesn't make sense to write it as a function of the action sequence. It's good to point out assumptions that are implicit in the formalism you're using, but by the time you identify utility as a function of action sequences, you don't need to assume fairness of the environment as an additional axiom. I do not understand what your third assumption is.

This is emphatically false in general, but there's a special condition that makes it viable, namely that the distribution at time n is guaranteed to assign probability 1 to $ϕ$ iff $ϕ$ . My epistemic state about this is "this seems extremely plausible, but I don't know for sure if logical inductors attain this property in the limit"

They don't. For instance, let $ϕ$ be any true undecidable sentence. The logical inductor does not assign probability 1 to $ϕ$ even in the limit. Your fourth assumption does not seem reasonable. Does $E_{n - 1} (E_{n} (U | ϕ)) = E_{n - 1} (U | ϕ)$ not give you what you want?

Note that this only explicitly writes the starting code, and the code that might be modified to, not the past or future action sequence! This is important for the agent to be able to reason about this computation, despite it taking an infinite input.

I think this is exactly backwards. The property that makes spaces easy to search through and reason about is compactness, not finiteness. If $A$ is finite, then $A^{ω}$ is compact, and thus easy to search through and reason about, provided the relevant functions on it are continuous. But the space of computer programs is an infinite discrete space, hence non-compact, and hard to search through and reason about, except by remembering that the purpose of selecting a program is so that it will generate an element of the nice, easily-searchable compact space of action sequences.

[-]Diffractor7y10

First: That notation seems helpful. Fairness of the environment isn't present by default, it still needs to be assumed even if the environment is purely action-determined, as you can consider an agent in the environment that is using a hardwired predictor of what the argmax agent would do. It is just a piece of the environment, and feeding a different sequence of actions into the environment as input gets a different score, so the environment is purely action-determined, but it's still unfair in the sense that the expected utility of feeding action $x$ into the function drops sharply if you condition on the argmax agent selecting action $x$ . The third condition was necessary to carry out this step. $E_{n} (U (a_{* 1 : n - 1}, a_{?} - - -, a_{?}^{2 : \infty})) = E_{n} (U (a_{* 1 : n - 1}, a_{?}^{1 : \infty}))$ . The intuitive interpretation of the third condition is that, if you know that policy B selects action 4, then you can step from "action 4 is taken" to "policy B takes the actions it takes", and if you have a policy where you don't know what action it takes (third condition is violated), then "policy B does its thing" may have a higher expected utility than any particular action being taken, even in a fair environment that only cares about action sequences, as the hamster dance example shows.

Second: I think you misunderstood what I was claiming. I wasn't claiming that logical inductors attain the conditional future-trust property, even in the limit, for all sentences or all true sentences. What I was claiming was: The fact that $ϕ$ is provable or disprovable in the future (in this case, $ϕ$ is $‘ ‘ a_{* n} = x "$ ), makes the conditional future-trust property hold (I'm fairly sure), and for statements where there isn't guaranteed feedback, the conditional future-trust property may fail. The double-expectation property that you state does not work to carry the proof through, because the proof (from the perspective of the first agent), takes $ϕ$ as an assumption, so the "conditional on $ϕ$ " part has to be outside of the future expectation, when you go back to what the first agent believes.

Third: the sense I meant for "agent is able to reason about this computation" is that the computation is not extremely long, so logical inductor traders can bet on it.

[-]AlexMennen7y10

Ok, understood on the second assumption. $U$ is not a function to $[0, 1]$ , but a function to the set of $[0, 1]$ -valued random variables, and your assumption is that this random variable is uncorrelated with certain claims about the outputs of certain policies. The intuitive explanation of the third condition made sense; my complaint was that even with the intended interpretation at hand, the formal statement made no sense to me.

I'm pretty sure you're assuming that $ϕ$ is resolved on day $n$ , not that it is resolved eventually.

Searching over the set of all Turing machines won't halt in a reasonably short amount of time, and in fact won't halt ever, since the set of all Turing machines is non-compact. So I don't see what you mean when you say that the computation is not extremely long.

[-]Diffractor7y10

Ah, the formal statement was something like "if the policy A isn't the argmax policy, the successor policy B must be in the policy space of the future argmax, and the action selected by policy A is computed so the relevant equality holds"

Yeah, I am assuming fast feedback that it is resolved on day $n$ .

What I meant was that the computation isn't extremely long in the sense of description length, not in the sense of computation time. Also, we aren't doing policy search over the set of all turing machines, we're doing policy search over some smaller set of policies that can be guaranteed to halt in a reasonable time (and more can be added as time goes on)

Also I'm less confident in conditional future-trust for all conditionals than I used to be, I'll try to crystallize where I think it goes wrong.

[-]AlexMennen7y10

What I meant was that the computation isn't extremely long in the sense of description length, not in the sense of computation time. Also, we aren't doing policy search over the set of all turing machines, we're doing policy search over some smaller set of policies that can be guaranteed to halt in a reasonable time (and more can be added as time goes on)

Wouldn't the set of all action sequences have lower description length than some large finite set of policies? There's also the potential problem that all of the policies in the large finite set you're searching over could be quite far from optimal.

[-]Raemon7y10

Note: looks like you were trying to use markdown. To use markdown in our editor you need to press cmd-4. (Originally the "$" notation worked, but people who weren't familiar with LaTeX were consistently confused about to actually type a dollar sign)

[-]AlexMennen7y10

Thanks, fixed.

[-]Vanessa Kosoy7y10

I second the suggestion to state what is being proved before proving it.

One important note is that CDT spectacularly fails this property. Namely, consider a game of matching pennies against a powerful predictor. Since the environment takes actions as input, it's possible to recompute what would have happened if a different action is plugged in. The CDT agent that keeps losing is going to learn to randomize between actions since it keeps seeing that the action it didn't take would have done better. So it eventually gets to a state where it predicts the reward from "pick heads" and "pick tails" is 0.5 (because there's a 50% chance it doesn't pick heads/tails), but it predicts the reward from "I take an action" is 0, violating this assumption.

Note, however, that ordinary Bayes-optimal RL works perfectly (assuming there are no traps in the prior or paranoia is otherwise avoided), since it would believe that taking a certain action causes the predictor to make the optimal response. This is similar to RL one-boxing in the repeated Newcomb's problem.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

8

Probabilistic Tiling (Preliminary Attempt)

8

DEFINITIONS:

PROOF:

ANALYSIS: