Two Questions about Solomonoff Induction

SamEisenstat (8y)

Universal Prediction of Selected Bits solves the related question of what happens if the odd bits are adversarial but the even bits copy the preceding odd bits. Basically, the universal semimeasure learns to do the right thing, but the exact sense in which the result is positive is subtle and has to do with the difference between measures and semimeasures. The methods may also be relevant to the questions here, though I don't see a proof for either question yet.

jessicata (10y)

My intuition is that, for the iid bits with uncomputable probability r, there will exist an optimal (up to a constant) solution of the form:

compute some real number $\hat{r}$ by running the shortest program computing this
generate bits iid from $\mathrm{Ber}(\hat{r})$

If this is the case, then the optimal algorithm will choose $\hat{r}$ to minimize $K(\hat{r}) + n \, \mathrm{KL}(\mathrm{Ber}(r) \,\|\, \mathrm{Ber}(\hat{r}))$, where $n$ is the number of bits observed. As $n$ goes to infinity, minimizing this objective causes $\hat{r}$ to converge to $r$.
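To make the trade-off in this objective concrete, here is a rough Python sketch. It is not a way to compute the real objective: $K$ is uncomputable, so the sketch swaps in a crude stand-in (the number of bits $b$ needed to write a dyadic rational $k/2^b$), and it uses an ordinary float in place of the uncomputable $r$. The names `kl_bernoulli` and `best_dyadic_estimate` are made up for illustration.

```python
import math


def kl_bernoulli(p, q):
    """KL(Ber(p) || Ber(q)) in bits, for 0 < q < 1."""
    def term(a, b):
        return 0.0 if a == 0 else a * math.log2(a / b)
    return term(p, q) + term(1 - p, 1 - q)


def best_dyadic_estimate(r, n, max_bits=20):
    """Minimize proxy_K(r_hat) + n * KL(Ber(r) || Ber(r_hat)) over r_hat = k / 2^b.

    ASSUMPTION: proxy_K(r_hat) = b, the bit precision of the dyadic rational.
    This is only a stand-in for Kolmogorov complexity, which is uncomputable.
    Returns (cost, r_hat, b) for the best candidate found.
    """
    best = None
    for b in range(1, max_bits + 1):
        denom = 2 ** b
        k0 = math.floor(r * denom)
        # KL is convex in r_hat with its minimum at r, so at each precision
        # only the two grid points bracketing r can be optimal.
        for k in (k0, k0 + 1):
            if 0 < k < denom:
                r_hat = k / denom
                cost = b + n * kl_bernoulli(r, r_hat)
                if best is None or cost < best[0]:
                    best = (cost, r_hat, b)
    return best


if __name__ == "__main__":
    r = 0.3  # computable stand-in for the uncomputable r
    for n in (10, 1_000, 100_000, 10_000_000):
        cost, r_hat, bits = best_dyadic_estimate(r, n)
        print(n, r_hat, bits, round(cost, 3))
```

Under this proxy, small $n$ favors short, coarse descriptions of $\hat{r}$, and larger $n$ makes it worth paying more bits for precision, which is the mechanism behind the convergence claim.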

Now I'm not sure how to prove that an optimal (up to a constant) hypothesis of the form above exists. Basically this is saying that $K(\text{the } n \text{ bits}) \geq c + \min_{\hat{r}} \left( K(\hat{r}) + n \, \mathrm{KL}(\mathrm{Ber}(r) \,\|\, \mathrm{Ber}(\hat{r})) \right)$ for some constant $c$ with high probability. Possibly the argument goes something like: for any non-conditionally-iid way of assigning probabilities to the bits, there is a better iid way (by averaging the probabilities). And if you are flipping the bits iid, then at some point you compute the probability of each flip, so you might as well just optimally compress some $\hat{r}$. But I couldn't quite formalize this argument well, because of the fact that hypotheses can encode correct correlations through overfitting.

Vanessa Kosoy (9y)

Can you provide links to the solutions? I am curious, especially regarding the second question.

cousin_it (10y)

Well, Laplace's rule of succession is a computable prior that will almost certainly converge to your uncomputable probability value, and I think the difference in log scores from the "true" prior will be finite too. Since Laplace's rule is included in the Solomonoff mixture, I suspect that things should work out nicely. I don't have a proof, though.
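One way to poke at this empirically is the small Python simulation below. It is only a sketch under loud assumptions: the uncomputable probability is replaced by an ordinary float `r`, the bits come from a seeded pseudorandom generator, and the function name `laplace_log_loss_gap` is invented for illustration. It tracks the Laplace estimate $(\text{ones} + 1)/(t + 2)$ and the cumulative log-score gap (in bits) between predicting with the true $r$ and predicting with Laplace's rule.

```python
import math
import random


def laplace_log_loss_gap(r, n, seed=0):
    """Simulate n iid Ber(r) bits and predict each one with Laplace's rule.

    Returns (final_estimate, gap), where gap is the cumulative difference in
    log scores, sum of log2 P_true(bit) - log2 P_laplace(bit), in bits.
    ASSUMPTION: r is a plain float standing in for the uncomputable value.
    """
    rng = random.Random(seed)
    ones = 0
    gap = 0.0
    for t in range(n):
        bit = 1 if rng.random() < r else 0
        p_one = (ones + 1) / (t + 2)  # Laplace: P(next bit = 1 | first t bits)
        p_laplace = p_one if bit else 1 - p_one
        p_true = r if bit else 1 - r
        gap += math.log2(p_true) - math.log2(p_laplace)
        ones += bit
    return (ones + 1) / (n + 2), gap


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        estimate, gap = laplace_log_loss_gap(0.3, n)
        print(n, round(estimate, 4), round(gap, 3))
```

Printing the gap at several values of $n$ gives a quick feel for how the difference in log scores behaves as more bits arrive; a simulation like this is of course evidence, not a proof.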

AI ALIGNMENT FORUM