In this post, I ask two questions about Solomonoff Induction. I am not sure if these questions are open or not. If you know the answer to either of them, please let me know. I think that the answers may be very relevant to stuff I am currently working on in Asymptotic Logical Uncertainty.

Consider a universal Turing machine which takes in an infinite stream of random bits, and outputs a finite or infinite string of bits. For any bit string , let denote the probability that is a prefix of the output of .

Let be an probabilistic environment which outputs an infinite bit string, and set be the first bits output by .

If assigns outputs according to any computable probability distribution, then with probability 1, quickly converges to the actual probability that outputs a 1 at location given that its first bits were .

In particular, if outputs an infinite stream of iid bits which are each 1 with some rational probability , then will converge to with probability 1.

However I am curious what happens if is not computable. In the simplest case outputs an infinite stream of iid bits which are each 1 with some uncomputable real number probability . In this case, will converge to with probability 1?

To make things slightly more complicated, assume that the odd index bits of come from a fair coin, but the even bits come from an adversary who chooses the bits in an uncomputable way. (This adversary can see all the bits output by so far, but not future bits.) In this case, will converge to 1/2?

Universal Prediction of Selected Bits solves the related question of what happens if the odd bits are adversarial but the even bits copy the preceding odd bits. Basically, the universal semimeasure learns to do the right thing, but the exact sense in which the result is positive is subtle and has to do with the difference between measures and semimeasures. The methods may also be relevant to the questions here, though I don't see a proof for either question yet.

My intuition is that, for the iid bits with uncomputable probability r, there will exist an optimal (up to a constant) solution of the form:

If this is the case, then the optimal algorithm will choose ^r to minimize K(^r)+nKL(Ber(r)||Ber(^r)) if n is the number of bits observed. As n goes to infinity, minimizing this objective causes ^r to converge to r.

Now I'm not sure how to prove that an optimal (up to a constant) hypothesis of the form above exists. Basically this is saying that K(the n bits)≥c+min^r(K(^r)+nKL(Ber(r)||Ber(^r))) for some constant c with high probability. Possibly the argument goes something like: for any non-conditionally-iid way of assigning probabilities to the bits, there is a better iid way (by averaging the probabilities). And if you are flipping the bits iid, then at some point you compute the probability of each flip, so you might as well just optimally compress some ^r. But I couldn't quite formalize this argument well, because of the fact that hypotheses can encode correct correlations through overfitting.

Can you provide links to the solutions, I am curious? (especially regarding the second question)

Well, Laplace's rule of succession is a computable prior that will almost certainly converge to your uncomputable probability value, and I think the difference in log scores from the "true" prior will be finite too. Since Laplace's rule is included in the Solomonoff mixture, I suspect that things should work out nicely. I don't have a proof, though.