(I'm not sure about any of this)
(Yup! Vanessa Kowalski pointed out a crucial error, see her comment below. The idea about the universal prior picking up all computable regularities is unproved. I'm leaving the post as it was, but please read it as a historical document only.)
Abram Demski had an old paper describing a naive way to use the universal prior for logical uncertainty. He pointed out some problems with it, but I'm not sure the problems are real. I think the approach will work surprisingly well.
The idea is that, since the universal prior is just a probability distribution over bit sequences, we don't have to use it to predict bits one by one. We can also update it on facts like "the seventh bit is 1" without mentioning the previous six bits. To choose which bits to update on, we can mechanically search through all possible proofs in Peano arithmetic (PA) by increasing length. Whenever we prove a sentence with Gödel number N, we tell the prior that the Nth bit is 1; whenever we disprove one, we tell the prior that the bit is 0. After each update, the prior's opinions about all other bits (and thus all other sentences in PA) will shift, hopefully in a meaningful way.
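The out-of-order updating can be sketched with a toy Bayesian mixture standing in for the universal prior. Everything below is illustrative: the real universal prior mixes over all lower-semicomputable semimeasures, not two hand-picked hypotheses, and conditioning on scattered bits is only this simple here because the toy hypotheses happen to make the bits independent.

```python
from fractions import Fraction

class MixturePrior:
    """Toy stand-in for the universal prior: a Bayesian mixture over a
    handful of hypotheses about an infinite bit sequence."""

    def __init__(self, hypotheses):
        # hypotheses: list of (name, f) where f(n) is P(bit n = 1)
        # under that hypothesis, with bits independent across positions.
        self.hypotheses = hypotheses
        self.weights = [Fraction(1, len(hypotheses))] * len(hypotheses)

    def update(self, n, bit):
        # Condition on "the nth bit equals `bit`" - in any order,
        # without ever mentioning the other bits.
        for i, (_, f) in enumerate(self.hypotheses):
            p1 = f(n)
            self.weights[i] *= p1 if bit == 1 else 1 - p1
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]

    def prob(self, n):
        # Posterior probability that the nth bit is 1.
        return sum(w * f(n) for w, (_, f) in zip(self.weights, self.hypotheses))

prior = MixturePrior([
    ("even bits 1, odd bits fair", lambda n: Fraction(1) if n % 2 == 0 else Fraction(1, 2)),
    ("all bits fair coin flips",   lambda n: Fraction(1, 2)),
])

prior.update(100, 1)  # told: bit 100 is 1 (out of order, earlier bits unknown)
prior.update(7, 1)    # told: bit 7 is 1
prior.update(42, 1)   # told: bit 42 is 1
print(prior.prob(2))  # opinion about an unseen even bit: 9/10, up from 3/4
print(prior.prob(3))  # unseen odd bit: still 1/2, as it should be
```

After just two even-position observations, the mixture already leans heavily toward the regularity, and its opinion about every other even bit shifts accordingly - the same dynamic the post hopes for with proofs in place of hand-fed bits.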
It's easy to see that the probability of any provable sentence will converge to 1, because at some point it becomes 1 by fiat, and disprovable sentences will likewise go to 0. How fast is the convergence? I think it will be extremely fast, far outpacing our slow proof search and subsuming any possible human or computer reasoning about PA - even though the prior knows nothing about PA and only receives opaque bits.
To see why, recall that the universal prior can pick up any computable regularities. For example, if you feed it a sequence where all even-numbered bits are 1 and odd-numbered bits are hard to predict, it will quickly become very certain that all future even-numbered bits will be 1. Similarly, if there's some decidable set of Gödel numbers that for some reason correspond to provable sentences (no matter how long their proofs are), the universal prior will become very certain of that after receiving enough data. The same goes for regularities in relationships between bits, like "for all A and B, if A is provable and A→B is provable, then B is provable as well". All such shortcuts will be learned in full generality as quickly as possible.
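To get a feel for the convergence speed, here is the even-bits example computed exactly in the ordinary sequential setting. Again this is a two-hypothesis toy, not the universal prior itself: the regular hypothesis says even-indexed bits are 1 and odd-indexed bits are fair coins, the alternative says every bit is a fair coin, and both start at weight 1/2.

```python
from fractions import Fraction

def likelihood(bits, regular):
    # Probability of the observed prefix under one hypothesis.
    p = Fraction(1)
    for i, b in enumerate(bits):
        if regular and i % 2 == 0:
            p *= 1 if b == 1 else 0   # regular hypothesis forces even bits to 1
        else:
            p *= Fraction(1, 2)       # fair coin for everything else
    return p

def p_next_even_bit_is_1(bits):
    # Posterior predictive for the next even-indexed bit (prior 1/2 each).
    w_reg, w_rand = likelihood(bits, True), likelihood(bits, False)
    return (w_reg * 1 + w_rand * Fraction(1, 2)) / (w_reg + w_rand)

# Even bits all 1, odd bits arbitrary ("hard to predict").
seq = [1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0]

print(p_next_even_bit_is_1(seq[:4]))  # 9/10 after seeing only 4 bits
print(p_next_even_bit_is_1(seq))      # 2049/2050 after 20 bits
```

The odds in favor of the regular hypothesis double with every even bit observed, so certainty grows exponentially fast in the number of confirming bits - the sense in which such shortcuts get learned "as quickly as possible".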
This approach to logical uncertainty can work well for questions that can be settled by PA, like digits of π. What about sentences that aren't provable or disprovable in PA, like Con(PA)? It's tempting to try to make the process converge for all of those as well, but I don't want to do that, because many such sentences are basically hallucinations. Who cares what their probabilities are.
Why does this approach matter? Isn't the universal prior uncomputable? Yes, but it's approximable, so any approximation scheme gives a computable process that could be used in the same way as logical induction (which is also computable, but converges uncomputably slowly). The universal prior is also well-studied and has many nice properties.
I don't want to diminish the amazing work that the logical induction folks have done. In some ways (e.g. reflection) it works better than the universal prior. But it's interesting to me that the idea of probabilistic reasoning about future theorems, based on a small number of known theorems, can be captured so well by a standard approach, similar to how UDT can be almost completely covered by a one-liner combination of standard game theory tools.
Well, you're right. I'll put a retraction in the post. Thank you!
By asking for citations you're giving me too much credit - I'm playing the game a few levels below you. I was going off the "obvious" idea that if our prior gives nonzero probability to a hypothesis, and we feed the prior a sequence of bits that matches the hypothesis, then the posterior probability of the hypothesis should rise to 1. Took me some time to realize that's not true at all!
Here's a simple counterexample. Say our hypothesis is "all even bits are 0", and our prior is an equal mixture of "all even bits are 0 and all odd bits are random" and "all odd bits are 0 and all even bits are random". Note that our hypothesis starts out with probability 1/2 according to the prior. But if we feed the prior a sequence of all 0's, which matches the hypothesis, the posterior probability of the hypothesis won't go to 1. It will keep oscillating forever. The prior just happens to be too good at guessing this particular sequence without help from our hypothesis.
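The counterexample is small enough to compute exactly. The script below (indexing bits from 0, so "even bits" includes the first bit) tracks the two likelihoods and shows the posterior of the first hypothesis bouncing between 2/3 and 1/2 forever:

```python
from fractions import Fraction

def posterior_h1(bits):
    # H1: all even-indexed bits are 0, odd-indexed bits uniformly random.
    # H2: all odd-indexed bits are 0, even-indexed bits uniformly random.
    # Prior weight 1/2 each; returns P(H1 | observed bits).
    like1 = like2 = Fraction(1)
    for i, b in enumerate(bits):
        if i % 2 == 0:
            like1 *= 1 if b == 0 else 0  # H1 forces even bits to 0
            like2 *= Fraction(1, 2)      # H2 treats them as coin flips
        else:
            like1 *= Fraction(1, 2)
            like2 *= 1 if b == 0 else 0
    return like1 / (like1 + like2)

posts = [posterior_h1([0] * n) for n in range(1, 9)]
print(posts)  # 2/3, 1/2, 2/3, 1/2, ... oscillating, never approaching 1
```

After an odd number of zeros, H1 has "explained" one more bit for free than H2 and leads 2:1; after an even number they're tied again. Neither hypothesis ever pulls ahead, exactly as described above.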
I'd be surprised if something like that happened with computable hypotheses and the universal prior, but I don't have a proof and couldn't find one in a few hours. So thanks again.