Sam Eisenstat

As you say, this isn't a proof, but it wouldn't be too surprising if this were consistent. There is some such that has a proof of length by a result of Pavel Pudlák (On the length of proofs of finitistic consistency statements in first order theories). Here I'm making the dependence on explicit, but not the dependence on . I haven't looked at it closely, but the proof strategy in Theorems 5.4 and 5.5 suggests that will not depend on , as long as we only ask for the weaker property that will only be provable in length for sentences of length at most .

I misunderstood your proposal, but you don't need to do this work to get what you want. You can just take each sentence as an axiom, but declare that this axiom takes symbols to invoke. This could be done by changing the notion of length of a proof, or by taking axioms and with very long.

I'm having trouble thinking about what it would mean for a circuit to contain daemons such that we could hope for a proof. It would be nice if we could find a simple such definition, but it seems hard to make this intuition precise.

For example, we might say that a circuit contains daemons if it displays more optimization that necessary to solve a problem. Minimal circuits could have daemons under this definition though. Suppose that some function describes the behaviour of some powerful agent, a function is like with noise added, and our problem is to predict sufficiently well the function . Then, the simplest circuit that does well won't bother to memorize a bunch of noise, so it will pursue the goals of the agent described by more efficiently than , and thus more efficiently than necessary.

Two minor comments. First, the bitstrings that you use do not all correspond to worlds, since, for example, implies , as is a subtheory of . This can be fixed by instead using a tree of sentences that all diagonalize against themselves. Tsvi and I used a construction in this spirit in A limit-computable, self-reflective distribution, for example.

Second, I believe that weakening #2 in this post also cannot be satisfied by any constant distribution. To sketch my reasoning, a trader can try to buy a sequence of sentences , spending $$2^{-n}$ on the \(n\)th sentence \(\phi_1 \wedge \dots \wedge \phi_n\). It should choose the sequence of sentences so that \(\phi_1 \wedge \dots \wedge \phi_n\) has probability at most \(2^{-n}\), and then it will make an infinite amount of money if the sentences are simultaneously true.

The way to do this is to choose each from a list of all sentences. If at any point you notice that has too high a probability, then pick a new sentence for . We can sell all the conjunctions for and get back the original amount payed by hypothesis. Then, if we can keep using sharper continuous tests of the probabilities of the sentences over time, we will settle down to a sequence with the desired property.

In order to turn this sketch into a proof, we need to be more careful about how these things are to be made continuous, but it seems routine.

I at first didn't understand your argument for claim (2), so I wrote an alternate proof that's a bit more obvious/careful. I now see why it works, but I'll give my version below for anyone interested. In any case, what you really mean is the probability of deciding a sentence outside of *by having it announced by nature*; there may be a high probability of sentences being decided indirectly via sentences in .

Instead of choosing as you describe, pick so that the probability of sampling something in is greater than . Then, the probability of sampling something in is at least . Hence, no matter what sentences have been decided already, the probability that repeatedly sampling from selects before it selects any sentence outside of is at least

as desired.

Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering!

Oh, I didn't notice that aspect of things. That's pretty cool.

A few thoughts:

I agree that the LI criterion is "pointwise" in the way that you describe, but I think that this is both pretty good and as much as could actually be asked. A single efficiently computable trader can do a lot. It can enforce coherence on a polynomially growing set of sentences, search for proofs using many different proof strategies, enforce a polynomially growing set of statistical patterns, enforce reflection properties on a polynomially large set of sentences, etc. So, eventually the market will not be exploitable on all these things simultaneously, which seems like a pretty good level of accurate beliefs to have.

On the other side of things, it would be far to strong to ask for a uniform bound of the form "for every , there is some day such that after step , no trader can multiply its wealth by a factor more than ". This is because a trader can be hardcoded with arbitrarily many hard-to-compute facts. For every , there must eventually be a day on which the belief of your logical inductor assign probability less than to some true statement, at which point a trader who has that statement hardcoded can multiply its wealth by . (I can give a construction of such a sentence using self-reference if you want, but it's also intuitively natural - just pick many mutually exclusive statements with nothing to break the symmetry.)

Thus, I wouldn't think of traders as "mistakes", as you do in the post. A trader can gain money on the market if the market doesn't already know all facts that will be listed by the deductive process, but that is a very high bar. Doing well against finitely many traders is already "pretty good".

What you can ask for regarding uniformity is for some simple function such that any trader can multiply its wealth by at most a factor . This is basically the idea of the mistake bound model in learning theory; you bound how many mistakes happen rather than when they happen. This would let you say a more than the one-trader properties I mentioned in my first paragraph. In fact, has this propery; is just the initial wealth of the trader. You may therefore want to do something like setting traders' initial wealths according to some measure of complexity. Admittedly this isn't made explicit in the paper, but there's not much additional that needs to be done to think in this way; it's just the combination of the individual proofs in the paper with the explicit bounds you get from the initial wealths of the traders involved.

I basically agree completely on your last few points. The traders are a model class, not an ensemble method in any substantive way, and it is just confusing to connect them to the papers on ensemble methods that the LI paper references. Also, while I use the idea of logical induction to do research that I hope will be relevant to practical algorithms, it seems unlikely than any practical algorithm will look much like a LI. For one thing, finding fixed points is really hard without some property stronger than continuity, and you'd need a pretty good reason to put it in the inner loop of anything.

Universal Prediction of Selected Bits solves the related question of what happens if the odd bits are adversarial but the even bits copy the preceding odd bits. Basically, the universal semimeasure learns to do the right thing, but the exact sense in which the result is positive is subtle and has to do with the difference between measures and semimeasures. The methods may also be relevant to the questions here, though I don't see a proof for either question yet.

Yeah, I like tail dependence.

There's this question of whether for logical uncertainty we should think of it more as trying to "un-update" from a more logically informed perspective rather than trying to use some logical prior that exists at the beginning of time. Maybe you've heard such ideas from Scott? I'm not sure if that's the right perspective, but it's what I'm alluding to when I say you're introducing artificial logical uncertainty.

Yeah, the 5 and 10 problem in the post actually can be addressed using provability ideas, in a way that fits in pretty natually with logical induction. The motivation here is to work with decision problems where you can't prove statements for agent , utility function , action , and utility value , at least not with the amount of computing power provided, but you want to use inductive generalizations instead. That isn't necessary in this example, so it's more of an illustration.

To say a bit more, if you make logical inductors propositionally consistent, similarly to what is done in this post, and make them assign things that have been proven already probability 1, then they will work on the 5 and 10 problem in the post.

It would be interesting if there was more of an analogy to explore between the provability oracle setting and the inductive setting, and more ideas could be carried over from modal UDT, but it seems to me that this is a different kind of problem that will require new ideas.

Theorem. Weak HCH (and similar proposals) contain EXP.

Proof sketch: I give a strategy that H can follow to determine whether some machine that runs in O(2cnk) time accepts. Basically, we need to answer questions of the form "Does cell x have value y at time t." and "Was the head in position x at time t?", where x and t are bounded by some function in O(2cnk). Using place-system representations of x and t, these questions have length in O(nk), so they can be asked. Further, each question is a simple function of a constant number of other such questions about earlier times as long as t>0, and can be answered directly in the base case t=0.