
Suppose a new scientific hypothesis, such as general relativity, explains a well-known observation such as the perihelion precession of Mercury better than any existing theory. Intuitively, this is a point in favor of the new theory. However, the probability for the well-known observation was already at 100%. How can a previously-known statement provide new support for the hypothesis, as if we are re-updating on evidence we've already updated on long ago? This is known as **the problem of old evidence**, and is usually leveled as a charge against Bayesian epistemology.

## Bayesian Solutions vs Scientific Method

It is typical for a Bayesian analysis to resolve the problem by pretending that all hypotheses are around "from the very beginning," so that all hypotheses are judged on all evidence. The perihelion precession of Mercury is very difficult to explain from Newton's theory of gravitation, and therefore quite improbable; but it fits quite well with Einstein's theory of gravitation. Therefore, Newton gets "ruled out" by the evidence, and Einstein wins.
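This update can be sketched numerically with Bayes' Law. The priors and likelihoods below are illustrative numbers chosen for the sketch, not historical estimates:

```python
# Illustrative Bayesian update on the perihelion evidence.
# Newton's theory makes the observed precession improbable;
# Einstein's predicts it well. All numbers are made up.
priors = {"Newton": 0.5, "Einstein": 0.5}
likelihood = {"Newton": 0.001, "Einstein": 0.9}  # P(precession | theory)

# Bayes' Law: P(theory | evidence) = P(theory) * P(evidence | theory) / P(evidence)
evidence = sum(priors[h] * likelihood[h] for h in priors)
posterior = {h: priors[h] * likelihood[h] / evidence for h in priors}

print(posterior)  # Einstein ends up with nearly all the probability mass
```

However the prior mass is split, the likelihood ratio of 900:1 drives almost all of the posterior mass onto Einstein.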

A drawback of this approach is that it allows scientists to formulate a hypothesis in light of the evidence, and then use that very same evidence in their favor. Imagine a physicist competing with Einstein, Dr. Bad, who publishes a "theory of gravity" which is just a list of all the observations we have made about the orbits of celestial bodies. Dr. Bad has "cheated" by providing the correct answers without any deep explanations; but "deep explanation" is not an objectively verifiable quality of a hypothesis, so it should not factor into the calculation of scientific merit, if we are to use simple update rules like Bayes' Law. Dr. Bad's theory will predict the evidence as well as or better than Einstein's. So the new picture is that Newton's theory gets eliminated by the evidence, but Einstein's and Dr. Bad's theories remain as contenders.

The scientific method emphasizes *predictions made in advance* to avoid this type of cheating. To test Einstein's hypothesis, Sir Arthur Eddington measured the deflection of starlight passing near the sun during the 1919 solar eclipse, a new prediction of general relativity. This test would have ruled out Dr. Bad's theory of gravity, since (unless Dr. Bad possessed a time machine) there would be no way for Dr. Bad to know what to predict.

## Simplicity Priors

Proponents of *simplicity priors* will instead say that the problem with Dr. Bad's theory can be identified by looking at its description length in contrast to Einstein's. We can tell that Einstein didn't cheat by gerrymandering his theory to specially predict Mercury's orbit correctly, because the theory is mathematically succinct! There is no *room* to cheat; no empty closet in which to smuggle information about Mercury. Marcus Hutter argues for this resolution to the problem of old evidence in *On Universal Prediction and Bayesian Confirmation*.
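The "no room to cheat" point can be made quantitative under a simplicity prior of the form P(H) ∝ 2^(−length(H)). The numbers below are a toy sketch, not Hutter's construction:

```python
# Toy simplicity prior: hardcoding k bits of observed data into a
# hypothesis lengthens its description by k bits, cutting its prior
# by 2**-k -- exactly cancelling the 2**k likelihood advantage it
# gains on those bits.
k = 20  # bits of observed data (illustrative)

honest_prior = 2.0 ** -100       # prior from some base description length
honest_likelihood = 2.0 ** -k    # predicts the k bits no better than chance

cheat_prior = honest_prior * 2.0 ** -k  # k extra bits baked into the code
cheat_likelihood = 1.0                  # now predicts those bits perfectly

# The unnormalized posteriors (prior * likelihood) come out identical:
print(honest_prior * honest_likelihood == cheat_prior * cheat_likelihood)  # True
```

The penalty and the benefit cancel exactly, so under this prior "cheating" by memorizing data buys no posterior advantage over honest ignorance.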

## Logical Uncertainty

Even in cases where we can measure simplicity perfectly, such as in Solomonoff Induction, is it really a perfect correction for the problem of old evidence? This becomes implausible in cases of logical uncertainty.

Simplicity priors seem like a very plausible solution to the problem of old evidence in the case of *empirical* uncertainty. If I try to "cheat" by baking some known information into my hypothesis, without having real explanatory insight, then the description length of my hypothesis will be expanded by the number of bits I sneak in. This will penalize the prior probability by *exactly the amount I stand to benefit by predicting those bits!* In other words, the penalty is balanced so that it does not matter whether I try to "cheat" or not.

The same argument does not hold if I am predicting mathematical facts rather than empirical facts, however. Mathematicians are often in a situation where they already know how to calculate a sequence of numbers, but they are looking for some deeper understanding, such as a closed-form expression for the sequence, or a statistical model of the sequence (e.g., the prime number theorem describes the statistics of the prime numbers). It is common to compute long sequences in order to check conjectures against more of the sequence, and in doing so, treat the computed numbers as *evidence* for a conjecture.

If I claimed to have some way to predict the prime numbers, but it turned out that my method actually had one of the standard ways to calculate prime numbers hidden within the source code, I would be accused of "cheating" in much the same way...
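The prime-number example can be made concrete: one can compute the sequence and check it against the prime number theorem's estimate, treating each further stretch of the sequence as an informal check on the conjecture (a toy illustration, not a confirmation measure):

```python
# Compare the true prime-counting function pi(n) with the
# prime number theorem's approximation n / ln(n).
import math

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, n + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

for n in (10**3, 10**4, 10**5):
    pi_n = len(primes_up_to(n))
    estimate = n / math.log(n)
    print(n, pi_n, round(estimate), round(pi_n / estimate, 3))
# The ratio pi(n) / (n / ln n) drifts toward 1 as n grows --
# each newly computed stretch of primes fits the statistical model.
```

A hidden copy of `primes_up_to` inside a purported "predictor" of the primes is exactly the kind of smuggled calculation the paragraph above describes.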