Solomonoff Induction

Alex_Altair
Multicore (+76/-134)
plex added links to Arbital
[anonymous] (+13) /* Blog posts */ fixed authorship (lukeprog wrote the first half)
[anonymous] (+73) added link to the "intuitive explanation" for solomonoff induction
Tyrrell_McAllister (+54/-69) Minor rewriting
Tyrrell_McAllister (+207/-175) some minor rewording
Tyrrell_McAllister (+2019) Added derivation of Solomonoff prior
Vladimir Nesov (+145/-141)
Zack M. Davis (+500/-22) writing, revision, destubiffy

Solomonoff induction is an inference system defined by Ray Solomonoff defined an inference system that will learn to correctly predict any computable sequence with only the absolute minimum amount of data. This system, in a certain sense, is the perfect universal prediction algorithm. 

More precisely, suppose that a particular infinite input string x0 is about to be fed into U. However, you know nothing about x0 other than that each term of the string is either 0 or 1.1. As far as your state of knowledge is concerned, the ith digit of x0 is as likely to be 0 as it is to be 1,1, for all i = 1, 2, .…. You want to find the a priori probability m(y0) of the following proposition:

Unfortunately, computing the exact value of m(y0) would require solving the halting problem, which is undecidable. Nonetheless, it is easy to derive an expression for m(y0). If U halts on an infinite input string x, then U must read only a finite initial segment of x, after which U immediately halts. We call a finite string p a self-delimiting program if and only if there exists an infinite input string x beginning with p such that U halts on x immediately after reading to the end of p. The set 𝒫𝒫 of self-delimiting programs is the prefix code for U. It is the determination of the elements of 𝒫𝒫 that requires a solution to the halting problem.

Given p ∈ 𝒫, we write "prog"prog (x0) = p" to express the proposition that x0 begins with p, and we write "U(p) = y0" to express the proposition that U produces output y0, and then halts, when fed any input beginning with p. Proposition (*) is then equivalent to the exclusive disjunction


p ∈ 𝒫: U(p) = y0(prog (x0) = p).
Since x0 was chosen at random from {0,{0, 1}ω, we take the probability of prog (x0) = p to be 2 − ℓ(p), where ℓ(ℓ(p) is the length of p as a bit string. Hence, the probability of (*) is


m(y0) := ∑p ∈ 𝒫: U(p) = y02 − ℓ(p).
 

Blog posts

More precisely, we suppose that a particular infinite input string x0 ∈ {0, 1}ω is about to be fed into U. However, you know nothing about x0 other than that iteach term of the string is an infinite string ofeither 0s and or 1s. That is, as. As far as your state of knowledge is concerned, the ith digit of x0 is as likely to be 0 as it is to be 1, for all i = 1, 2, …. You want to find the a priori probability m(y0) of the following proposition:

Unfortunately, computing the exact value of m(y0) would require solving the halting problem, which is undecidable. Nonetheless, it is easy to derive an expression for m(y0). If U halts on an infinite input string x, then U must read only a finite initial segment of x, after which U immediately halts. We call a finite string p a self-delimiting program if and only if there isexists an infinite input string x ∈ {0, 1}ω beginning with p such that U halts on x immediately after reading to the end of p. The set 𝒫 of self-delimiting programs is the prefix code for U. It is the determination of the elements of 𝒫 that requires a solution to the halting problem.

Given p ∈ 𝒫, we write "prog (x0) = p" to express the proposition that x0 begins with p, and we write "U(p) = y0" to express the proposition that U produces output y0, and then halts, when fed anany input beginning with p. Proposition (*) is then equivalent to the exclusive disjunction

(*) If U takes in x0 as input, then U produceswill produce output y0 and then immediately halts.halt.

ItUnfortunately, computing the exact value of m(y0) would require solving the halting problem, which is undecidable. Nonetheless, it is easy to writederive an expression for m(y0). Unfortunately, evaluating the expression would require solving the halting problem, which is undecidable. Nonetheless, ifIf U halts on an infinite input string x, then U must read only a finite initial segment of x, after which U immediately halts. We call a finite string p a self-delimiting program if and only if there is an input string x ∈ {0, 1}ω beginning with p such that U halts on x immediately after reading to the end of p. The set 𝒫 of self-delimiting programs is the prefix code for U. It is the determination of the elements of 𝒫 that requires a solution to the halting problem.

Given p ∈ 𝒫, we write "prog (x0) = p" to express the proposition that x0 begins with p, and we write "U(p) = y0" to express the proposition that U produces output y0, and then halts, when fed an input beginning with p. Proposition (*) is then equivalent to the exclusive disjunction


p ∈ 𝒫: U(p) = y0(prog (x0) = p).
Since x0 was chosen at random from {0, 1}ω, we have that


take the probability of pr (progprog (x0) = p) =  to be 2 − ℓ(p),
where ℓ(p) is the length of p as a bit string. Hence, the probability of (*) is

Solomonoff induction gets off the ground with a solution to the "problem of the priors". Suppose that you stand before a universal prefix Turing machineU. You are interested in a certain finite output string y0. In particular, you want to know the probability that U will produce the output y0 given a random input tape. This probability is the Solomonoff a priori probability of y0.

More precisely, we suppose that a particular infinite input string x0 ∈ {0, 1}ω is about to be fed into U. However, you know nothing about x0 other than that it is an infinite string of 0s and 1s. That is, as far as your state of knowledge is concerned, the ith digit of x0 is as likely to be 0 as it is to be 1, for all i = 1, 2, …. You want to find the a priori probability m(y0) of the following proposition:

(*) If U takes in x0 as input, then U produces output y0 and then immediately halts.

It is easy to write an expression for m(y0). Unfortunately, evaluating the expression would require solving the halting problem, which is undecidable. Nonetheless, if U halts on an infinite input string x, then U must read only a finite initial segment of x, after which U immediately halts. We call a finite string p a self-delimiting program if and only if there is an input string x ∈ {0, 1}ω beginning with p such that U halts on x immediately after reading to the end of p. The set 𝒫 of self-delimiting programs is the prefix code for U. It is the determination of the elements of 𝒫 that requires a solution to the halting problem.

Given p ∈ 𝒫, we write "prog (x0) = p" to express the proposition that x0 begins with p, and we write "U(p) = y0" to express the proposition that U produces output y0 when fed an input beginning with p. Proposition (*) is then equivalent to the exclusive disjunction


p ∈ 𝒫: U(p) = y0(prog (x0) = p).
Since
x0 was chosen at random from {0, 1}ω, we have that


pr (prog (x0) = p) = 2 − ℓ(p),
where
ℓ(p) is the length of p as a bit string. Hence, the probability of (*) is


m(y0) := ∑p ∈ 𝒫: U(p) = y02 − ℓ(p).

Ray Solomonoff defined an inference system that will learn to correctly predict any computable sequence with only the absolute minimum amount of data. It would thus,This system, in somea certain sense, beis the perfect universal prediction algorithm, if only it were computable.algorithm. To summarize it very informally, Solomonoff induction works by startingby:

  • Starting with all possible hypotheses (sequences) as represented by computer programs on some universal Turing machine,(that generate those sequences), weighted by their simplicity (2^(2-n, where n is the program length), then discarding;
  • Discarding those hypotheses that are inconsistent with the data. By weighting

Weighting hypotheses by simplicity, the system automatically incorporates a form of Occam's razor,razor, which is why it has been playfully referred to as Solomonoff's lightsaber.

Solomonoff's inductiveRay Solomonoff defined an inference system will learn to correctly predict any computable sequence with only the absolute minimum amount of data. It would thus, in some sense, be the perfect universal prediction algorithm, if only it were computable. To summarize it very informally, Solomonoff induction works by starting with all possible hypotheses as represented by computer programs on some universal Turing machine, weighted by their simplicity (2^-n, where n is the program length), then discarding those hypotheses that are inconsistent with the data. By weighting by simplicity, the system automatically incorporates a form of Occam's razor, which is why it has been playfully referred to as Solomonoff's lightsaber.

Load More (10/12)