Multicore | v0.1.0Sep 25th 2020 | (+76/-134) | ||

ete | v0.0.9Oct 13th 2016 | added links to Arbital | ||

v0.0.8May 18th 2013 | (+13) /* Blog posts */ fixed authorship (lukeprog wrote the first half) | |||

v0.0.7May 18th 2013 | (+73) added link to the "intuitive explanation" for solomonoff induction | |||

Tyrrell_McAllister | v0.0.6Feb 13th 2012 | (+54/-69) Minor rewriting | ||

Tyrrell_McAllister | v0.0.5May 1st 2011 | (+207/-175) some minor rewording | ||

Tyrrell_McAllister | v0.0.4Apr 14th 2011 | (+2019) Added derivation of Solomonoff prior | ||

Vladimir Nesov | v0.0.3Dec 2nd 2009 | (+145/-141) | ||

Zack M. Davis | v0.0.2Nov 18th 2009 | (+500/-22) writing, revision, destubiffy | ||

PeerInfinity | v0.0.1Sep 28th 2009 |

**Solomonoff induction **is an inference system defined by Ray Solomonoff~~ defined an inference system~~ that will learn to correctly predict any computable sequence with only the absolute minimum amount of data. This system, in a certain sense, is the perfect universal prediction algorithm.

More precisely, suppose that a particular infinite input string *x*_{0} is about to be fed into * U*. However, you know nothing about

Unfortunately, computing the exact value of * m*(

Given * p* ∈ 𝒫, we write

⋁_{p}_{ ∈ 𝒫: }_{U}_{(}_{p}_{) = }_{y}_{0}(prog (*x*_{0}) = * p*).

Since

* m*(

~~An Intuitive Explanation of Solomonoff Induction~~~~by lukeprog and Alex_Altair~~

- An Intuitive Explanation of Solomonoff Induction by lukeprog and Alex_Altair

- An Intuitive Explanation of Solomonoff Induction by Alex_Altair

More precisely, ~~we ~~suppose that a particular infinite input string *x*_{0}~~ ∈ {0, 1}~~^{ω} is about to be fed into *U*. However, you know nothing about *x*_{0} other than that ~~it~~each term of the string is ~~an infinite string of~~either 0~~s and~~ or 1~~s. That is, as~~. As far as your state of knowledge is concerned, the *i*th digit of *x*_{0} is as likely to be 0 as it is to be 1, for all *i* = 1, 2, …. You want to find the *a priori* probability *m*(*y*_{0}) of the following proposition:

Unfortunately, computing the exact value of *m*(*y*_{0}) would require solving the halting problem, which is undecidable. Nonetheless, it is easy to derive an expression for *m*(*y*_{0}). If *U* halts on an infinite input string *x*, then *U* must read only a finite initial segment of *x*, after which *U* immediately halts. We call a finite string *p* a *self-delimiting program* if and only if there ~~is~~exists an infinite input string *x*~~ ∈ {0, 1}~~^{ω} beginning with *p* such that *U* halts on *x* immediately after reading to the end of *p*. The set 𝒫 of self-delimiting programs is the *prefix code* for *U*. It is the determination of the elements of 𝒫 that requires a solution to the halting problem.

Given *p* ∈ 𝒫, we write "prog (*x*_{0}) = *p*" to express the proposition that *x*_{0} begins with *p*, and we write "*U*(*p*) = *y*_{0}" to express the proposition that *U* produces output *y*_{0}, and then halts, when fed ~~an~~any input beginning with *p*. Proposition (*) is then equivalent to the exclusive disjunction

(*) If *U* takes in *x*_{0} as input, then *U* ~~produces~~will produce output *y*_{0} and then ~~immediately halts.~~halt.

~~It~~Unfortunately, computing the exact value of *m*(*y*_{0}) would require solving the halting problem, which is undecidable. Nonetheless, it is easy to ~~write~~derive an expression for *m*(*y*_{0}). ~~Unfortunately, evaluating the expression would require solving the halting problem, which is undecidable. Nonetheless, if~~If *U* halts on an infinite input string *x*, then *U* must read only a finite initial segment of *x*, after which *U* immediately halts. We call a finite string *p* a *self-delimiting program* if and only if there is an input string *x* ∈ {0, 1}^{ω} beginning with *p* such that *U* halts on *x* immediately after reading to the end of *p*. The set 𝒫 of self-delimiting programs is the *prefix code* for *U*. It is the determination of the elements of 𝒫 that requires a solution to the halting problem.

Given *p* ∈ 𝒫, we write "prog (*x*_{0}) = *p*" to express the proposition that *x*_{0} begins with *p*, and we write "*U*(*p*) = *y*_{0}" to express the proposition that *U* produces output *y*_{0}, and then halts, when fed an input beginning with *p*. Proposition (*) is then equivalent to the exclusive disjunction

⋁_{p ∈ 𝒫: U(p) = y0}(prog (*x*_{0}) = *p*).

Since *x*_{0} was chosen at random from {0, 1}^{ω}, we ~~have that~~

take the probability of ~~pr (prog~~prog (*x*_{0}) = *p*~~) = ~~ to be 2^{ − ℓ(p)}, where ℓ(*p*) is the length of *p* as a bit string. Hence, the probability of (*) is

Solomonoff induction gets off the ground with a solution to the "problem of the priors". Suppose that you stand before a universal prefix Turing machine*U*. You are interested in a certain finite output string *y*_{0}. In particular, you want to know the probability that *U* will produce the output *y*_{0} given a random input tape. This probability is the **Solomonoff a priori probability** of

More precisely, we suppose that a particular infinite input string *x*_{0} ∈ {0, 1}^{ω} is about to be fed into *U*. However, you know nothing about *x*_{0} other than that it is an infinite string of 0s and 1s. That is, as far as your state of knowledge is concerned, the *i*th digit of *x*_{0} is as likely to be 0 as it is to be 1, for all *i* = 1, 2, …. You want to find the *a priori* probability *m*(*y*_{0}) of the following proposition:

(*) If *U* takes in *x*_{0} as input, then *U* produces output *y*_{0} and then immediately halts.

It is easy to write an expression for *m*(*y*_{0}). Unfortunately, evaluating the expression would require solving the halting problem, which is undecidable. Nonetheless, if *U* halts on an infinite input string *x*, then *U* must read only a finite initial segment of *x*, after which *U* immediately halts. We call a finite string *p* a *self-delimiting program* if and only if there is an input string *x* ∈ {0, 1}^{ω} beginning with *p* such that *U* halts on *x* immediately after reading to the end of *p*. The set 𝒫 of self-delimiting programs is the *prefix code* for *U*. It is the determination of the elements of 𝒫 that requires a solution to the halting problem.

Given *p* ∈ 𝒫, we write "prog (*x*_{0}) = *p*" to express the proposition that *x*_{0} begins with *p*, and we write "*U*(*p*) = *y*_{0}" to express the proposition that *U* produces output *y*_{0} when fed an input beginning with *p*. Proposition (*) is then equivalent to the exclusive disjunction

⋁_{p ∈ 𝒫: U(p) = y0}(prog (*x*_{0}) = *p*).

Since *x*_{0} was chosen at random from {0, 1}^{ω}, we have that

pr (prog (*x*_{0}) = *p*) = 2^{ − ℓ(p)},

where ℓ(*p*) is the length of *p* as a bit string. Hence, the probability of (*) is

*m*(*y*_{0}) := ∑_{p ∈ 𝒫: U(p) = y0}2^{ − ℓ(p)}.

Ray Solomonoff defined an inference system that will learn to correctly predict any computable sequence with only the absolute minimum amount of data. ~~It would thus,~~This system, in ~~some~~a certain sense, ~~be~~is the perfect universal prediction ~~algorithm, if only it were computable.~~algorithm. To summarize it very informally, **Solomonoff induction** works ~~by starting~~by:

- Starting with all possible hypotheses (sequences) as represented by computer programs
~~on some universal Turing machine,~~(that generate those sequences), weighted by their simplicity~~(2^~~(2^{-n}, whereis the program length)**n**~~, then discarding~~; - Discarding those hypotheses that are inconsistent with the data.
~~By weighting~~

Weighting hypotheses by simplicity, the system automatically incorporates a form of Occam's ~~razor,~~razor, which is why it has been playfully referred to as *Solomonoff's lightsaber*.

~~Solomonoff's inductive~~Ray Solomonoff defined an inference system will learn to correctly predict any computable sequence with only the absolute minimum amount of data. It would thus, in some sense, be the perfect universal prediction algorithm, if only it were computable. To summarize it very informally, **Solomonoff induction** works by starting with all possible hypotheses as represented by computer programs on some universal Turing machine, weighted by their simplicity (2^-*n*, where *n* is the program length), then discarding those hypotheses that are inconsistent with the data. By weighting by simplicity, the system automatically incorporates a form of Occam's razor, which is why it has been playfully referred to as *Solomonoff's lightsaber*.