Asymptotic Decision Theory (Improved Writeup)

byDiffractor5mo27th Sep 201813 comments

7


(asymptotic decision theory, initially detailed in this paper) was a proposed decision theory with logical inductors, developed after EDT with logical inductors and exploration steps. It is a possible candidate for conceptual progress in decision theory, but some basic questions about its performance are still unsettled, and there are several issues with the current implementation of it.

Definitions:

A market is a computable function with type (where is the set of sentences of math) that will be denoted as . We will be considering logical inductors in this setting, and in particular, each finite stage of a logical inductor is a market.

Let be a finite set of actions. An agent is a computable sequence of computable functions of type . It takes a timestep and a probability distribution, and selects a distribution over actions. Because logical inductors associate each timestep with an action, specifying the logical inductor and the agent specifies a sequence of actions. Because we will be assuming some fixed logical inductor in the background, we will suppress it in the notation and refer to the action produced by an agent at time t as or . Randomization can be handled by using the diagonalization sentences used to define exploration (this acts in such a way that no trader in the inductor is able to reliably predict what the agent does), or it can be handled by having one of the actions be to call a random number generator in the environment.

An embedder is a sequence of randomized functions of type (denoted by or , with the function at time t being denoted by or ). Let denote the random variable that corresponds to the environment using a uniform random distribution of bitstrings for its randomness tape. An embedder must have the probability distribution of being computably approximable on computable inputs.

An argmax agent is an agent that takes a finite set of other agents , and a single embedder , and outputs on turn . is defined as it usually is for logical inductors. Notice that, although may be very difficult to compute, putting it inside an expectation allows a logical inductor to output a decent guess as to what it is anyways. The agents all had to be computable in order for the argmax agent to duplicate their behavior at arbitrary turns. There is a dependence on the set , the embedder , and the logical inductor, but because the logical inductor and will be treated as fixed, we will write the time-t action produced by the argmax agent as .

will denote the "true environment", which is a sequence of values in .

Finally, the logical inductor will be required to have "fast feedback". This means that after each turn, the following information will show up in the deductive process at time .

1: (an interval containing) the true value of .

The deductive process is the sequence of theorems that the logical inductor sees. The reason that the true utility has to be in an interval, instead of reported directly, is because the true utility may have arbitrarily many digits, which hampers the ability of traders to use that information to judge bets on how will turn out. This condition is present to ensure that, if is generated by taking and feeding it a uniform random bitstring, .

Theorem 1: If is generated on each turn by running with a bitstring sampled uniformly at random, then .

We will apply the basic trading strategy from Theorem 4.5.9 in the logical induction paper. In this case, if the left-hand side is overpriced by infinitely often (by a failure of convergence, and the same argument can be applied symmetrically to the right-hand side being overpriced infinitely often), the traders buy a fraction of a share of and sell a fraction of a share of , and keep doing it until 1 share has been moved in total. This takes care of having -generable trade magnitudes. According to the law of large numbers, with probability 1, there is a finite time after which all of the sub-traders have their pile of shares have a value within of , so the initial traders in the efficiently emulatable sequence of traders can be clipped off, and all the other sub-traders get or more value from selling the high-priced expectation and buying the low-priced one, so the necessary conditions for the -ROI Lemma are fulfilled and the resulting trader exploits the market.

Finally, we will define dominance. An agent dominates an agent on an embedder if

This could be thought of as having sublinear regret relative to .

What's ADT?:

The asymptotic decision theory algorithm works as follows.

Inputs: A sequence of numbers asymptotically decreasing to , denoted as , a finite list of embedders , a finite list of agents , and a logical inductor with fast feedback on the true environment,