SamEisenstat — AI Alignment Forum

Answer by SamEisenstatApr 24, 202310

Q5 is true if (as you assumed), the space of lotteries is the space of distributions over a finite set. (For a general convex set, you can get long-line phenomena.)

First, without proof, I'll state the following generalization.

Theorem 1. Let be a relation on a convex space $L$ satisfying axioms A1, A2, A3, and the following additional continuity axiom. For all $A, B_{1}, B_{2}, C \in L$ , the set

{p \in [0, 1] ∣ A ≺ p B_{1} + (1 - p) B_{2} ≺ C}

is open in $[0, 1]$ . Then, there exists a function $u$ from $L$ to the long line such that $u (A) \leq u (B)$ iff $A ⪯ B$ .

The proof is not too different, but simpler, if we also assume A4. In particular, we no longer need the extra continuity axiom, and we get a stronger conclusion. Nate sketched part of the proof of this already, but I want to be clearer about what is stated and skip fewer steps. In particular, I'm not sure how Nate's hypotheses rule out examples that require long-line-valued functions—maybe he's assuming that the domain of the preference relation is a finite-dimensional simplex like I am, but none of his arguments use this explicitly.

Theorem 2. Let $⪯$ be a relation on a finite-dimensional simplex $L = Δ Ω$ satisfying axioms A1-A4. Then, there is a quasiconcave function $u : L \to R$ such that $u (A) \leq u (B)$ iff $A ⪯ B$ .

First, I'll set up some definitions and a lemma. For any lotteries $A$ , $B$ , let $[A, B]$ denote the line segment

{p A + (1 - p) B ∣ p \in [0, 1]} .

We say that preferences are increasing along a line segment $[A, B]$ if whenever $p \leq q$ , we have

(1 - p) A + p B ⪯ (1 - q) A + q B .

We will also use open and half-open interval notation in the corresponding way.

Lemma. Let $⪯$ be a preference relation on a finite-dimensional simplex $L = Δ Ω$ satisfying axioms A1-A4. Then, there are $⪯$ -minimal and -maximal elements in $L$ .

Proof. First, we show that there is a minimal element. Axiom A4 states that for any mixture $C = p A + (1 - p) B$ , either $C ⪰ A$ or $C ⪰ B$ . By induction, it follows more generally that any convex combination C of finitely many elements ${(A_{i})}_{i \in I}$ satisfies $C ⪰ A_{i}$ for some $i \in I$ . But every element is a convex combination of the vertices of $L$ , so some vertex of $L$ is $⪯$ -minimal.

The proof that there is a maximal element is more complex. Consider the family of sets

F = {{B \in L ∣ B ⪰ A} ∣ A \in L} .

This is a prefilter, so since $L$ is compact ( $L$ here carries the Euclidean metric), it has a cluster point $B$ . Either $B$ will be a maximal element, or we will find some other maximal element. In particular, take any $A \in L$ . We are done if $A$ is a maximal element; otherwise, pick $A^{'} ≻ A$ . By the construction of $F$ , for every $n \in N$ , we can pick some $C_{n} ⪰ A^{'}$ within a distance of $\frac{1}{n}$ from B. Now, if we show that $B$ itself satisfies $B ⪰ A$ , it will follow that $B$ is maximal.

The idea is to pass from our sequence ${(C_{n})}_{n \in N}$ , with limit $B$ , to another sequence lying on a line segment with endpoint $B$ . We can use axiom A4, which is a kind of convexity, to control the preference relation on convex combinations of our points $C_{n}$ , so these are the points that we will construct along a line segment. Once we have this line segment, we can finish by using A3, which is a kind of continuity restricted to line segments, to control $B$ itself.

Let $S \subseteq L$ be the set of lotteries in the affine span of the set ${C_{n}}_{n \in N}$ . Then, if we take some index set $I \subseteq N$ such that ${(C_{n})}_{n \in I}$ is a maximal affinely independent tuple, it follows that ${C_{n}}_{n \in I}$ affinely generates $S$ . Hence, the convex combination

D = \sum n \in I \frac{1}{| I |} C_{n},

i.e. the barycenter of the simplex with vertices at ${(C_{n})}_{n \in I}$ , is in the interior of the convex hull of ${C_{n}}_{n \in I}$ relative to $S$ , so we can pick some $r > 0$ such that the $r$ -ball around $D$ relative to $S$ is contained in this simplex.

Now, we will see that every lottery $E$ in the set $(B, D]$ satisfies $E ⪰ A^{'}$ . For any $ε > 0$ , pick $k$ so that $C_{k}$ is in the $ε$ -ball around $B$ . Since the tangent vector $v = B - C_{k}$ has length less than $ε$ , the lottery

F = D + \frac{r}{ε} (B - C_{k})

is in the $r$ -ball around $D$ , and it is in $S$ , so it is in the simplex with vertices ${(C_{n})}_{n \in I}$ . Then, $F ⪰ A^{'}$ by A4, and $C_{k} ⪰ A^{'}$ by hypothesis. So, applying A4 again,

A^{'} ⪯ \frac{r}{r + ε} C_{k} + \frac{ε}{r + ε} F = \frac{r}{r + ε} B + \frac{ε}{r + ε} D .

Using A4 one more time, it follows that every lottery

E \in [\frac{r}{r + ε} B + \frac{ε}{r + ε} D, D]

satisfies $E ⪰ A^{'}$ , and hence every lottery $E \in (B, D]$ .

Now we can finish up. If $B ≺ A$ then, using A3 and the fact that $D ⪰ A^{'} ≻ A$ , there would have to be some lottery in $[B, D]$ that is $⪯$ -equivalent to A, but this would contradict what we just concluded. So, $B ⪰ A$ , and so B is $⪯$ -maximal. $□$

Proof of Theorem 2. Let $C$ be a $⪯$ -minimal and $D$ a $⪯$ -maximal element of $L$ . First, we will see that preferences are increasing on $[C, D]$ , and then we will use this fact to construct a function $L \to R$ and show that it has the desired properties. Suppose preferences we not increasing; then, there would be $A, B \in [C, D]$ such that $A$ is closer to $C$ while $B$ is closer to $D$ , and $A ≻ B$ . Then, $B$ would be a convex combination of $A$ and $D$ , but $B ≺ A ⪯ D$ by the maximality of $D$ , contradicting A4.

Now we can construct our utility function $u : L \to R$ using A3; for each $\sim$ -class $[A]$ , we have $C ⪯ A ⪯ D$ , so there is some^[1] $p \in [0, 1]$ such that

(1 - p) C + p D \sim A .

Then, let $u (A^{'}) = p$ for all $A^{'} \in [A]$ . Since preferences are increasing on $[C, D]$ , it is immediate that if $u (A) \leq u (B)$ , then $A ⪯ B$ . Conversely, if $A ⪯ B$ , we have two cases. If $A ≺ B$ , then $B ⋠ A$ , so $u (B) ≰ u (A)$ , and so $u (A) \leq u (B)$ . Finally, if $A \sim B$ , then $u (A) = u (B)$ by construction.

Finally, since for all $A, B \in L$ we have $u (A) \leq u (B)$ iff $A ⪯ B$ , it follows immediately that $u$ is quasiconcave by A4. $□$

^{^}
Nate mentions using choice in his answer, but here at least the use of choice is removable. Since $⪯$ is monotone on $[C, D]$ , the intersection of the $\sim$ -class $[A]$ with $[C, D]$ is a subinterval of $[C, D]$ , so we can pick $p$ based on the midpoint of that interval

Modal Fixpoint Cooperation without Löb's Theorem

SamEisenstat3y70

Nice, I like this proof also. Maybe there's a clearer way to say thing, but your "unrolling one step" corresponds to my going from to $v$ . We somehow need to "look two possible worlds deep".

Modal Fixpoint Cooperation without Löb's Theorem

SamEisenstat3y1610

Here's a simple Kripke frame proof of Payor's lemma.

Let be a Kripke frame over our language, i.e. $W$ is a set of possible worlds, $R$ is an accessibility relation, and $⊩$ judges that a sentence holds in a world. Now, suppose for contradiction that $W ⊩ x \leftrightarrow □ (□ x \to x)$ but that $W ⊮ x$ , i.e. $x$ does not hold in some world $u \in W$ .

A bit of De Morganing tells us that the hypothesis on $x$ is equivalent to $\neg x \leftrightarrow ◊ (□ x \land \neg x)$ , so $u ⊩ ◊ (□ x \land \neg x)$ . So, there is some world $v$ with $u R v$ such that $v ⊩ □ x \land \neg x$ . But again looking at our equivalent form for $\neg x$ , we see that $W ⊩ \neg x \to ◊ \neg x$ , so $v ⊩ □ x \land ◊ \neg x$ , a contradiction. $□$

Both this proof and the proof in the post are very simple, but at least for me I feel like this proof tells me a bit more about what's going on, or at least tells me something about what's going on that the other doesn't. Though in a broader sense there's a lot I don't know about what's going on in modal fixed points.

Kripke frame-style semantics are helpful for thinking about lots of modal logic things. In particular, there are cool inductiony interpretations of the Gödel/Löb theorems. These are more complicated, but I'd be happy to talk about them sometime.

Alignment proposals and complexity classes

SamEisenstat5y50

Theorem. Weak HCH (and similar proposals) contain EXP.

Proof sketch: I give a strategy that H can follow to determine whether some machine that runs in $O (2^{c n^{k}})$ time accepts. Basically, we need to answer questions of the form "Does cell $x$ have value $y$ at time $t$ ." and "Was the head in position $x$ at time $t$ ?", where $x$ and $t$ are bounded by some function in $O (2^{c n^{k}})$ . Using place-system representations of $x$ and $t$ , these questions have length in $O (n^{k})$ , so they can be asked. Further, each question is a simple function of a constant number of other such questions about earlier times as long as $t > 0$ , and can be answered directly in the base case $t = 0$ .

A Loophole for Self-Applicative Soundness

SamEisenstat7y10

As you say, this isn't a proof, but it wouldn't be too surprising if this were consistent. There is some $k \in N$ such that $□_{n} ϕ \to ϕ$ has a proof of length $n^{k}$ by a result of Pavel Pudlák (On the length of proofs of finitistic consistency statements in first order theories). Here I'm making the dependence on $n$ explicit, but not the dependence on $ϕ$ . I haven't looked at it closely, but the proof strategy in Theorems 5.4 and 5.5 suggests that $k$ will not depend on $ϕ$ , as long as we only ask for the weaker property that $□_{n} ϕ \to ϕ$ will only be provable in length $n^{k}$ for sentences $ϕ$ of length at most $k$ .

A Loophole for Self-Applicative Soundness

SamEisenstat7y00

I misunderstood your proposal, but you don't need to do this work to get what you want. You can just take each sentence $□_{n} ϕ \to ϕ$ as an axiom, but declare that this axiom takes $n$ symbols to invoke. This could be done by changing the notion of length of a proof, or by taking axioms $ψ_{ϕ, n} \to (□_{n} ϕ \to ϕ)$ and $ψ_{ϕ, n}$ with $ψ_{ϕ, n}$ very long.

Open question: are minimal circuits daemon-free?

SamEisenstat7y70

I'm having trouble thinking about what it would mean for a circuit to contain daemons such that we could hope for a proof. It would be nice if we could find a simple such definition, but it seems hard to make this intuition precise.

For example, we might say that a circuit contains daemons if it displays more optimization that necessary to solve a problem. Minimal circuits could have daemons under this definition though. Suppose that some function $f$ describes the behaviour of some powerful agent, a function $~ f$ is like $f$ with noise added, and our problem is to predict sufficiently well the function $~ f$ . Then, the simplest circuit that does well won't bother to memorize a bunch of noise, so it will pursue the goals of the agent described by $f$ more efficiently than $~ f$ , and thus more efficiently than necessary.

No Constant Distribution Can be a Logical Inductor

SamEisenstat8y10

Two minor comments. First, the bitstrings that you use do not all correspond to worlds, since, for example, $C o n (P A + C o n (P A))$ implies $C o n (P A)$ , as $P A$ is a subtheory of $P A + C o n (P A)$ . This can be fixed by instead using a tree of sentences that all diagonalize against themselves. Tsvi and I used a construction in this spirit in A limit-computable, self-reflective distribution, for example.

Second, I believe that weakening #2 in this post also cannot be satisfied by any constant distribution. To sketch my reasoning, a trader can try to buy a sequence of sentences $ϕ_{1}, ϕ_{1} \land ϕ_{2}, \dots$ , spending $$2^{-n}$ on the $n$th sentence $\phi_1 \wedge \dots \wedge \phi_n$. It should choose the sequence of sentences so that $\phi_1 \wedge \dots \wedge \phi_n$ has probability at most $2^{-n}$, and then it will make an infinite amount of money if the sentences are simultaneously true.

The way to do this is to choose each $ϕ_{n}$ from a list of all sentences. If at any point you notice that $ϕ_{1} \land \dots \land ϕ_{n}$ has too high a probability, then pick a new sentence for $ϕ_{n}$ . We can sell all the conjunctions $ϕ_{1} \land \dots \land ϕ_{k}$ for $k \geq n$ and get back the original amount payed by hypothesis. Then, if we can keep using sharper continuous tests of the probabilities of the sentences $ϕ_{1} \land \dots \land ϕ_{n}$ over time, we will settle down to a sequence with the desired property.

In order to turn this sketch into a proof, we need to be more careful about how these things are to be made continuous, but it seems routine.

An Untrollable Mathematician

SamEisenstat8y30

I at first didn't understand your argument for claim (2), so I wrote an alternate proof that's a bit more obvious/careful. I now see why it works, but I'll give my version below for anyone interested. In any case, what you really mean is the probability of deciding a sentence outside of $Φ$ by having it announced by nature; there may be a high probability of sentences being decided indirectly via sentences in $Φ$ .

Instead of choosing $Φ$ as you describe, pick $Φ$ so that the probability $μ (Φ)$ of sampling something in $Φ$ is greater than $1 - μ (ψ) \cdot ε / 2$ . Then, the probability of sampling something in $Φ - {ψ}$ is at least $1 - μ (ψ) \cdot (1 + ε / 2)$ . Hence, no matter what sentences have been decided already, the probability that repeatedly sampling from $μ$ selects $ψ$ before it selects any sentence outside of $Φ$ is at least

$\begin{matrix} \infty \sum k = 0 (1 - μ (ψ) \cdot (1 + ε / 2))^{k} \cdot μ (ψ) & = \frac{μ (ψ)}{μ (ψ) \cdot (1 + ε / 2)} > 1 - ε / 2 \end{matrix}$

as desired.

Furthermore, this argument makes it clear that the probability distribution we converge to depends only on the set of sentences which the environment will eventually assert, not on their ordering!

Oh, I didn't notice that aspect of things. That's pretty cool.

Some Criticisms of the Logical Induction paper

SamEisenstat8y00

A few thoughts:

I agree that the LI criterion is "pointwise" in the way that you describe, but I think that this is both pretty good and as much as could actually be asked. A single efficiently computable trader can do a lot. It can enforce coherence on a polynomially growing set of sentences, search for proofs using many different proof strategies, enforce a polynomially growing set of statistical patterns, enforce reflection properties on a polynomially large set of sentences, etc. So, eventually the market will not be exploitable on all these things simultaneously, which seems like a pretty good level of accurate beliefs to have.

On the other side of things, it would be far to strong to ask for a uniform bound of the form "for every $ε > 0$ , there is some day $t$ such that after step $t$ , no trader can multiply its wealth by a factor more than $1 + ε$ ". This is because a trader can be hardcoded with arbitrarily many hard-to-compute facts. For every $δ$ , there must eventually be a day $t^{'} > t$ on which the belief of your logical inductor assign probability less than $δ$ to some true statement, at which point a trader who has that statement hardcoded can multiply its wealth by $1 / δ$ . (I can give a construction of such a sentence using self-reference if you want, but it's also intuitively natural - just pick many mutually exclusive statements with nothing to break the symmetry.)

Thus, I wouldn't think of traders as "mistakes", as you do in the post. A trader can gain money on the market if the market doesn't already know all facts that will be listed by the deductive process, but that is a very high bar. Doing well against finitely many traders is already "pretty good".

What you can ask for regarding uniformity is for some simple function $f$ such that any trader $T$ can multiply its wealth by at most a factor $f (T)$ . This is basically the idea of the mistake bound model in learning theory; you bound how many mistakes happen rather than when they happen. This would let you say a more than the one-trader properties I mentioned in my first paragraph. In fact, $L I A$ has this propery; $f (T)$ is just the initial wealth of the trader. You may therefore want to do something like setting traders' initial wealths according to some measure of complexity. Admittedly this isn't made explicit in the paper, but there's not much additional that needs to be done to think in this way; it's just the combination of the individual proofs in the paper with the explicit bounds you get from the initial wealths of the traders involved.

I basically agree completely on your last few points. The traders are a model class, not an ensemble method in any substantive way, and it is just confusing to connect them to the papers on ensemble methods that the LI paper references. Also, while I use the idea of logical induction to do research that I hope will be relevant to practical algorithms, it seems unlikely than any practical algorithm will look much like a LI. For one thing, finding fixed points is really hard without some property stronger than continuity, and you'd need a pretty good reason to put it in the inner loop of anything.

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

Posts

Wikitag Contributions

Comments