AlexMennen's Shortform

Theorem: Fuzzy beliefs (as in https://www.alignmentforum.org/posts/Ajcq9xWi2fmgn8RBJ/the-credit-assignment-problem#X6fFvAHkxCPmQYB6v ) form a continuous DCPO. (At least, I'm pretty sure this is true; I've only given proof sketches so far.)

The relevant definitions:

A fuzzy belief over a set $X$ is a concave function $ϕ : Δ X \to [0, 1]$ such that $\sup (ϕ) = 1$ (where $Δ X$ is the space of probability distributions on $X$). Fuzzy beliefs are partially ordered by $ϕ \leq ψ ⟺ \forall μ \in Δ X : ϕ (μ) \geq ψ (μ)$. The inequalities reverse because we want to think of "more specific"/"less fuzzy" beliefs as "greater", and these are the functions with lower values. The most specific/least fuzzy beliefs are ordinary probability distributions, each represented as the concave hull of the function assigning 1 to that probability distribution and 0 to all others; these should be the maximal fuzzy beliefs. Note that, because of the order-reversal, the supremum of a set of fuzzy beliefs in this order is their pointwise infimum as functions.
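
As a concrete illustration (not part of the original argument), take $X = \{0, 1\}$, so that $Δ X$ can be identified with $[0, 1]$ via $p = μ(\{1\})$. The sketch below tabulates functions on a finite grid and checks the definitions there. The names `GRID`, `sample`, `is_fuzzy_belief`, `leq`, and `sup_of` are hypothetical helpers, and the grid test for concavity is only a discrete stand-in for the real condition.

```python
from fractions import Fraction

# Assumption: X = {0, 1}, so a distribution is a number p in [0, 1].
# Exact rationals avoid floating-point noise in the equality checks.
N = 20
GRID = [Fraction(i, N) for i in range(N + 1)]

def sample(f):
    """Tabulate a function [0, 1] -> [0, 1] on the grid."""
    return [Fraction(f(p)) for p in GRID]

def is_fuzzy_belief(phi):
    """Grid check: values in [0, 1], maximum 1, and discrete concavity."""
    in_range = all(0 <= v <= 1 for v in phi)
    concave = all(2 * phi[i] >= phi[i - 1] + phi[i + 1] for i in range(1, N))
    return in_range and max(phi) == 1 and concave

def leq(phi, psi):
    """phi <= psi in the fuzzy-belief order means phi is pointwise >= psi."""
    return all(a >= b for a, b in zip(phi, psi))

def sup_of(beliefs):
    """Supremum in the reversed order: the pointwise minimum (finite family)."""
    return [min(vals) for vals in zip(*beliefs)]

vacuous = sample(lambda p: 1)                  # constant 1: the least element
tent = sample(lambda p: 1 - abs(2 * p - 1))    # concave hull of the indicator of p = 1/2
indicator = sample(lambda p: 1 if p == Fraction(1, 2) else 0)

print(is_fuzzy_belief(vacuous), is_fuzzy_belief(tent))  # both pass the grid check
print(is_fuzzy_belief(indicator))   # the raw indicator is not concave
print(leq(vacuous, tent))           # the vacuous belief is below the tent
print(sup_of([vacuous, tent]) == tent)
```

The raw indicator fails the concavity check, which is why the definition passes to its concave hull (here the tent function) to represent a single probability distribution.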

A DCPO (directed-complete partial order) is a partial order in which every directed subset has a supremum.

In a DCPO, define $x \ll y$ to mean that for every directed set $D$ with $\sup D \geq y$, $\exists d \in D$ such that $d \geq x$. A DCPO is continuous if for every $y$, $y = \sup \{x \mid x \ll y\}$.

Lemma: Fuzzy beliefs are a DCPO.

Proof sketch: Given a directed set $D$, $(\sup D) (μ) = \inf \{d (μ) \mid d \in D\}$ is concave (a pointwise infimum of concave functions is concave), and $\{μ \mid (\sup D) (μ) = 1\} = ⋂_{d \in D} \{μ \mid d (μ) = 1\}$. Each of the sets in that intersection is non-empty, hence so are finite intersections of them since $D$ is directed, and hence so is the whole intersection since $Δ X$ is compact.
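
A small finite sanity check of the two facts used in this sketch, again assuming $X = \{0, 1\}$ so that $Δ X$ is identified with $[0, 1]$ and sampled on a grid. The helper names (`plateau`, `leq`, `is_directed`, `top_set`) are made up for illustration. A finite directed family always contains its own supremum, so this only illustrates the bookkeeping and says nothing about the compactness step.

```python
from fractions import Fraction

# Assumption: X = {0, 1}; distributions are numbers p in [0, 1] on a grid.
N = 20
GRID = [Fraction(i, N) for i in range(N + 1)]

def plateau(a, b):
    """Concave belief equal to 1 on [a, b], linear down to 0 at p = 0 and p = 1.

    Requires 0 < a <= b < 1.
    """
    def f(p):
        if p < a:
            return p / a
        if p > b:
            return (1 - p) / (1 - b)
        return Fraction(1)
    return [f(p) for p in GRID]

def leq(phi, psi):
    """phi <= psi in the fuzzy-belief order means phi is pointwise >= psi."""
    return all(x >= y for x, y in zip(phi, psi))

def is_directed(family):
    """Non-empty, and every pair has an upper bound inside the family."""
    return bool(family) and all(
        any(leq(d1, d) and leq(d2, d) for d in family)
        for d1 in family for d2 in family
    )

def top_set(phi):
    """Grid indices where the belief takes the value 1."""
    return {i for i, v in enumerate(phi) if v == 1}

d1 = plateau(Fraction(1, 4), Fraction(3, 5))
d2 = plateau(Fraction(2, 5), Fraction(3, 4))   # incomparable with d1
d3 = plateau(Fraction(2, 5), Fraction(3, 5))   # upper bound of d1 and d2
D = [d1, d2, d3]

sup_D = [min(vals) for vals in zip(*D)]        # pointwise minimum

print(is_directed(D), is_directed([d1, d2]))
print(top_set(sup_D) == set.intersection(*map(top_set, D)))
```

Here `[d1, d2]` alone is not directed, but adding `d3` makes it so, and the set where the pointwise minimum equals 1 is exactly the intersection of the three plateaus.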

Lemma: $ϕ \ll ψ$ iff $\{μ \mid ψ (μ) = 1\}$ is contained in the interior of $\{μ \mid ϕ (μ) = 1\}$ and for every $μ$ such that $ψ (μ) \neq 1$, $ϕ (μ) > ψ (μ)$.

Proof sketch (of the "if" direction): Suppose the condition on $ϕ$ and $ψ$ holds. If $\sup D \geq ψ$, then $⋂_{d \in D} \{μ \mid d (μ) = 1\} \subseteq \{μ \mid ψ (μ) = 1\}$, so by compactness of $Δ X$ and directedness of $D$, there should be $d \in D$ such that $\{μ \mid d (μ) = 1\} \subseteq \operatorname{int} (\{μ \mid ϕ (μ) = 1\})$. Similarly, for each $μ$ such that $ψ (μ) \neq 1$, there should be $d_{μ} \in D$ such that $d_{μ} (μ) < ϕ (μ)$. By compactness, there should be some finite subset of $\{d\} \cup \{d_{μ} \mid ψ (μ) \neq 1\}$ such that any upper bound for all of them is at least $ϕ$.
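
The criterion in the lemma above can be spot-checked on a grid, once more assuming $X = \{0, 1\}$ with $Δ X$ identified with $[0, 1]$. This is only a discretized sketch: `grid_interior` is a hypothetical stand-in for the topological interior (a grid point counts as interior if its existing neighbours are also in the set), and `way_below_criterion` tests the stated condition, not the definition of $\ll$ itself.

```python
from fractions import Fraction

# Assumption: X = {0, 1}; distributions are numbers p in [0, 1] on a grid.
N = 20
GRID = [Fraction(i, N) for i in range(N + 1)]

def top_set(phi):
    """Grid indices where the belief takes the value 1."""
    return {i for i, v in enumerate(phi) if v == 1}

def grid_interior(indices):
    """Grid stand-in for the interior relative to [0, 1]."""
    return {
        i for i in indices
        if all(j in indices for j in (i - 1, i + 1) if 0 <= j <= N)
    }

def way_below_criterion(phi, psi):
    """The lemma's condition: {psi = 1} inside int{phi = 1}, and phi > psi wherever psi != 1."""
    tops_inside = top_set(psi) <= grid_interior(top_set(phi))
    strictly_above = all(a > b for a, b in zip(phi, psi) if b != 1)
    return tops_inside and strictly_above

psi = [1 - abs(2 * p - 1) for p in GRID]                       # tent peaked at p = 1/2
vacuous = [Fraction(1)] * (N + 1)                              # constant 1
phi = [min(Fraction(1), Fraction(1, 4) + 2 * t) for t in psi]  # fuzzier than psi

print(way_below_criterion(vacuous, psi))  # bottom element satisfies the criterion
print(way_below_criterion(phi, psi))      # wider plateau, strictly larger elsewhere
print(way_below_criterion(psi, psi))      # fails: {1/2} has empty interior
```

The last check shows that the tent belief does not satisfy the criterion against itself, since the set where it equals 1 is a single point with empty interior.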

Lemma: $ψ = \sup \{ϕ \mid \{μ \mid ψ (μ) = 1\} \subseteq \operatorname{int} \{μ \mid ϕ (μ) = 1\},\ \forall μ\, (ψ (μ) \neq 1 \to ϕ (μ) > ψ (μ))\}$.

Proof: clear?

AI ALIGNMENT FORUM
AF
