this work was done by Tamsin Leake and Julia Persson at Orthogonal.
thanks to mesaoptimizer for his help putting together this post.

what does the QACI plan for formal-goal alignment actually look like when formalized as math? in this post, we'll be presenting our current formalization, which we believe has most critical details filled in.

1. math constructs

in this first part, we'll be defining a collection of mathematical constructs which we'll be using in the rest of the post.

1.1. basic set theory

we'll be assuming basic set theory notation; in particular, $A \times B \times C$ is the set of tuples whose elements are respectively members of the sets $A$, $B$, and $C$, and for $n \in N$, $S^{n}$ is the set of tuples of $n$ elements, all members of $S$.

$B = \{⊤, ⊥\}$ is the set of booleans and $N$ is the set of natural numbers including $0$.

given a set $X$, $P(X)$ will be the set of subsets of $X$.

$\# S$ is the cardinality (number of distinct elements) of the set $S$.

for some set $X$ and some total ordering $< \in X^{2} \to B$, ${min}_{<}$ and ${max}_{<}$ are two functions of type $P(X) ∖ \{\emptyset\} \to X$ returning the respective minimum and maximum element of a non-empty set, when that element exists, using $<$ as the ordering.
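as a rough illustration of this notation on small finite sets, here is a python sketch. the helper names (`powerset`, `min_by`, `max_by`) are ours and not part of the formalization, and the ordering is passed as a boolean-valued function on pairs, mirroring $< \in X^{2} \to B$.

```python
from itertools import chain, combinations, product

def powerset(xs):
    """P(X): all subsets of a finite set, returned as a set of frozensets."""
    items = list(xs)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}

def min_by(less_than, xs):
    """min_<: least element of a non-empty finite set under `less_than`."""
    items = list(xs)
    if not items:
        raise ValueError("min is only defined on non-empty sets")
    best = items[0]
    for x in items[1:]:
        if less_than(x, best):
            best = x
    return best

def max_by(less_than, xs):
    """max_<: same as min_<, but with the ordering flipped."""
    return min_by(lambda a, b: less_than(b, a), xs)

A, B, C = {0, 1}, {"a"}, {"x", "y"}
triples = set(product(A, B, C))    # A × B × C
pairs = set(product(A, repeat=2))  # A^2
```

on finite sets the minimum and maximum always exist; the "when it exists" caveat only matters for infinite sets, which this sketch does not cover.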

1.2. functions and programs

if $n \in N$, then we'll write $f^{\circ n}$ for the repeated composition of $f$: $f \circ \dots \circ f$ ($n$ times), with $\circ$ being the composition operator: $(f \circ g)(x) = f(g(x))$.
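repeated composition can be sketched in python as follows. the names `compose` and `iterate` are ours, and treating the $n = 0$ case as the identity function is an assumption the text does not spell out.

```python
def compose(f, g):
    """(f ∘ g)(x) = f(g(x))"""
    return lambda x: f(g(x))

def iterate(f, n):
    """f composed with itself n times; n = 0 is assumed to be the identity."""
    result = lambda x: x
    for _ in range(n):
        result = compose(f, result)
    return result

successor = lambda x: x + 1
add_five = iterate(successor, 5)  # successor ∘ ... ∘ successor, 5 times
```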

$λ x : X . B$ is an anonymous function defined over set $X$ , whose parameter $x$ is bound to its argument in its body $B$ when it is called.

$A \to B$ is the set of functions from $A$ to $B$, with $\to$ being right-associative ($A \to B \to C$ is $A \to (B \to C)$). if $f \in A \to B \to C$, then $f(x)(y)$ is simply $f$ applied once to $x \in A$, and then the resulting function of type $B \to C$ being applied to $y \in B$. $A \to B$ is sometimes denoted $B^{A}$ in set theory.
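a small python sketch of both points: applying a curried function one argument at a time, and why the function set is written $B^{A}$. the function `add` and the example sets are purely illustrative.

```python
from itertools import product

def add(x: int):
    """an element of int -> int -> int: takes x, returns a function awaiting y."""
    def add_x(y: int) -> int:
        return x + y
    return add_x

five = add(2)(3)  # f(x)(y): apply f to x, then apply the result to y

# for finite sets, a function A -> B is a choice of one output per input,
# so there are (#B) ** (#A) of them, hence the notation B^A.
A = ["a1", "a2", "a3"]
B = [True, False]
all_functions = [dict(zip(A, outputs)) for outputs in product(B, repeat=len(A))]
```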

$A \overset{H}{\to} B$ is the set of always-halting, always-succeeding, deterministic programs taking as input an $A$ and returning a $B$.

given $f \in A \overset{H}{\to} B$ and $x \in A$, $R(f, x) \in N ∖ \{0\}$ is the runtime duration of executing $f$ with input $x$, measured in compute steps doing a constant amount of work each, such as turing machine updates.
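to make the runtime $R(f, x)$ concrete, here is a toy python model, not the turing-machine model the text has in mind. it assumes a program is written as a generator that `yield`s once per compute step, and it counts the final return as one step so that the runtime is never $0$. the names `run` and `count_down` are ours.

```python
def run(program, x):
    """execute `program` on input x; return (result, number of steps)."""
    gen = program(x)
    steps = 0
    while True:
        try:
            next(gen)       # advance the program by one step
            steps += 1
        except StopIteration as stop:
            # the final return counts as one step, so steps >= 1
            return stop.value, steps + 1

def count_down(n):
    """a deterministic, always-halting toy program."""
    while n > 0:
        n -= 1
        yield               # one compute step
    return "done"

result, runtime = run(count_down, 3)
```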

1.3. sum notation

we'll be using a syntax for sums $\sum$ in which the sum iterates over all possible values for the variables listed above it, given that the constraints below it hold.

$\sum_{\substack{y = x \bmod 2 \\ x \in \{1, 2, 3, 4\} \\ x \leq 2}}^{x, y} y = 1$

says "for any value of $x$ and $y$ where these three constraints hold, sum $y$".
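the example sum can be checked by brute force in python, reading its three constraints as $y = x \bmod 2$, $x \in \{1, 2, 3, 4\}$, and $x \leq 2$. restricting the candidates for $y$ to $\{0, 1\}$ is our shortcut, justified because $x \bmod 2$ can take no other value.

```python
# enumerate every (x, y) pair satisfying the three constraints
satisfying = [
    (x, y)
    for x in (1, 2, 3, 4)   # x ∈ {1, 2, 3, 4}
    for y in (0, 1)         # only possible values of x mod 2
    if y == x % 2 and x <= 2
]

# the sum iterates over those pairs and adds up y
total = sum(y for _, y in satisfying)
```

only the pairs $(1, 1)$ and $(2, 0)$ satisfy the constraints, so the sum is $1 + 0 = 1$.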

1.4. distributions

for any countable set $X$ , the set of distributions over $X$ is defined as:

$Δ_{X} ≔ \{f \mid f \in X \to [0; 1], \sum_{x \in X}^{x} f(x) \leq 1\}$
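a minimal python sketch of this membership condition, assuming a distribution is given as a dict with finitely many entries and that any element not listed has probability $0$. the name `is_distribution` is ours, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

def is_distribution(f):
    """check f ∈ Δ_X: every value lies in [0; 1] and the total is at most 1."""
    values = list(f.values())
    return all(0 <= p <= 1 for p in values) and sum(values) <= 1

partial = {"a": Fraction(1, 2), "b": Fraction(1, 4)}   # sums to 3/4
too_big = {"a": Fraction(3, 4), "b": Fraction(1, 2)}   # sums to 5/4
```

note that `partial` qualifies even though its total is below $1$: the definition only requires the sum to be at most $1$.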

a functio...
