harfe - AI Alignment Forum

The Learning-Theoretic Agenda: Status 2023

Regarding direction 17: There might be some potential drawbacks to ADAM. I think its possible that some very agentic programs have relatively low score. This is due to explicit optimization algorithms being low complexity.

(Disclaimer: the following argument is not a proof, and appeals to some heuristics/etc. We fix $M = M_{0}$ for these considerations too.) Consider an utility function $^U$ . Further, consider a computable approximation of the optimal policy (AIXI that explicitly optimizes for $^U$ ) and has an approximation parameter n (this could be AIXI-tl, plus some approximation of $^U$ ; higher $n$ is better approximation). We will call this approximation of the optimal policy $π_{n}^{^U}$ . This approximation algorithm has complexity $K (π_{n}^{^U}) = C + K (^U) + K (n)$ , where $C > 0$ is a constant needed to describe the general algorithm (this should not be too large).

We can get better approximation by using a quickly growing function, such as the Ackermann function with $n = A (k, k)$ . Then we have $K (π_{A (k, k)}^{^U}) = C + K (^U) + K (A (k, k)) \leq C + K (^U) + log (k)$ .

What is the $g$ score of this policy? We have $g (π_{A (k, k)}^{^U}) = {max}_{U} ({min}_{π^{'} : \dots} K (π^{'}) - K (U))$ . Let $¯ U$ be maximal in this expression. If $K (¯ U) \geq K (^U) - C$ , then $g (π_{A (k, k)}^{^U}) = min π^{'} : E_{ζ_{M_{0}} π^{'}} (¯ U) \geq E_{ζ_{M_{0}} π_{A (k, k)}^{^U}} (¯ U) K (π^{'}) - K (¯ U) \leq K (π_{A (k, k)}^{^U}) - K (^U) + C \leq 2 C log (k)$ .

For the other case, let us assume that if $K (¯ U) < K (^U) - C$ , the policy $π_{A (k, k)}^{¯ U}$ is at least as good at maximizing $¯ U$ than $π_{A (k, k)}^{^U)}$ . Then, we have $g (π_{A (k, k)}^{^U}) = min π^{'} : E_{ζ_{M_{0}} π^{'}} (¯ U) \geq E_{ζ_{M_{0}} π_{A (k, k)}^{^U}} (¯ U) K (π^{'}) - K (¯ U) \leq K (π_{A (k, k)}^{¯ U}) - K (¯ U)) \leq C + log (k)$ .

I don't think that the assumption ( $(π_{A (k, k)}^{¯ U}$ maximizes $b a r U$ better than $(π_{A (k, k)}^{^U}$ ) is true for all $^U$ and $k$ , but plausibly we can select $^U$ such that this is the case (exceptions, if they exist, would be a bit weird, and if ADAM working well due to these weird exceptions feels a bit disappointing to me). A thing that is not captured by approximations such as AIXI-tl are programs that halt but have insane runtime (longer than $A (k, k)$ ). Again, it would feel weird to me if ADAM sort of works because of low-complexity extremely-long-running halting programs.

To summarize, maybe there exist policies $π_{A (k, k)}^{^U}$ which strongly optimize a non-trivial utility function $^U$ with approximation parameter $n = A (k, k)$ , but where $g (π_{A (k, k)}^{^U}) \leq 2 C + log (k)$ is relatively small.

Buck's Shortform

harfe6mo20

I am going to assume that in the code, when calculating p_alice_win_given_not_caught, we do not divide the term by two (since this is not that consistent with the description. I am also assuming that $0.25 - t_{2}$ is a typo and $0.5 - t_{2}$ is meant, which would also be more consistent with other stuff). So I am going to assume assume a symmetrical version.

Here, P(Alice wins) is $e^{- λ_{1} t_{1} - λ_{2} t_{2}} - e^{- λ_{1} / 2 - λ_{2} / 2} =: f (λ_{1}, λ_{2}, t_{1}, t_{2})$ . Wlog we can assume $0 < t < 1$ (otherwise Bob will run everything or nothing in shielded mode).

We claim that $((λ_{1}^{*}, λ_{2}^{*}), (t_{1}^{*}, t_{2}^{*})) := ((μ^{*}, μ^{*}), (t / 2, t / 2)$ is a (pure) Nash equilibrium, where $μ^{*} = log t / (t - 1)$ .

To verify, lets first show that Alice cannot make a better choice if Bob plays $(t_{1}^{*}, t_{2}^{*}) = (t / 2, t / 2)$ . We have $f (λ_{1}, λ_{2}, t_{1}^{*}, t_{2}^{*}) = e^{- (λ_{1} + λ_{2}) t / 2} - e^{- (λ_{1} + λ_{2}) / 2}$ . Since this only depends on the sum, we can make the substitution $μ = (λ_{1} + λ_{2}) / 2$ . Thus, we want to maximize $~ f (μ, t) = e^{- μ t} - e^{- μ}$ . We have ${~ f}^{'} (μ, t) = - t e^{- μ t} + e^{- μ} = 0$ . Rearranging, we get $t e^{- μ t} = e^{- μ}$ . Taking logs, we get $log t - μ t = - μ$ . Rearranging, we get $μ = log t / (t - 1)$ . Thus, $μ = μ^{*}$ is the optimal choice. This means, that if Bob sticks to his strategy, Alice cannot do better than $f (μ^{*}, μ^{*}, t / 2, t / 2)$ .

Now, lets show that Bob cannot do better. We have $f (λ_{1}^{*}, λ_{2}^{*}, t_{1}, t_{2}) = e^{- μ^{*} t_{1} - μ^{*} t_{2}} - e^{- μ^{*} / 2 - μ^{*} / 2} = e^{- μ^{*} t} - e^{- μ^{*}}$ . This does not depend on $t_{1}$ and $t_{2}$ anymore, so any choice of $t_{1}$ and $t_{2}$ is optimal if Alice plays $(μ^{*}, μ^{*})$ .

(If I picked the wrong version of the question, and you actually want some symmetry: I suspect that the solution will have similarities, or that in some cases the solution can be obtained by rescaling the problem back into a more symmetric form.)

Thoughts on hardware / compute requirements for AGI

harfe1y31

Nanotech industry-rebuilding comes earlier than von Neumann level? I doubt that. A lot of existing people are close to von Neumann level.

Maybe your argument is that there will be so many AGIs, that they can do Nanotech industry rebuilding while individually being very dumb. But I would then argue that the collective already exceeds von Neumann or large groups of humans in intelligence.

AI ALIGNMENT FORUM
AF

Posts

Wiki Contributions

Comments