October 2019

Shortform

6 · Matthew "Vaniver" Graves · 8mo
I've been thinking a lot about 'parallel economies' recently. One of the main
differences between 'slow takeoff' and 'fast takeoff' predictions is whether AI
is integrated into the 'human civilization' economy or constructing a separate
'AI civilization' economy. Maybe it's worth explaining a bit more what I mean by
this: you can think of 'economies' as collections of agents who trade with each
other. Often they have a hierarchical structure, and where we draw the lines
is somewhat arbitrary. Imagine a person who works at a company and participates
in its internal economy, and the company participates in national and global
economies, and the person participates in those economies as well. A better
picture has a very dense graph with lots of nodes and links between groups of
nodes whose heaviness depends on the number of links between nodes in those
groups.
As Adam Smith argues, the ability of an economy to support specialization of
labor depends on its size. If you have an island with a single inhabitant, it
doesn't make sense to fully employ a farmer (since a full-time farmer can
generate much more food than a single person could eat), for a village with 100
inhabitants it doesn't make sense to farm more than would feed a hundred mouths,
and so on. But as you make more and more of a product, investments that have a
small multiplicative payoff become better and better, to the point that a planet
with ten billion people will have massive investment in farming specialization
that makes it vastly more efficient per unit than the village farming system. So
for much of history, increased wealth has been driven by this increased
specialization of labor, which was driven by the increased size of the economy
(both through population growth and decreased trade barriers widening the links
between economies until they effectively became one economy).
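The scale argument can be made concrete with a little arithmetic. The sketch below is only an illustration: the function names and all numbers (a fixed investment cost of 10,000, a 1% saving per unit) are assumptions, not figures from the post.

```python
def investment_net_gain(units, unit_cost, saving_fraction, fixed_cost):
    """Net gain from a one-off investment that cuts the cost of every unit.

    The saving is multiplicative (a fraction of each unit's cost), so the
    total benefit grows with the number of units while the cost stays fixed.
    """
    return units * unit_cost * saving_fraction - fixed_cost


def break_even_units(unit_cost, saving_fraction, fixed_cost):
    """Output level above which the investment pays for itself."""
    return fixed_cost / (unit_cost * saving_fraction)


# Hypothetical numbers: a 1% saving per unit for a fixed cost of 10,000.
UNIT_COST, SAVING, FIXED = 1.0, 0.01, 10_000.0

for population in (1, 100, 10_000_000_000):
    gain = investment_net_gain(population, UNIT_COST, SAVING, FIXED)
    print(f"{population:>14,} units: net gain {gain:,.2f}")

print("break-even at", break_even_units(UNIT_COST, SAVING, FIXED), "units")
```

Under these assumed numbers the same small improvement is a clear loss for the island and the village and a large gain at planetary scale, which is the sense in which a larger economy can support more specialization.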
One reason to think economies will remain integrated is that increased size
benefits all actors in the economy on net; a

9 · 8mo
Game theory is widely considered the correct description of rational behavior in
multi-agent scenarios. However, real world agents have to learn, whereas game
theory assumes perfect knowledge, which can be only achieved in the limit at
best. Bridging this gap requires using multi-agent learning theory to justify
game theory, a problem that is mostly open (but some results exist). In
particular, we would like to prove that learning agents converge to game
theoretic solutions such as Nash equilibria (putting superrationality aside: I
think that superrationality should manifest via modifying the game rather than
abandoning the notion of Nash equilibrium).
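One of the results that does exist is classical: in two-player zero-sum games, the empirical action frequencies of fictitious play converge to a Nash equilibrium. The sketch below illustrates this on matching pennies. It is not the learning setting this post goes on to develop; the choice of fictitious play, the game, the tie-breaking rule and all names are assumptions made for illustration.

```python
def fictitious_play(payoff, rounds=20000):
    """Simultaneous fictitious play in a two-player zero-sum game.

    payoff[i][j] is the row player's payoff; the column player receives
    its negative. Returns the empirical action frequencies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] = col_counts[0] = 1  # arbitrary first moves

    for _ in range(rounds - 1):
        # Row player's cumulative payoff for each action against the
        # column player's observed history.
        row_values = [sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
                      for i in range(n_rows)]
        # Row player's cumulative payoff for each column action against
        # the row player's observed history (the column player minimizes it).
        col_values = [sum(payoff[i][j] * row_counts[i] for i in range(n_rows))
                      for j in range(n_cols)]
        # Each player best-responds; ties go to the lowest index.
        row_action = max(range(n_rows), key=row_values.__getitem__)
        col_action = min(range(n_cols), key=col_values.__getitem__)
        row_counts[row_action] += 1
        col_counts[col_action] += 1

    return ([c / rounds for c in row_counts],
            [c / rounds for c in col_counts])


MATCHING_PENNIES = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(MATCHING_PENNIES)
print("row frequencies:", row_freq)
print("column frequencies:", col_freq)
```

Both frequency vectors approach the unique mixed equilibrium (1/2, 1/2), even though the actual play never settles and keeps cycling. Convergence of this kind is known for special classes such as zero-sum games and does not hold for general games, which is part of why the broader problem remains open.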
The simplest setup in (non-cooperative) game theory is normal form games.
Learning happens by accumulating evidence over time, so a normal form game is
not, in itself, a meaningful setting for learning. One way to solve this is
replacing the normal form game by a repeated version. This, however, requires
deciding on a time discount. For sufficiently steep time discounts, the repeated
game is essentially equivalent to the normal form game (from the perspective of
game theory). However, the full-fledged theory of intelligent agents requires
considering shallow time discounts, otherwise there is no notion of long-term
planning. For shallow time discounts, the game theory of a repeated game is very
different from the game theory of the original normal form game. In fact, the
folk theorem asserts that any payoff vector above the maximin of each player is
a possible Nash payoff. So, proving convergence to a Nash equilibrium amounts
(more or less) to proving convergence to at least the maximin payoff. This is
possible using incomplete models
[https://www.alignmentforum.org/posts/5bd75cc58225bf0670375575/the-learning-theoretic-ai-alignment-research-agenda]
, but doesn't seem very interesting: to receive the maximin payoff, the agents
only have to learn the rules of the game, they need not learn the reward
functions of the othe

5 · 8mo
One challenge for theories of embedded agency over Cartesian theories is that
the 'true dynamics' of optimization (where a function defined over a space
points to a single global maximum, possibly achieved by multiple inputs) are
replaced by the 'approximate dynamics'. But this means that by default we get
the hassles associated with numerical approximations, like when integrating
differential equations. If you tell me that you're doing Euler's Method on a
particular system, I need to know lots about the system and about the particular
hyperparameters you're using to know how well you'll approximate the true
solution. This is the toy version of trying to figure out how a human reasons
through a complicated cognitive task; you would need to know lots of details
about the 'hyperparameters' of their process to replicate their final result.
This makes getting guarantees hard. We might be able to establish what the
'sensible' solution range for a problem is, but establishing what algorithms can
generate sensible solutions under what parameter settings seems much harder.
Imagine trying to express the set of deep neural network parameters that will
perform acceptably well on a particular task (first for a particular
architecture, and then across all architectures!).
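The Euler's Method point can be seen in a few lines. The sketch below integrates dy/dt = -k·y, whose exact solution is known, with three step sizes; the test equation, the rate k = 10 and the function name are assumptions picked for illustration.

```python
import math


def euler_decay(rate, step, t_end=1.0, y0=1.0):
    """Integrate dy/dt = -rate * y from t = 0 to t_end with forward Euler."""
    n_steps = round(t_end / step)
    y = y0
    for _ in range(n_steps):
        y += step * (-rate * y)  # one Euler update: y_{n+1} = y_n + h * f(y_n)
    return y


RATE = 10.0
exact = math.exp(-RATE * 1.0)  # true solution at t = 1

for step in (0.25, 0.01, 0.001):
    approx = euler_decay(RATE, step)
    print(f"step={step}: approx={approx:.3e}, abs error={abs(approx - exact):.3e}")
```

For this system each update multiplies y by (1 - rate * step), so the method is only stable when rate * step < 2. With step = 0.25 the "approximation" oscillates and grows instead of decaying, while the smaller steps track the true solution with shrinking error. Knowing that someone "used Euler's Method" says little about the quality of the answer without also knowing the system and the step size.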

6 · 9mo
This is a preliminary description of what I dubbed Dialogic Reinforcement Learning
(credit for the name goes to tumblr user @di--es---can-ic-ul-ar--es): the
alignment scheme I currently find most promising.
It seems that the natural formal criterion for alignment (or at least the main
criterion) is having a "subjective regret bound": that is, the AI has to
converge (in the long-term planning limit γ→1) to achieving optimal
expected user!utility with respect to the knowledge state of the user. In order
to achieve this, we need to establish a communication protocol between the AI
and the user that will allow transmitting this knowledge state to the AI
(including knowledge about the user's values). Dialogic RL attacks this problem
in the manner which seems the most straightforward and powerful: allowing the AI
to ask the user questions in some highly expressive formal language, which we
will denote F.
F allows making formal statements about a formal model M of the world, as seen
from the AI's perspective. M includes such elements as observations, actions,
rewards and corruption. That is, M reflects (i) the dynamics of the environment
(ii) the values of the user (iii) processes that either manipulate the user, or
damage the ability to obtain reliable information from the user. Here, we can
use different models of values: a traditional "perceptible" reward function, an
instrumental reward function
[https://www.alignmentforum.org/posts/aAzApjEpdYwAxnsAS/reinforcement-learning-with-imperceptible-rewards]
, a semi-instrumental reward function, dynamically-inconsistent rewards
[https://www.alignmentforum.org/posts/aPwNaiSLjYP4XXZQW/ai-alignment-open-thread-august-2019#C9gRtMRc6qLv7J6k7]
, rewards with Knightian uncertainty etc. Moreover, the setup is
self-referential in the sense that M also reflects the question-answer
interface and the user's behavior.
A single question can consist, for example, of asking for the probability of
some sentence in F or the expected

4 · 8mo
I recently realized that the formalism of incomplete models
[https://www.alignmentforum.org/posts/5bd75cc58225bf0670375575/the-learning-theoretic-ai-alignment-research-agenda]
provides a rather natural solution to all decision theory problems involving
"Omega" (something that predicts the agent's decisions). An incomplete
hypothesis may be thought of as a zero-sum game between the agent and an imaginary
opponent (we will call the opponent "Murphy" as in Murphy's law). If we assume
that the agent cannot randomize against Omega, we need to use the deterministic
version of the formalism. That is, an agent that learns an incomplete hypothesis
converges to the corresponding maximin value in pure strategies. (The stochastic
version can be regarded as a special case of the deterministic version where the
agent has access to an external random number generator that is hidden from the
rest of the environment according to the hypothesis.) To every decision problem,
we can now associate an incomplete hypothesis as follows. Every time Omega
makes a prediction about the agent's future action in some counterfactual, we
have Murphy make a guess instead. This guess cannot be directly observed by the
agent. If the relevant counterfactual is realized, then the agent's action
renders the guess false or true. If the guess is false, the agent receives
infinite (or sufficiently large) reward. If the guess is true, everything
proceeds as usual. The maximin value then corresponds to the scenario where the
guess is true and the agent behaves as if its action controls the guess. (Which
is exactly what FDT and its variants try to achieve.)
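As a minimal sketch, this construction can be applied to one-shot Newcomb's problem, chosen here only because it is small. The usual payoffs (1,000,000 in the opaque box, 1,000 in the transparent one), the LARGE constant standing in for the "infinite" reward, and all function names are assumptions for illustration rather than part of the formalism.

```python
LARGE = 10**12  # stand-in for the "infinite (or sufficiently large)" reward

ACTIONS = ("one_box", "two_box")


def newcomb_payoff(action, guess):
    """Agent's reward when Murphy's hidden guess replaces Omega's prediction."""
    if guess != action:
        # The agent's action renders the guess false: huge reward.
        return LARGE
    # The guess is true: everything proceeds as in the original problem.
    opaque_box = 1_000_000 if guess == "one_box" else 0
    transparent_box = 1_000 if action == "two_box" else 0
    return opaque_box + transparent_box


def pure_maximin(actions, guesses, payoff):
    """Best pure action against a worst-case opponent, with its value."""
    worst_case = {a: min(payoff(a, g) for g in guesses) for a in actions}
    best = max(worst_case, key=worst_case.get)
    return best, worst_case[best]


best_action, value = pure_maximin(ACTIONS, ACTIONS, newcomb_payoff)
print(best_action, value)
```

Because a wrong guess is so costly for Murphy, the worst case for each action is the one where the guess matches it. The agent therefore compares 1,000,000 for one-boxing with 1,000 for two-boxing, exactly as if its action controlled the guess.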
For example, consider (repeated) counterfactual mugging. The incomplete
hypothesis is a partially observable stochastic game (between the agent and
Murphy), with the following states:
* s0: initial state. Murphy has two actions: g+ (guess the agent will pay),
transitioning to s1+ and g− (guess the agent won't pay) transitioning to s1−.
(Reward = 0