*Note: This describes an idea of Jessica Taylor's*.

# Introduction

An important aim in safe AI development is to make systems whose submodules analyze how they might go wrong. One model for this in academic machine learning research is training on adversarial examples.

In order to better understand the adversarial training process, we outline a specific adversarial learning model and consider what kind of equilibrium state it might possess.

# Setup

Consider a general adversarial learning scenario described by the tuple where:

- is a set of examples
- is a set of the labels for these examples
- is a non-empty, compact, and convex set of hypotheses that can be chosen by the learner
- is a continuous mapping that nature uses to probabilistically assign ground truth labels to each example ( represents a probability distribution over , )
- is a continuous function that adversarially maps a hypothesis to some probability distribution over examples, allocating more probability mass to examples that will give the hypothesis difficulty.
- , is a general loss function that appraises the performance of some hypothesis on some labeled example. It is convex in its third argument.

The learner's aim is to choose a hypothesis that achieves low expected loss , given that the labeled example is sampled from an adversarial distribution, .

# An equilibrium in expert selection

One plausible approach for the learner is to consider hypotheses that perform well on some adversarial distribution . The hypotheses that perform best on a some fixed distribution are given by the set-valued function:

The difficulty with choosing a hypothesis is that the adversarial distribution is not fixed, but rather it varies according to which hypothesis is chosen. One response is to choose some hypothesis that is located at a fixed point, such that . There is always at least one such fixed point, , and this can be shown by Kakutani's fixed-point theorem.

To apply Kakutani's fixed point theorem, we need to show that: i) the domain of is compact, convex and nonempty, ii) is convex-valued, and iii) is compact-valued and upper hemicontinuous.

First, by its definition, , the domain of , is non-empty, closed, and compact. Second, is convex-valued because it minimizes a convex function. Remember that is convex in . The expectation of is then a non-negative weighted combination of functions convex in , so it is convex. Hence the values that minimize are some convex set. Finally, is compact-valued and upper hemicontinuous by Berge's maximum theorem. This is because has a compact domain and outputs the arguments that minimize a continuous function. So by Kakutani's fixed point theorem, will have at least one fixed point, .

# Discussion

The motivation for choosing a hypothesis that lies at a fixed point is that this hypothesis need not be changed in response to the examples selected by an adversary. We should be clear though that this does not mean we can design an algorithm with this approach. Locating these fixed points would be computationally expensive and require lots of data in the general case. One open question is under what circumstances we will be able to find such an equilibrium efficiently.

Note that the equilibrium weight vectors do not necessarily minimize loss over all possible weight vectors; it might be possible to get a lower loss by selecting a non-equilibrium weight vector (on which the adversary will produce easier examples). In some applications, it will be desirable to meet the equilibrium condition, whereas in others, it will be sufficient to have no regret compared to alternative weight vectors.

It is worth noting that we can arrive at the same result by thinking of adversarial supervised learning as a continuous two player game. This game has two players: the learner and the adversary. The learner chooses from the strategies in order to minimize the expectation of over and , and the adversary chooses some distribution over some compact set of examples in order to maximize some general objective function. The Nash equilibria of such a game correspond to the fixed points in our supervised learning setup. It follows that (at least) one fixed point will always exist. Of course, like the fixed points in adversarial supervised learning, Nash equilibria can be difficult to compute, and will not always give any particular agent an optimal score.

In future analysis, we may be able to think of other properties that the adversarial learning setup inherits from two player games.

It would be nice to prove theorems analogous to standard PAC learning theory in this setting. For example, suppose there are distributions D1,D2…Dn on X that the adversary can choose from. Our algorithm is given a sample of Di⋉g for each i and should choose a hypothesis. We can then try to prove that, given finite VC dimension, playing a minimax strategy of the game resulting from thinking of our samples as the entire distributions is approximately minimax for the real game.

Also, computing Nash equilibria in zero-sum two-player games is easy using linear programming, as a direct consequence of the minimax theorem.