Vanessa Kosoy

AI alignment researcher supported by MIRI and LTFF. Working on the learning-theoretic agenda.

Random factoids about Vanessa:

  • [insert typical rationalist / math nerd stuff here]
  • Loves fantasy and scifi, but romance novels especially
  • Rarely reads non-fiction books except for textbooks
  • Divorced with two children and a cat
  • Doesn't eat vertebrates or vertebrate products
  • Polyamorous and bisexual
  • Has a weakness for pretty clothes. Also for tasty food
  • Speaks Hebrew, Russian and English. Has been trying to learn French


Vanessa Kosoy's Shortform

Actually, the Schwartz–Zippel algorithm can easily be adapted to this case (just imagine that types are variables over , and start from testing the identity of the types appearing inside parentheses), so we can validate expressions in randomized polynomial time (and, given standard conjectures, in deterministic polynomial time as well).
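The identity-testing step can be sketched as follows. This is a generic Schwartz–Zippel-style tester over a large prime field; the function names and the lambda polynomials are purely illustrative (in the intended application, the evaluated expressions would be the type expressions in question):

```python
import random

def schwartz_zippel_equal(p, q, num_vars, degree_bound, trials=20):
    """Randomized test for the polynomial identity p == q.

    p, q: callables taking a list of field elements (here: ints mod a prime).
    If the polynomials are equal we always answer True; if they differ,
    each trial catches the difference except with probability
    at most degree_bound / prime (Schwartz-Zippel lemma).
    """
    prime = 2**31 - 1  # field far larger than the degree bound
    for _ in range(trials):
        point = [random.randrange(prime) for _ in range(num_vars)]
        if p(point) % prime != q(point) % prime:
            return False  # definitely not identical
    return True  # identical with high probability

# (x + y)^2 vs x^2 + 2xy + y^2: identical, so the test accepts;
# x^2 + y^2 differs, so the test rejects with overwhelming probability.
p = lambda v: (v[0] + v[1]) ** 2
q = lambda v: v[0] ** 2 + 2 * v[0] * v[1] + v[1] ** 2
r = lambda v: v[0] ** 2 + v[1] ** 2

assert schwartz_zippel_equal(p, q, num_vars=2, degree_bound=2)
assert not schwartz_zippel_equal(p, r, num_vars=2, degree_bound=2)
```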

Vanessa Kosoy's Shortform

Let's also explicitly describe 0th order and 1st order infra-Bayesian logic (although they are ofc fragments of the higher-order version).

0-th order


Let be the set of propositional variables. We define the language :

  • Any is also in
  • Given ,
  • Given ,

Notice there's no negation or implication. We define the set of judgements . We write judgements as (" in the context of "). A theory is a subset of .


Given , a model of consists of a compact Polish space and a mapping . The latter is required to satisfy:

  • . Here, we define of infradistributions as intersection of the corresponding sets
  • . Here, we define of infradistributions as convex hull of the corresponding sets
  • For any ,

1-st order


We define the language using the usual syntax of 1-st order logic, where the allowed operators are , and the quantifier . Variables are labeled by types from some set . For simplicity, we assume no constants, but it is easy to introduce them. For any sequence of variables , we denote the set of formulae whose free variables are a subset of . We define the set of judgements .


Given , a model of consists of

  • For every , a compact Polish space
  • For every where have types , an element of

It must satisfy the following:

  • Consider variables of types and variables of types . Consider also some s.t. . Given , we can form the substitution . We also have a mapping given by . We require
  • Consider variables and . Denote the projection mapping. We require
  • For any ,
Vanessa Kosoy's Shortform

When using infra-Bayesian logic to define a simplicity prior, it is natural to use "axiom circuits" rather than plain formulae. That is, when we write the axioms defining our hypothesis, we are allowed to introduce "shorthand" symbols for repeating terms. This doesn't affect the expressiveness, but it does affect the description length. Indeed, eliminating all the shorthand symbols can increase the length exponentially.
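A toy illustration of the blowup (the shorthand names `t0_`, `t1_`, … are hypothetical): each shorthand uses the previous one twice, so the "axiom circuit" description grows linearly in the chain length while the fully expanded formula grows exponentially.

```python
def expand(term, definitions):
    """Eliminate all shorthand symbols by repeated substitution."""
    changed = True
    while changed:
        changed = False
        for name, body in definitions.items():
            if name in term:
                term = term.replace(name, body)
                changed = True
    return term

# Chain of shorthands, each using the previous one twice.
k = 10
defs = {}
prev = "x"
for i in range(k):
    name = f"t{i}_"  # trailing underscore so t1_ is never a prefix of t10_
    defs[name] = f"({prev}*{prev})"
    prev = name

# Circuit description: linear in k.  Expanded formula: exponential in k.
circuit_len = sum(len(name) + len(body) for name, body in defs.items())
formula_len = len(expand(prev, defs))
assert circuit_len < 200
assert formula_len == 4093  # 5 * 2^9 + 3 * (2^9 - 1)
```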

Vanessa Kosoy's Shortform

Instead of introducing all the "algebrator" logical symbols, we can define as the quotient by the equivalence relation defined by the algebraic laws. We then need only two extra logical atomic terms:

  • For any and (permutation), denote and require
  • For any and ,

However, if we do this then it's not clear whether deciding that an expression is a well-formed term can be done in polynomial time: to check that the types match, we need to test the identity of algebraic expressions, and opening all parentheses might result in something exponentially long.
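A minimal illustration of why opening parentheses is dangerous (variable names are placeholders): a product of n binomials has length linear in n, but its expansion has 2^n monomials, which is exactly the blowup that makes naive normalization exponential and randomized evaluation attractive instead.

```python
def open_parentheses(factors):
    """Expand a product of binomials (a_1 + b_1)...(a_n + b_n)
    into its full list of monomials."""
    monomials = [()]
    for a, b in factors:
        # Each existing monomial splits in two: pick a_i or pick b_i.
        monomials = [m + (a,) for m in monomials] + [m + (b,) for m in monomials]
    return monomials

n = 12
factors = [(f"x{i}", f"y{i}") for i in range(n)]
expanded = open_parentheses(factors)
assert len(expanded) == 2 ** n  # 4096 monomials from an O(n)-length product
```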

Vanessa Kosoy's Shortform

Infra-Bayesianism can be naturally understood as semantics for a certain non-classical logic. This promises an elegant synthesis between deductive/symbolic reasoning and inductive/intuitive reasoning, with several possible applications. Specifically, here we will explain how this can work for higher-order logic. There might be holes and/or redundancies in the precise definitions given here, but I'm quite confident the overall idea is sound.

For simplicity, we will only work with crisp infradistributions, although a lot of this stuff can work for more general types of infradistributions as well. Therefore, will denote the space of crisp infradistributions. Given , will denote the corresponding convex set. Unlike previously, we will include the empty set, i.e. there is s.t. . Given and , will mean . Given , will mean .


Let denote a set which we interpret as the types of individuals (we allow more than one). We then recursively define the full set of types by:

  • (intended meaning: the uninhabited type)
  • (intended meaning: the one element type)
  • If then
  • If then (intended meaning: disjoint union)
  • If then (intended meaning: Cartesian product)
  • If then (intended meaning: predicates with argument of type )

For each , there is a set which we interpret as atomic terms of type . We will denote . Among those we distinguish the logical atomic terms:

  • Symbols we will not list explicitly, that correspond to the algebraic properties of and (commutativity, associativity, distributivity and the neutrality of and ). For example, given there is a "commutator" of type .
  • (intended meaning: predicate evaluation)
  • Assume that for each there is some : the set of "describable" infradistributions (for example, it can be empty, or consist of all distributions with rational coefficients, or all distributions, or all infradistributions; EDIT: it is probably sufficient to only have the fair coin distribution in in order for it to be possible to approximate all infradistributions on finite sets). If then

We recursively define the set of all terms . We denote .

  • If then
  • If and then
  • If and then
  • If then
  • If and then

Elements of are called formulae. Elements of are called sentences. A subset of is called a theory.


Given , a model of is the following data. To each , there must correspond some compact Polish space s.t.:

  • (the one point space)

To each , there must correspond a continuous mapping , under the following constraints:

  • , , , , , and the "algebrators" have to correspond to the obvious mappings.
  • . Here, is the diagonal and is the sharp infradistribution corresponding to the closed set .
  • Consider and denote . Then, . Here, we use the observation that the identity mapping can be regarded as an infrakernel from to .
  • is the convex hull of
  • Consider and denote , and the projection mapping. Then, .
  • Consider and denote , and the projection mapping. Then, iff .
  • . Notice that pullback of infradistributions is always defined thanks to adding (the empty set infradistribution).

Finally, for each , we require .

Semantic Consequence

Given , we say when . We say when for any model of , . It is now interesting to ask what is the computational complexity of deciding . [EDIT: My current best guess is co-RE]


As usual, let be a finite set of actions and be a finite set of observations. Require that for each there is which we interpret as the type of states producing observation . Denote (the type of all states). Moreover, require that our language has the nonlogical symbols (the initial state) and, for each , (the transition kernel). Then, every model defines a (pseudocausal) infra-POMDP. This way we can use symbolic expressions to define infra-Bayesian RL hypotheses. It is then tempting to study the control theoretic and learning theoretic properties of those hypotheses. Moreover, it is natural to introduce a prior which weights those hypotheses by length, analogous to the Solomonoff prior. This leads to some sort of bounded infra-Bayesian algorithmic information theory and a bounded infra-Bayesian analogue of AIXI.
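A rough sketch of such a length-weighted prior (the hypothesis strings below are purely illustrative stand-ins for symbolic descriptions of transition structure):

```python
def simplicity_prior(hypotheses):
    """Weight each hypothesis (given as a source string) by 2^(-length),
    normalized over the listed hypotheses: a crude, finite stand-in for a
    Solomonoff-style prior over symbolically described hypotheses."""
    weights = {h: 2.0 ** (-len(h)) for h in hypotheses}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Hypothetical symbolic descriptions; shorter descriptions get more mass.
hypotheses = ["s0->s0", "s0->s1; s1->s0", "s0->s1; s1->s2; s2->s0"]
prior = simplicity_prior(hypotheses)
shortest = max(prior, key=prior.get)
assert shortest == "s0->s0"
assert abs(sum(prior.values()) - 1.0) < 1e-9
```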

Vanessa Kosoy's Shortform

re: #5, that doesn't seem to claim that we can infer U given their actions, which is what the impossibility of deducing preferences is actually claiming.

You misunderstand the intent. We're talking about inverse reinforcement learning. The goal is not necessarily inferring the unknown , but producing some behavior that optimizes the unknown . Ofc if the policy you're observing is optimal then it's trivial to do so by following the same policy. But, using my approach we might be able to extend it into results like "the policy you're observing is optimal w.r.t. certain computational complexity, and your goal is to produce an optimal policy w.r.t. higher computational complexity."

(Btw I think the formal statement I gave for 5 is false, but there might be an alternative version that works.)

(And as pointed out elsewhere, it isn't Stuart's thesis, it's a well known and basic result in the decision theory / economics / philosophy literature.)

I am referring to this and related work by Armstrong.

Vanessa Kosoy's Shortform

Ah, okay, I see what you mean. Like how preferences are divisible into "selfish" and "worldly" components, where the selfish component is what's impacted by a future simulation of you that is about to have good things happen to it.

...I brought up the histories->states thing because I didn't understand what you were getting at, so I was concerned that something unrealistic was going on. For example, if you assume that the agent can remember its history, how can you possibly handle an environment with memory-wiping?

AMDP is only a toy model that distills the core difficulty into more or less the simplest non-trivial framework. The rewards are "selfish": there is a reward function which allows assigning utilities to histories by time-discounted summation, and we consider the expected utility of a random robot sampled from a late population. And, there is no memory wiping. To describe memory wiping we indeed need to do the "unrolling" you suggested. (Notice that from the cybernetic model POV, the history is only the remembered history.)

For a more complete framework, we can use an ontology chain, but (i) instead of labels use labels, where is the set of possible memory states (a policy is then described by ), to allow for agents that don't fully trust their memory (ii) consider another chain with a bigger state space plus a mapping s.t. the transition kernels are compatible. Here, the semantics of is: the multiset of ontological states resulting from interpreting the physical state by taking the viewpoints of different agents contains.

In fact, to me the example is still somewhat murky, because you're talking about the subjective probability of a state given a policy and a timestep, but if the agents know their histories there is no actual agent in the information-state that corresponds to having those probabilities.

I didn't understand "no actual agent in the information-state that corresponds to having those probabilities". What does it mean to have an agent in the information-state?

Vanessa Kosoy's Shortform

I'm not quite sure what you're trying to say here; probably my explanation of the framework was lacking. The robots already remember the history, like in classical RL. The question about the histories is perfectly well-defined. In other words, we are already implicitly doing what you described. It's like in classical RL theory: when you're proving a regret bound or whatever, your probability space consists of histories.

I'm still confused about what you mean by "Bayesian hypothesis" though. Do you mean a hypothesis that takes the form of a non-anthropic MDP?

Yes, or a classical RL environment. Ofc if we allow infinite state spaces, then any environment can be regarded as an MDP (whose states are histories). That is, I'm talking about hypotheses which conform to the classical "cybernetic agent model". If you wish, we can call it "Bayesian cybernetic hypothesis".

Also, I want to clarify something I was myself confused about in the previous comment. For an anthropic Markov chain (when there is only one action) with a finite number of states, we can give a Bayesian cybernetic description, but for a general anthropic MDP we cannot even if the number of states is finite.

Indeed, consider some . We can take its expected value to get . Assuming the chain is communicating, is an irreducible non-negative matrix, so by the Perron-Frobenius theorem it has a unique-up-to-scalar maximal eigenvector . We then get the subjective transition kernel:
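The Perron–Frobenius construction can be sketched as follows, assuming the matrix in question holds expected offspring counts between states (the sketch uses plain power iteration, and all names are illustrative):

```python
def subjective_kernel(E, iters=2000):
    """Normalize a non-negative irreducible matrix E of expected offspring
    counts into a stochastic matrix via its Perron eigenvector.

    With E u = lam * u (u the maximal eigenvector), define
        Q[s][s'] = E[s][s'] * u[s'] / (lam * u[s]),
    so each row of Q sums to 1 and reads as a subjective transition kernel.
    """
    n = len(E)
    u = [1.0] * n
    lam = 1.0
    for _ in range(iters):  # power iteration for the Perron eigenpair
        v = [sum(E[i][j] * u[j] for j in range(n)) for i in range(n)]
        lam = max(v)
        u = [x / lam for x in v]
    return [[E[i][j] * u[j] / (lam * u[i]) for j in range(n)] for i in range(n)]

# Branching toy example: a state-0 robot becomes one state-1 robot;
# a state-1 robot becomes one robot of each state.
E = [[0.0, 1.0],
     [1.0, 1.0]]
Q = subjective_kernel(E)
for row in Q:
    assert abs(sum(row) - 1.0) < 1e-9  # rows are proper distributions
```

Here the Perron eigenvalue is the golden ratio, and the subjective kernel from state 1 assigns probability 1/φ ≈ 0.618 to staying in state 1.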

Now, consider the following example of an AMDP. There are three actions and two states . When we apply to an robot, it creates two robots, whereas when we apply to an robot, it leaves one robot. When we apply to an robot, it creates two robots, whereas when we apply to an robot, it leaves one robot. When we apply to any robot, it results in one robot whose state is with probability and with probability .

Consider the following two policies. takes the sequence of actions and takes the sequence of actions . A population that follows would experience the subjective probability , whereas a population that follows would experience the subjective probability . Hence, subjective probabilities depend on future actions. So, effectively anthropics produces an acausal (Newcomb-like) environment. And, we already know such environments are learnable by infra-Bayesian RL agents and, (most probably) not learnable by Bayesian RL agents.

Vanessa Kosoy's Shortform

I'm not sure what do you mean by that "unrolling". Can you write a mathematical definition?

Let's consider a simple example. There are two states: and . There is just one action so we can ignore it. is the initial state. An robot transitions into an robot. An robot transitions into an robot and an robot. What will our population look like?

0th step: all robots remember

1st step: all robots remember

2nd step: 1/2 of robots remember and 1/2 of robots remember

3rd step: 1/3 of robots remember , 1/3 of robots remember and 1/3 of robots remember

There is no Bayesian hypothesis a robot can have that gives correct predictions both for step 2 and step 3. Indeed, to be consistent with step 2 we must have and . But, to be consistent with step 3 we must have , .

In other words, there is no Bayesian hypothesis s.t. we can guarantee that a randomly sampled robot on a sufficiently late time step will have learned this hypothesis with high probability. The apparent transition probabilities keep shifting s.t. it might always continue to seem that the world is complicated enough to prevent our robot from having learned it already.

Or, at least it's not obvious there is such a hypothesis. In this example, will converge to the golden ratio at late steps. But, do all probabilities converge fast enough for learning to happen, in general? I don't know, maybe for finite state spaces it can work. Would definitely be interesting to check.
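The population dynamics above can be simulated directly. The labels "A" and "B" below are placeholders for the two state symbols (which did not survive formatting), under the stated transition rules: an A robot turns into one B robot, and a B robot turns into an A robot and a B robot.

```python
from fractions import Fraction
from collections import Counter

RULES = {"A": ["B"], "B": ["A", "B"]}  # hypothetical labels for the two states

def step(population):
    """Each robot remembers its full history (a string of visited states)."""
    return [hist + child for hist in population for child in RULES[hist[-1]]]

population = ["A"]  # all robots start in the initial state
shares = []
for t in range(25):
    counts = Counter(population)
    total = len(population)
    shares.append({h: Fraction(c, total) for h, c in counts.items()})
    population = step(population)

# Step 2: two histories with share 1/2 each; step 3: three histories, 1/3 each.
assert sorted(shares[2].values()) == [Fraction(1, 2)] * 2
assert sorted(shares[3].values()) == [Fraction(1, 3)] * 3

# The fraction of robots currently in state B converges to 1/phi ~ 0.618:
# population sizes follow the Fibonacci sequence, hence the golden ratio.
frac_B = sum(1 for h in population if h.endswith("B")) / len(population)
assert abs(frac_B - 0.6180339887) < 0.01
```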

[EDIT: actually, in this example there is such a hypothesis but in general there isn't, see below]

Operationalizing compatibility with strategy-stealing

Notice that Eliezer's definition + the universal prior is essentially the same as the computationally unbounded variant of my definition of goal-directed intelligence, if you fix the utility function and prior. I think that when you're comparing different utility functions in your definition of strategy-stealing, it's better to emulate my definition and correct by the description complexity of the utility function. Indeed, if you have an algorithm that works equally well for all utility functions but contains a specification of the utility function inside its source code, then you get "spurious" variance if you don't correct for it (usually superior algorithms would also have to contain a specification of the utility function, so the more complex the utility function, the fewer of them there will be).
