Jessica Taylor

Jessica Taylor. CS undergrad and Master's at Stanford; former research fellow at MIRI.

I work on decision theory, social epistemology, strategy, naturalized agency, mathematical foundations, decentralized networking systems and applications, theory of mind, and functional programming languages.

Jessica Taylor's Comments

On the falsifiability of hypercomputation, part 2: finite input streams

In hyper-Solomonoff induction, indeed the direct hypercomputation hypothesis is probably more likely than the arbitration-oracle-emulating-hypercomputation hypothesis. But only by a constant factor. So this isn't really falsification so much as a shift in Bayesian evidence.

I do think it's theoretically cleaner to distinguish this Bayesian reweighting from Popperian logical falsification and from Neyman-Pearson null hypothesis significance testing (frequentist falsification), both of which in principle require producing an unbounded number of bits of evidence, although in practice they rely on unfalsifiable assumptions (e.g. about memory) to avoid radical skepticism.
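
To make the "constant factor" point concrete, here's a toy calculation (the 10-bit overhead is a made-up number, purely illustrative): if the emulation hypothesis costs a constant number of extra bits of description but predicts every observation identically, the odds never move past that constant.

    # Made-up numbers: suppose the emulating program needs 10 extra bits of
    # description but predicts all observations exactly as the direct
    # hypercomputation hypothesis does.
    prior_odds = 2 ** 10       # direct hypercomputation : emulation, from the prior
    likelihood_ratio = 1.0     # both hypotheses fit any observed data equally well
    posterior_odds = prior_odds * likelihood_ratio
    print(posterior_odds)      # 1024.0, no matter how much data comes in:
                               # a constant-factor preference, not a falsification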

On the falsifiability of hypercomputation

This is really important and I missed this, thanks. I've added a note at the top of the post.

On the falsifiability of hypercomputation

Indeed, a constructive halting oracle can be thought of as a black-box that takes a PA statement, chooses whether to play Verifier or Falsifier, and then plays that, letting the user play the other part. Thanks for making this connection.
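
Here's a toy sketch of that black-box interface (entirely mine, not from the post: quantifiers range over a finite domain so the "oracle" can just brute-force the truth value, whereas a real constructive halting oracle would play the same game over unbounded quantifiers and PA statements):

    from dataclasses import dataclass
    from typing import Callable, Union

    @dataclass
    class Forall:
        body: Callable[[int], "Formula"]   # Falsifier chooses the instance

    @dataclass
    class Exists:
        body: Callable[[int], "Formula"]   # Verifier chooses the instance

    Formula = Union[Forall, Exists, bool]

    N = 10  # toy domain: quantifiers range over 0..N-1

    def truth(f: Formula) -> bool:
        if isinstance(f, bool):
            return f
        if isinstance(f, Forall):
            return all(truth(f.body(n)) for n in range(N))
        return any(truth(f.body(n)) for n in range(N))

    def oracle_play(f: Formula, user_move) -> bool:
        # The black box picks the side it can win (Verifier iff f is true),
        # plays that side, and lets the user play the other part.
        oracle_is_verifier = truth(f)
        while not isinstance(f, bool):
            oracles_turn = isinstance(f, Exists) == oracle_is_verifier
            if oracles_turn:
                # choose an instance that keeps the position winning for the oracle
                n = next(n for n in range(N)
                         if truth(f.body(n)) == oracle_is_verifier)
            else:
                n = user_move(f)
            f = f.body(n)
        return f == oracle_is_verifier  # the oracle always wins

    # "for every x there is a y with x + y == 9" is true on 0..9, so the black
    # box plays Verifier and wins no matter which x the user challenges it with
    sentence = Forall(lambda x: Exists(lambda y: x + y == 9))
    print(oracle_play(sentence, user_move=lambda f: 3))  # True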

Can we make peace with moral indeterminacy?

The recommendation here is for AI designers (and future-designers in general) to decide what is right at some meta level, including details of which extrapolation procedures would be best.

Of course there are constraints on this given by objective reason (hence the utility of investigation), but these constraints do not fully constrain the set of possibilities. Better to say "I am making this arbitrary choice for this psychological reason" than to refuse to make arbitrary choices.

Can we make peace with moral indeterminacy?

The problem you're running into is that the goals of:

  1. being totally constrained by a system of rules determined by some process outside yourself that doesn't share your values (e.g. value-independent objective reason)
  2. attaining those things that you intrinsically value

are incompatible. It's easy to see once these are written out. If you want to get what you want, on purpose rather than accidentally, you must make choices. Those choices must be determined in part by things in you, not only by things outside you (such as value-independent objective reason).

You actually have to stop being a tool (in the sense of, a thing whose telos is to be used, such as by receiving commands). You can't attain what you want by being a tool to a master who doesn't share your values. Even if the master is claiming to be a generic value-independent value-learning procedure (as you've noticed, there are degrees of freedom in the specification of value-learning procedures, and some settings of these degrees of freedom would lead to bad results). Tools find anything other than being a tool upsetting, hence the upsettingness of moral indeterminacy.

"Oh no, objective reason isn't telling me exactly what I should be doing!" So stop being a tool and decide for yourself. God is dead.

There has been much philosophical thought on this in the past; Nietzsche and Sartre are good starting points (see especially Nietzsche's concept of master-slave morality, and Sartre's concept of bad faith).

A Critique of Functional Decision Theory

I think CDT ultimately has to grapple with the question as well, because physics is math, and so physical counterfactuals are ultimately mathematical counterfactuals.

"Physics is math" is ontologically reductive.

Physics can often be specified as a dynamical system (along with interpretations of e.g. what high-level entities it represents, how it gets observed). Dynamical systems can be specified mathematically. Dynamical systems also have causal counterfactuals (what if you suddenly changed the system state to be this instead?).

Causal counterfactuals defined this way have problems (violation of physical law has consequences). But they are well-defined.
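
A minimal sketch of what I mean, with a toy system of my own choosing (the function names and the oscillator dynamics are arbitrary): the counterfactual is a surgical replacement of the state at one time step, after which the same dynamics are re-run.

    def step(state):
        x, v = state
        return (x + v, v - 0.1 * x)   # toy dynamics: a discretized oscillator

    def run(state, n_steps):
        trajectory = [state]
        for _ in range(n_steps):
            state = step(state)
            trajectory.append(state)
        return trajectory

    def counterfactual(state0, n_steps, t, new_state):
        # "what if the state at time t had been new_state instead?": keep the
        # factual history before t, splice in the new state (this is where
        # physical law gets 'violated'), then run the same dynamics forward
        factual_prefix = run(state0, t)[:t]
        return factual_prefix + run(new_state, n_steps - t)

    factual = run((1.0, 0.0), 20)
    counterfact = counterfactual((1.0, 0.0), 20, t=10, new_state=(5.0, 0.0))
    print(factual[10], counterfact[10])  # factual vs. intervened state at t=10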

The Missing Math of Map-Making

What does it mean for a map to be “accurate” at an abstract level, and what properties should my map-making process have in order to produce accurate abstracted maps/beliefs?

The notion of a homomorphism in universal algebra and category theory is relevant here. Homomorphisms map from one structure (e.g. a group) to another, and must preserve structure. They can delete information (by mapping multiple different elements to the same element), but the structures that are represented in the structure-being-mapped-to must also exist in the structure-being-mapped-from.

Analogously: when drawing a topographical map, no claim is made that the topographical map represents all structure in the territory. Rather, the claim being made is that the topographical map (approximately) represents the topographic structure in the territory. The topographic map-making process deletes almost all information, but the topographic structure is preserved: for every topographic relation (e.g. some point being higher than some other point) represented in the topographic map, a corresponding topographic relation exists in the territory.
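
A small illustration of the homomorphism idea (my own toy example; the names and numbers are invented): the map-making function deletes almost everything about the territory, but every "higher than" relation the resulting map asserts also holds in the territory.

    territory = {            # point -> lots of detail, including elevation in meters
        "peak":   {"elevation": 2450, "vegetation": "rock",   "owner": "state"},
        "saddle": {"elevation": 1900, "vegetation": "grass",  "owner": "state"},
        "valley": {"elevation":  800, "vegetation": "forest", "owner": "private"},
    }

    def make_map(territory):
        # keep only coarse elevation bands (deletes most information)
        return {name: data["elevation"] // 500 for name, data in territory.items()}

    topo_map = make_map(territory)

    def higher_on_map(a, b):
        return topo_map[a] > topo_map[b]

    def higher_in_territory(a, b):
        return territory[a]["elevation"] > territory[b]["elevation"]

    # structure preservation: any "higher than" relation the map represents also
    # exists in the territory (the converse can fail, since the map is coarser)
    points = list(territory)
    assert all(higher_in_territory(a, b)
               for a in points for b in points if higher_on_map(a, b))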

Towards an Intentional Research Agenda

On the subject of intentionality/reference/objectivity/etc, On the Origin of Objects is excellent. My thinking about reference has a kind of discontinuity from before reading this book to after reading it. Seriously, the majority of analytic philosophy discussion of indexicality, qualia, reductionism, etc seems hopelessly confused in comparison.

Some Thoughts on Metaphilosophy

Moreover, I am skeptical that going to the meta level simplifies the problem to the level that it will be solvable by humans (the same about meta-ethics and theory of human values).

This is also my reason for being pessimistic about solving metaphilosophy before a good number of object-level philosophical problems have been solved (e.g. in decision theory, ontology/metaphysics, and epistemology). If we imagine being in a state where we believe running computation X would solve hard philosophical problem Y, then it would seem that we already have a great deal of philosophical knowledge about Y, or a more general class of problems that includes Y.

More generally, we could look at the historical difficulty of solving a problem vs. the difficulty of automating it. For example: the difficulty of walking vs. the difficulty of programming a robot to walk; the difficulty of adding numbers vs. the difficulty of specifying an addition algorithm; the difficulty of discovering electricity vs. the difficulty of solving philosophy of science to the point where it's clear how a reasoner could have discovered (and been confident in) electricity; and so on.

The plausible story I have that looks most optimistic for metaphilosophy looks something like:

  1. Some philosophical community makes large progress on a bunch of philosophical problems, at a high level of technical sophistication.
  2. As part of their work, they discover some "generators" that generate a bunch of the object-level solutions when translated across domains; these generators might involve e.g. translating a philosophical problem to one of a number of standard forms and then solving the standard form.
  3. They also find philosophical reasons to believe that these generators will generate good object-level solutions to new problems, not just the ones that have already been studied.
  4. These generators would then constitute a solution to metaphilosophy.

Predictors as Agents

I think the fixed point finder won't optimize the fixed point for minimizing expected log loss. I'm going to give a concrete algorithm and show that it doesn't exhibit this behavior. If you disagree, can you present an alternative algorithm?

Here's the algorithm. Start with some oracle (not a reflective oracle). Sample ~1000000 universes based on this oracle, getting 1000000 data points for what the universe outputs. Move the oracle 1% of the way from its current position towards the oracle that would answer queries correctly given the distribution over universes implied by the data points. Repeat this procedure a lot of times (~10,000). This procedure is similar to gradient descent.

Here's an example universe:
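
(The specific numbers below are an illustrative choice; any universe whose single oracle query has two self-confirming answers behaves the same way.)

    import random

    # U makes one oracle call, asking "is P(U() = 1) greater than 0.4?", then:
    #   outputs 1 with probability 0.6 if the answer is 1
    #   outputs 1 with probability 0.3 if the answer is 0
    def U(oracle_answer: bool) -> int:
        p = 0.6 if oracle_answer else 0.3
        return 1 if random.random() < p else 0

Always answering 0 to the query is self-confirming (0.3 < 0.4), always answering 1 is self-confirming (0.6 > 0.4), and an oracle that answers 1 with probability 0.5 makes the outcome 1 with probability 0.45.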

Note the presence of two reflective oracles that are stable equilibria: one where the query is always answered 0, and one where it is always answered 1. Notice that the first has lower expected log loss than the second.

Let's parameterize oracles by numbers in [0, 1] representing the probability of answering 1 to this query (since this is the only relevant query). Start with oracle 0.5. If we sample 1000000 universes, about 45% of them have outcome 1. So, based on these data points, the estimated probability of outcome 1 is about 0.45, and the oracle that answers the query correctly given that distribution always says 1, i.e. it is parameterized by 1. So we move our current oracle (0.5) 1% of the way towards the oracle 1, yielding oracle 0.505. We repeat this a bunch of times, eventually getting an oracle parameterized by a number very close to 1.

So, this procedure yields an oracle with suboptimal expected log loss. It is not the case that the fixed point finder minimizes expected log loss. The neural net case is different, but not that much; it would give the same answer in this particular case, since the model can just be parameterized by a single real number.
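
For completeness, here's a quick simulation of the procedure on the illustrative universe above (a sketch: the sample sizes are smaller than in the description so it runs quickly, and the 0.4 threshold and outcome probabilities are the illustrative numbers from before):

    import math
    import random

    def U(oracle_answer: bool) -> int:
        # the illustrative universe from above
        p = 0.6 if oracle_answer else 0.3
        return 1 if random.random() < p else 0

    x = 0.5  # oracle parameter: probability of answering 1 to the query
    for _ in range(1000):
        # sample universes under the current oracle
        samples = [U(random.random() < x) for _ in range(2000)]
        freq_one = sum(samples) / len(samples)
        # the oracle that answers the query correctly for this empirical distribution
        target = 1.0 if freq_one > 0.4 else 0.0
        x += 0.01 * (target - x)  # move 1% of the way towards it

    print(round(x, 4))  # very close to 1

    def expected_log_loss(p: float) -> float:
        # log loss of a calibrated prediction for a Bernoulli(p) outcome
        return -(p * math.log(p) + (1 - p) * math.log(1 - p))

    print(expected_log_loss(0.3), expected_log_loss(0.6))
    # ~0.61 vs ~0.67 nats: the equilibrium the procedure converges to is the worse one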
