Research Lead at CORAL. Director of AI research at ALTER. PhD student in Shay Moran's group at the Technion (my PhD research and my CORAL/ALTER research are one and the same). See also Google Scholar and LinkedIn.
E-mail: {first name}@alter.org.il
The interpretation of quantum mechanics is a philosophical puzzle that has baffled physicists and philosophers for about a century. In my view, this confusion is a symptom of our lacking a rigorous theory of epistemology and metaphysics. At the same time, creating such a theory seems to me like a necessary prerequisite for solving the technical AI alignment problem. Therefore, once we had created a candidate theory of metaphysics (Formal Computational Realism (FCR), formerly known as Infra-Bayesian Physicalism), the interpretation of quantum mechanics stood out as a powerful test case. In the work presented in this post, we demonstrated that FCR indeed passes this test (at least to a first approximation).
What is so confusing about quantum mechanics? To understand this, let's take a look at a few of the most popular pre-existing interpretations.
The Copenhagen Interpretation (CI) proposes a mathematical rule for computing the probabilities of observation sequences, by postulating the collapse of the wavefunction. For every observation, you can apply the Born Rule to compute the probabilities of the different results, and once a result is selected, the wavefunction is "collapsed" by projecting it onto the corresponding eigenspace (and renormalizing).
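For reference, the standard form of the rule: if the measured observable has spectral decomposition $A = \sum_i \lambda_i P_i$, then for a system in state $|\psi\rangle$,

$$
\Pr(\lambda_i) = \langle \psi | P_i | \psi \rangle, \qquad |\psi\rangle \mapsto \frac{P_i |\psi\rangle}{\lVert P_i |\psi\rangle \rVert},
$$

i.e. the Born Rule gives the outcome probabilities, and the collapse replaces the state with its (renormalized) projection onto the eigenspace of the observed result.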
CI seems satisfactory to a logical positivist: if all we need from a physical theory is computing the probabilities of observations, we have it. However, this is unsatisfactory for a decision-making agent whose utility function depends on something other than its direct observations. For such an agent, CI offers no well-defined way to compute expected utility. Moreover, while normally decoherence ensures that the observations of all agents are in some sense "consistent", it is in principle possible to create a situation in which decoherence fails and CI prescribes contradictory beliefs to different agents (as in the Wigner's friend thought experiment).
In CI, the wavefunction is merely a book-keeping device with no deep meaning of its own. In contrast, the Many Worlds Interpretation (MWI) takes a realist metaphysical stance, postulating that the wavefunction describes the objective physical state of the universe. This, in principle, admits meaningful unobservable quantities on which the values of agents can depend. However, the MWI has no mathematical rule for computing the probabilities of observation sequences. If all "worlds" exist at the same time, there's no obvious reason to expect to see one of them rather than another. MWI proponents address this by handwaving into existence some "degree of reality" that some worlds possess more than others. However, the fundamental fact remains that there is no well-defined prescription for the probabilities of observation sequences, unless we copy the prescription of CI, and the latter is inconsistent with the intent of MWI in cases where decoherence fails, such as Wigner's friend.
In principle, we can defend an MWI-based decision theory in which the utility function is a self-adjoint operator on the Hilbert space and we maximize its expectation in the usual quantum-mechanical sense. Such a decision theory avoids the need for a well-defined probability distribution over observation sequences. However, it leaves us with an "ontological crisis": if our agent did not start out knowing quantum mechanics, how would it translate its values into this quantum-mechanical form?[1]
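Schematically, such a decision rule could be written as (the notation here is mine, just to illustrate the shape of the proposal, not a specific formalism from the literature):

$$
\pi^* = \operatorname*{arg\,max}_{\pi} \ \langle \psi_\pi | U | \psi_\pi \rangle,
$$

where $U$ is the self-adjoint "utility" operator and $|\psi_\pi\rangle$ is the wavefunction of the universe resulting from the agent following policy $\pi$. Note that no probability distribution over observation sequences appears anywhere in this expression.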
The De Broglie-Bohm Interpretation (DBBI) proposes that in addition to the wavefunction, we should also postulate a classical trajectory following a time-evolution law that depends on the wavefunction. This results in a realist theory with a well-defined distribution over observation sequences. However, it comes with two major issues:
In my view, the real source of all the confusion is the lack of rigorous metaphysics: we didn't know (prior to this line of research), in full generality, what the type signature of a physical theory should be and how we should evaluate such a theory.
Enter Formal Computational Realism (FCR). According to FCR, the fundamental ontology in which all beliefs and values about the world should be expressed is computable logical facts plus the computational information content of the universe. The universe can contain information about computations (e.g., if someone calculated the 700th digit of pi, then the universe contains this information), and fundamentally this information is all there is[2]. Moreover, given an algorithmic description of a physical theory plus the epistemic state of an agent in relation to computable logical facts, it is possible to formally specify the computational information content that this physical theory implies, from the perspective of the agent. The latter operation is called the "bridge transform".
To apply this to quantum mechanics, we need to choose a particular algorithmic description. The choice we settled on is fairly natural: we imagine all possible quantum observables as having marginal distributions that obey the Born rule, with the joint distribution being otherwise completely ambiguous (in the sense in which imprecise probability allows distributions to be ambiguous, i.e. we have "Knightian uncertainty" about it). This is a natural choice, because quantum mechanics has no prescription for the joint distribution of noncommuting observables. The combined values of all observables constitute the "state" that the physical theory computes, and the agent's policy is treated as an unknown logical fact on which the computation depends.
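To give a rough sense of what "Born-rule marginals with an ambiguous joint" means, here is a toy single-qubit sketch in Python (purely illustrative: the setup and the membership test below are my own simplification, not the actual FCR construction or the bridge transform). Quantum mechanics fixes the marginal distribution of each observable, and the hypothesis is the whole credal set of joint distributions compatible with those marginals.

```python
import numpy as np

# Toy illustration (not the actual FCR formalism): a single qubit with two
# noncommuting observables, Pauli Z and Pauli X. The Born rule fixes the
# *marginal* outcome distribution of each observable, but prescribes no joint
# distribution for them; the "hypothesis" is the credal set of all joints
# compatible with both marginals (Knightian uncertainty over the coupling).

psi = np.array([np.cos(np.pi / 8), np.sin(np.pi / 8)])  # some pure state |psi>

Z_eigvecs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]                              # |0>, |1>
X_eigvecs = [np.array([1.0, 1.0]) / np.sqrt(2), np.array([1.0, -1.0]) / np.sqrt(2)]   # |+>, |->

born_Z = [abs(np.vdot(v, psi)) ** 2 for v in Z_eigvecs]  # Born marginal for Z
born_X = [abs(np.vdot(v, psi)) ** 2 for v in X_eigvecs]  # Born marginal for X

def in_credal_set(joint, tol=1e-9):
    """joint[i, j] = P(Z-outcome i, X-outcome j); accept iff both marginals obey the Born rule."""
    joint = np.asarray(joint, dtype=float)
    is_distribution = np.all(joint >= -tol) and abs(joint.sum() - 1.0) < tol
    matches_Z = np.allclose(joint.sum(axis=1), born_Z, atol=tol)
    matches_X = np.allclose(joint.sum(axis=0), born_X, atol=tol)
    return is_distribution and matches_Z and matches_X

# The product coupling is one member of the credal set...
print(in_credal_set(np.outer(born_Z, born_X)))  # True
# ...but so are many other couplings; the theory deliberately leaves the choice ambiguous.
```

In the actual construction, this ambiguity is handled with imprecise (infra-Bayesian) probability, and it is the bridge transform that turns such a hypothesis into the computational information content of the universe.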
Applying the bridge transform to the above operationalization of quantum mechanics, we infer the computational information content of the universe according to quantum mechanics, and then use the latter to extract the probabilities of various agent experiences. What we discover is as follows:
As opposed to most pre-existing interpretations, the resulting formalism has precisely defined decision-theoretic prescriptions for an agent in any "weird" (i.e. non-decohering) situation, such as Wigner's friend. It only requires the agent's values to be specified in the FCR ontology (which in particular allows the agent to assign value to their own experiences, in some arbitrary history-dependent way, and/or to the experiences of particular other agents).
In conclusion, FCR passed a non-trivial test here. It was not obvious to me that it would: before Gergely figured out the details, I wasn't sure it was going to work at all. As such, I believe this to be a milestone result. (With some caveats: e.g., it needs to be rechecked for the non-monotonic version of the framework.)
Note that de Blanc's proposal is inapplicable here, since the quantum ontology is not a Markov decision process.
To be clear, this is just a vague informal description; FCR is an actual rigorous mathematical framework.
I'm renaming Infra-Bayesian Physicalism to Formal Computational Realism (FCR), since the latter name is much more in line with the nomenclature in academic philosophy.
AFAICT, the closest pre-existing philosophical views are Ontic Structural Realism (see 1 2) and Floridi's Information Realism. In fact, FCR can be viewed as a rejection of physicalism, since it posits that a physical theory is meaningless unless it's conjoined with beliefs about computable mathematics.
The adjective "formal" is meant to indicate that it's a formal mathematical framework, not just a philosophical position. The previously used adjective "infra-Bayesian" now seems to me potentially confusing: On the one hand, it's true that the framework requires imprecise probability (hence "infra"), on the other hand it's a hybrid of frequentist and Bayesian.
To keep terminology consistent, Physicalist Superimitation should now be called Computational Superimitation (COSI).
This frame seems useful, but might obscure some nuance:
No, it's not at all the same thing as OpenAI is doing.
First, OpenAI is using a methodology that's completely inadequate for solving the alignment problem. I'm talking about racing to actually solve the alignment problem, not racing to any sort of superintelligence that our wishful thinking says might be okay.
Second, when I say "racing" I mean "trying to get there as fast as possible", not "trying to get there before other people". My race is cooperative, their race is adversarial.
Third, I actually signed the FLI statement on superintelligence. OpenAI hasn't.
Obviously any parallel efforts might end up competing for resources. There are real trade-offs between investing more in governance vs. investing more in technical research. We still need to invest in both, because of diminishing marginal returns. Moreover, consider this: even the approximately-best-case scenario of governance only buys us time; it doesn't shut down AI forever. The ultimate solution has to come from technical research.
Strong disagree.
We absolutely do need to "race to build a Friendly AI before someone builds an unFriendly AI". Yes, we should also try to ban Unfriendly AI, but there is no contradiction between the two. Plans are allowed (and even encouraged) to involve multiple parallel efforts and disjunctive paths to success.
It's not that academic philosophers are exceptionally bad at their jobs. It's that academic philosophy historically did not have the right tools to solve the problems. Theoretical computer science, and AI theory in particular, is a revolutionary method to reframe philosophical problems in a way that finally makes them tractable.
About "metaethics" vs "decision theory", that strikes me as a wrong way of decomposing the problem. We need to create a theory of agents. Such a theory naturally speaks both about values and decision making, and it's not really possible to cleanly separate the two. It's not very meaningful to talk about "values" without looking at what function the values do inside the mind of an agent. It's not very meaningful to talk about "decisions" without looking at the purpose of decisions. It's also not very meaningful to talk about either without also looking at concepts such as beliefs and learning.
As to "gung-ho attitude", we need to be careful both of the Scylla and the Charybdis. The Scylla is not treating the problems with the respect they deserve, for example not noticing when a thought experiment (e.g. Newcomb's problem or Christiano's malign prior) is genuinely puzzling and accepting any excuse to ignore it. The Charybdis is perpetual hyperskepticism / analysis-paralysis, never making any real progress because any useful idea, at the point of its conception, is always half-baked and half-intuitive and doesn't immediately come with unassailable foundations and justifications from every possible angle. To succeed, we need to chart a path between the two.
I found LLMs to be very useful for literature research. They can find relevant prior work that you can't find with a search engine because you don't know the right keywords. This can be a significant force multiplier.
They also seem potentially useful for quickly producing code for numerical tests of conjectures, but I only started experimenting with that.
Other use cases where I found LLMs beneficial:
That said, I do agree that early adopters seem like they're overeager and maybe even harming themselves in some way.
Btw, what are some ways we can incorporate heuristics into our algorithm while staying on level 1-2?
I see modeling vs. implementation as a spectrum more than a dichotomy. Something like:
More precisely, rather than a 1-dimensional spectrum, there are at least two parameters involved:
[EDIT: And a 3rd parameter is how justified/testable the assumptions of your model are. Ideally, you want these assumptions to be grounded in science. Some will likely be philosophical assumptions which cannot be tested empirically, but at least they should fit into a coherent holistic philosophical view. At the very least, you want to make sure you're not assuming away the core parts of the problem.]
For the purposes of safety, you want to be as close to the implementation end of the spectrum as you can get. However, the model side of the spectrum is still useful as:
Sorry, I was wrong. By Löb's theorem, all versions of are provably equivalent, so they will trust each other.
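(For reference, Löb's theorem: for any sentence $P$, if $\mathrm{PA} \vdash \Box P \rightarrow P$ then $\mathrm{PA} \vdash P$; equivalently, $\mathrm{PA} \vdash \Box(\Box P \rightarrow P) \rightarrow \Box P$, where $\Box$ denotes provability. The same holds for any theory satisfying the standard derivability conditions.)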
This post discusses an important point: it is impossible to be simultaneously perfectly priorist ("updateless") and learn. Learning requires eventually "passing to" something like a posterior, which is inconsistent with forever maintaining "entanglement" with a counterfactual world. This is somewhat similar to the problem of traps (irreversible transitions): being prudent about the risk of traps requires relying on your prior, which prevents you from exploiting every conceivable learning opportunity.
My own position on this cluster of questions is that you should be priorist/(infra-)Bayesian about physics but postist/learner/frequentist about logic. This idea is formally embodied in the no-regret criterion for Formal Computational Realism. I believe that this no-regret condition implies something like the OP's "Eventual Learning", but formally demonstrating it is future work.