Research Lead at CORAL. Director of AI research at ALTER. PhD student in Shay Moran's group at the Technion (my PhD research and my CORAL/ALTER research are one and the same). See also Google Scholar and LinkedIn.
E-mail: {first name}@alter.org.il
No, it's not at all the same thing as OpenAI is doing.
First, OpenAI is using a methodology that's completely inadequate for solving the alignment problem. I'm talking about racing to actually solve the alignment problem, not racing to any sort of superintelligence that our wishful thinking says might be okay.
Second, when I say "racing" I mean "trying to get there as fast as possible", not "trying to get there before other people". My race is cooperative, their race is adversarial.
Third, I actually signed the FLI statement on superintelligence. OpenAI hasn't.
Obviously any parallel efforts might end up competing for resources. There are real trade-offs between investing more in governance vs. investing more in technical research. We still need to invest in both, because of diminishing marginal returns. Moreover, consider this: even the approximately-best-case scenario of governance only buys us time; it doesn't shut down AI forever. The ultimate solution has to come from technical research.
Strong disagree.
We absolutely do need to "race to build a Friendly AI before someone builds an unFriendly AI". Yes, we should also try to ban Unfriendly AI, but there is no contradiction between the two. Plans are allowed (and even encouraged) to involve multiple parallel efforts and disjunctive paths to success.
It's not that academic philosophers are exceptionally bad at their jobs. It's that academic philosophy historically did not have the right tools to solve the problems. Theoretical computer science, and AI theory in particular, is a revolutionary method to reframe philosophical problems in a way that finally makes them tractable.
About "metaethics" vs "decision theory", that strikes me as a wrong way of decomposing the problem. We need to create a theory of agents. Such a theory naturally speaks both about values and decision making, and it's not really possible to cleanly separate the two. It's not very meaningful to talk about "values" without looking at what function the values do inside the mind of an agent. It's not very meaningful to talk about "decisions" without looking at the purpose of decisions. It's also not very meaningful to talk about either without also looking at concepts such as beliefs and learning.
As to "gung-ho attitude", we need to be careful both of the Scylla and the Charybdis. The Scylla is not treating the problems with the respect they deserve, for example not noticing when a thought experiment (e.g. Newcomb's problem or Christiano's malign prior) is genuinely puzzling and accepting any excuse to ignore it. The Charybdis is perpetual hyperskepticism / analysis-paralysis, never making any real progress because any useful idea, at the point of its conception, is always half-baked and half-intuitive and doesn't immediately come with unassailable foundations and justifications from every possible angle. To succeed, we need to chart a path between the two.
I found LLMs to be very useful for literature research. They can find relevant prior work that you can't find with a search engine because you don't know the right keywords. This can be a significant force multiplier.
They also seem potentially useful for quickly producing code for numerical tests of conjectures, but I only started experimenting with that.
Other use cases where I found LLMs beneficial:
That said, I do agree that early adopters seem like they're overeager and maybe even harming themselves in some way.
Btw, what are some ways we can incorporate heuristics into our algorithm while staying on level 1-2?
I see modeling vs. implementation as a spectrum more than a dichotomy. Something like:
More precisely, rather than a 1-dimensional spectrum, there are at least two parameters involved:
[EDIT: And a 3rd parameter is: how justified/testable the assumptions of your model are. Ideally, you want these assumptions to be grounded in science. Some will likely be philosophical assumptions which cannot be tested empirically, but at least they should fit into a coherent holistic philosophical view. At the very least, you want to make sure you're not assuming away the core parts of the problem.]
For the purposes of safety, you want to be as close to the implementation end of the spectrum as you can get. However, the model side of the spectrum is still useful as:
Sorry, I was wrong. By Löb's theorem, all versions of the predicate are provably equivalent, so they will trust each other.
IIUC, fixed point equations like that typically have infinitely many solutions. So, you defined not one predicate, but an infinite family of them. Therefore, your agent will trust a copy of itself, but usually won't trust variants of itself with other choices of fixed point. In this sense, this proposal is similar to proposals based on quining (as quining has many fixed points as well).
TLDR: A new proposal for a prescriptive theory of multi-agent rationality, based on generalizing superrationality using local symmetries.
Hofstadter's "superrationality" postulates that a rational agent should behave as if other rational agents are similar to itself. In a symmetric game, this implies the selection of a Pareto efficient outcome. However, it's not clear how to apply this principle to asymmetric games. I propose to solve this by suggesting a notion of "local" symmetry that is much more ubiquitous than global (ordinary) symmetry.
We will focus here entirely on finite deterministic (pure strategy) games. Generalizing to mixed strategies, and to infinite games more generally, is an important question left for the future.
Consider a game $G$ with a set $N$ of players, for each player $i \in N$ a non-empty set $S_i$ of pure strategies, and a utility function $u_i: S \to \mathbb{R}$, where $S := \prod_{i \in N} S_i$. What does it mean for the game to be "symmetric"? We propose a "soft" notion of symmetry that is only sensitive to the ordinal ordering between payoffs. While the cardinal value of the payoff will be important in the stochastic context, we plan to treat the latter by explicitly expanding the strategy space.
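As a running example, here is the Prisoner's Dilemma in this notation (the payoff values are just one standard choice, used here for illustration):

$$N = \{1, 2\}, \qquad S_1 = S_2 = \{C, D\}$$

$$u_1(C,C) = u_2(C,C) = 2, \qquad u_1(D,D) = u_2(D,D) = 1$$

$$u_1(D,C) = u_2(C,D) = 3, \qquad u_1(C,D) = u_2(D,C) = 0$$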
Given two games $G$, $H$ (using superscripts to indicate which game an object belongs to), a premorphism from $G$ to $H$ is defined to consist of

- a mapping $\zeta: N^H \to N^G$;
- for each $i \in N^H$, a mapping $\zeta_i: S^G_{\zeta(i)} \to S^H_i$.
Notice that these mappings induce a mapping $\zeta_*: S^G \to S^H$ via $(\zeta_* s)_i := \zeta_i(s_{\zeta(i)})$.
A premorphism is said to be a homomorphism if $\zeta$ and $(\zeta_i)_{i \in N^H}$ are s.t. for any $i \in N^H$ and $s, t \in S^G$, if $u^G_{\zeta(i)}(s) \leq u^G_{\zeta(i)}(t)$ then $u^H_i(\zeta_* s) \leq u^H_i(\zeta_* t)$. Homomorphisms make games into a category in a natural way.
An automorphism of $G$ is a homomorphism from $G$ to $G$ that has an inverse homomorphism. An automorphism $\zeta$ of $G$ is said to be flat when for any $i \in N^G$, if $\zeta(i) = i$ then $\zeta_i$ is the identity mapping[1]. A flat symmetry of $G$ is a group $\Gamma$ together with a group homomorphism $\rho: \Gamma \to \operatorname{Aut}(G)$ s.t. $\rho(\gamma)$ is flat for all $\gamma \in \Gamma$.
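For example, in the Prisoner's Dilemma above, the swap $\zeta(1) := 2$, $\zeta(2) := 1$ with $\zeta_1 = \zeta_2 := \operatorname{id}_{\{C,D\}}$ is an automorphism, and the $\mathbb{Z}_2$-action generated by it is a flat symmetry: the non-identity element fixes no player, so flatness holds vacuously.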
Given a flat symmetry $(\Gamma, \rho)$, we can apply the superrationality principle to reduce $G$ to the "smaller" quotient game $G/\Gamma$. The players are $N^{G/\Gamma} := N^G / \Gamma$, i.e. orbits in the original player set. Given an orbit $A \in N^G / \Gamma$, we define the set of strategies for player $A$ in the game $G/\Gamma$ to be

$$S^{G/\Gamma}_A := \left\{ s \in \prod_{i \in A} S^G_i \;\middle|\; \forall \gamma \in \Gamma,\, i \in A:\; s_i = \rho(\gamma)_i\!\left(s_{\rho(\gamma)(i)}\right) \right\}$$
This is non-empty thanks to flatness: pick any representative $i_0 \in A$ and any strategy in $S^G_{i_0}$, then transport it along the group action; flatness guarantees that the stabilizer of $i_0$ acts trivially, so the resulting family is well-defined.
Observe that there is a natural mapping $\pi: S^{G/\Gamma} \to S^G$ given by $\pi(s)_i := \left(s_{[i]}\right)_i$, where $[i] \in N^G / \Gamma$ denotes the orbit of $i$.
Finally, the utility function of $G/\Gamma$ is defined by

$$u^{G/\Gamma}_A(s) := \sum_{i \in A} u^G_i(\pi(s))$$
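Continuing the running example: for the $\mathbb{Z}_2$ swap symmetry of the Prisoner's Dilemma there is a single orbit $A = \{1, 2\}$, and the equivariance condition forces $s_1 = s_2$, so (with the sum convention above)

$$S^{G/\mathbb{Z}_2}_A \cong \{(C,C), (D,D)\}, \qquad u^{G/\mathbb{Z}_2}_A(C,C) = 4 > u^{G/\mathbb{Z}_2}_A(D,D) = 2$$

The quotient is a one-player game, and its optimum is the Pareto efficient outcome $(C,C)$: we recover Hofstadter's original superrationality.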
It is easy to see that the construction of $G/\Gamma$ is functorial w.r.t. the category structure on games that we defined.
We can generalize this further by replacing game homomorphisms with game quasimorphisms: premorphisms satisfying the following relaxed condition on $\zeta$ and $(\zeta_i)_{i \in N^H}$: for any $i \in N^H$ and $s, t \in S^G$, if $u^G_{\zeta(i)}(s) < u^G_{\zeta(i)}(t)$ then $u^H_i(\zeta_* s) \leq u^H_i(\zeta_* t)$.
This is no longer closed under composition, so it no longer defines a category structure[2]. However, we can still define flat quasisymmetries and the associated quotient games, and this construction is still functorial w.r.t. the original (not "quasi") category structure. Moreover, there is a canonical homomorphism (not just a quasimorphism) from $G/\Gamma$ to $G$, even when $\Gamma$ is a mere quasisymmetry.
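A minimal example of why composition fails (my example, using the definitions above): consider one-player games $G$, $H$, $K$ with the common strategy set $\{a, b\}$ and

$$u^G(a) = 0 < u^G(b) = 1, \qquad u^H(a) = u^H(b) = 0, \qquad u^K(a) = 1 > u^K(b) = 0$$

The identity premorphisms $G \to H$ and $H \to K$ are both quasimorphisms (the former because $u^H$ is constant, the latter vacuously), but their composite $G \to K$ is not, since $u^G(a) < u^G(b)$ while $u^K(a) > u^K(b)$. Ties absorb strict inequalities, and this is what breaks transitivity.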
The significance of quasisymmetries can be understood as follows.
The set of all games on a fixed set of players $N$ with fixed strategy sets $(S_i)_{i \in N}$ naturally forms a topological space[3] $\mathcal{G} \cong \mathbb{R}^{N \times S}$. Given a group $\Gamma$ acting on the game via invertible premorphisms, the subset of $\mathcal{G}$ where $\Gamma$ is a symmetry is not closed, in general. However, the subset of $\mathcal{G}$ where $\Gamma$ is a quasisymmetry is closed: violating the quasimorphism condition requires a pair of strict inequalities, which is an open condition. I believe this will be important when generalizing the formalism to infinite games.
What if a game doesn't even have quasisymmetries? Then, we can look for a coarse graining of the game which does have them.
Consider a game $G$. For each $i \in N^G$, let $Q_i$ be some set and $q_i: S^G_i \to Q_i$ a surjection. Denote $Q := \prod_{i \in N^G} Q_i$. Given any $x \in Q$, we have the game $G_x$ in which:

- the set of players is $N^G$;
- the set of strategies of player $i$ is $q_i^{-1}(x_i)$;
- the utility functions are the restrictions of the $u^G_i$.
If for any $x \in Q$ we choose some $s(x) \in S^{G_x}$, this allows us to define the coarse-grained game $G/q$ in which:

- the set of players is $N^G$;
- the set of strategies of player $i$ is $Q_i$;
- the utility function of player $i$ is $u^{G/q}_i(x) := u^G_i(s(x))$.
Essentially, we rephrased $G$ as an extensive form game in which first the game $G/q$ is played and then, if the outcome was $x$, the game $G_x$ is played. This, assuming the expected outcome of the game $G_x$ is $s(x)$.
It is also possible to generalize this construction by allowing multivalued $q_i$.
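To illustrate using the running example: take $Q_1 := S_1 = \{C, D\}$ with $q_1 := \operatorname{id}$ and $Q_2 := \{*\}$. Each subgame $G_x$ fixes player 1's strategy and leaves only player 2 a real choice. Taking $s(x)$ to be player 2's best response gives $s(C, *) = (C, D)$ and $s(D, *) = (D, D)$, so in the coarse-grained game $G/q$ player 1 effectively moves first, with $u^{G/q}_1(C, *) = 0$ and $u^{G/q}_1(D, *) = 1$. This is the sense in which coarse-graining can emulate playing the game sequentially.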
Given a game $G$, a local symmetry presolution (LSP) is recursively defined to be one of the following:

- an outcome $s^* \in S^G$ s.t. at most one player $i$ has more than one strategy and $s^*$ maximizes $u^G_i$ (the base case);
- $\pi(s^*)$, where $\Gamma$ is a flat quasisymmetry of $G$ and $s^*$ is an LSP of the quotient game $G/\Gamma$;
- $s(x^*)$, where $q$ is a coarse-graining of $G$, for each $x \in Q$ the outcome $s(x) \in S^{G_x}$ is an LSP of $G_x$, and $x^*$ is an LSP of the resulting coarse-grained game $G/q$.
It's easy to see that an LSP always exists, because for any game we can choose a sequence of coarse-grainings whose effect is making the players choose their strategies one by one (with full knowledge of the choices by previous players). However, not all LSPs are "born equal". It seems appealing to prefer LSPs which use "more" symmetry. This can be formalized as follows.
The way an LSP is constructed defines the set of coherent outcomes $C \subseteq S^G$, which is the set of outcomes compatible with the local symmetries[4]. We define it recursively as follows:

- for the base case, $C := S^G$ (no symmetry was used, so every outcome is compatible);
- if the LSP is $\pi(s^*)$ for a flat quasisymmetry $\Gamma$, then $C := \pi\!\left(C^{G/\Gamma}\right)$, where $C^{G/\Gamma}$ is the set of coherent outcomes of the LSP $s^*$ of $G/\Gamma$;
- if the LSP is $s(x^*)$ for a coarse-graining $q$, then $C := \bigcup_{x \in C^{G/q}} C^{G_x}$, where $C^{G_x}$ and $C^{G/q}$ are the sets of coherent outcomes of the LSPs $s(x)$ and $x^*$ respectively.
We define a local symmetry solution (LSS) to be an LSP whose set of coherent outcomes is minimal w.r.t. set inclusion.
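For concreteness, here is how this plays out in the running example (under the recursive definitions above). The sequential LSP of the Prisoner's Dilemma, constructed via the coarse-graining from before, uses no symmetry at any stage, so its set of coherent outcomes is all of $S$. The LSP constructed via the $\mathbb{Z}_2$ swap symmetry instead has $C = \pi\!\left(S^{G/\mathbb{Z}_2}\right) = \{(C,C), (D,D)\} \subsetneq S$. Hence the symmetric construction is preferred, and it is an LSS selecting the superrational outcome $(C,C)$.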
Note that flatness is in general not preserved under composition.
I believe this can be interpreted as a category enriched over the category of sets and partial functions, with the monoidal structure given by Cartesian product of sets.
An affine space, even.
Maybe we can interpret the set of coherent outcomes as a sharp infradistribution corresponding to the players' joint belief about the outcome of the game.
This frame seems useful, but might obscure some nuance: