*The idea is an elaboration of a comment I made previously.*

**Motivation**: Improving our understanding of superrationality.

**Topic**: Investigate the following conjecture.

Consider two agents playing iterated prisoner's dilemma (IPD) with geometric time discount. It is well known that, for a sufficiently large discount parameter (1/(1−γ) ≫ 0), essentially all outcomes of the normal form game become Nash equilibria (the folk theorem). In particular, cooperation can be achieved via the tit-for-tat strategy. However, mutual defection is still a Nash equilibrium (and even a subgame perfect equilibrium).
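To make the folk-theorem intuition concrete, here is a small sketch (not from the post; the payoff numbers T=5, R=3, P=1, S=0 are the usual convention) computing normalized geometric-discounted payoffs for a pair of IPD strategies:

```python
# Normalized discounted IPD payoffs: (1 - gamma) * sum_t gamma^t * u_t.
# Payoff convention (standard, not from the post): T=5, R=3, P=1, S=0.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def discounted_payoffs(strat1, strat2, gamma, horizon=1000):
    """Approximate normalized discounted payoffs over a long finite horizon."""
    h1, h2 = [], []   # action histories of player 1 and player 2
    u1 = u2 = 0.0
    for t in range(horizon):
        a1, a2 = strat1(h2), strat2(h1)  # each strategy sees the opponent's history
        p1, p2 = PAYOFF[(a1, a2)]
        u1 += (1 - gamma) * gamma**t * p1
        u2 += (1 - gamma) * gamma**t * p2
        h1.append(a1)
        h2.append(a2)
    return u1, u2

tit_for_tat = lambda opp: 'C' if not opp else opp[-1]
always_defect = lambda opp: 'D'

print(discounted_payoffs(tit_for_tat, tit_for_tat, gamma=0.9))     # ≈ (3, 3)
print(discounted_payoffs(always_defect, always_defect, gamma=0.9)) # ≈ (1, 1)
```

Tit-for-tat against itself recovers the mutual-cooperation payoff, while mutual defection locks in the punishment payoff, which is the contrast the conjecture below is about.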

Fix n1,n2∈N. Consider the following IPD variant: the first player is forced to play a strategy that can be represented by a finite state automaton with n1 states, and the second player is forced to play a strategy that can be represented by a finite state automaton with n2 states. For our purposes, a "finite state automaton" consists of a set of states S, a transition mapping τ:S×{C,D}→S, and an "action mapping" α:S→{C,D}. Here, τ tells you how to update your state after observing the opponent's last action, and α tells you which action to take. Denote the resulting (normal form) game FIPD(n1,n2,γ), where γ is the time discount parameter.
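The automaton representation above can be sketched directly; the class and method names here are mine, not from the post. As an example, tit-for-tat is realizable with n=2 states:

```python
# Sketch of the (S, tau, alpha) automaton strategy defined above.
class Automaton:
    def __init__(self, states, tau, alpha, start):
        self.states = states  # S
        self.tau = tau        # transition mapping: (state, opponent's action) -> state
        self.alpha = alpha    # action mapping: state -> 'C' or 'D'
        self.state = start

    def act(self):
        """alpha: which action to take in the current state."""
        return self.alpha[self.state]

    def observe(self, opp_action):
        """tau: update the state after observing the opponent's last action."""
        self.state = self.tau[(self.state, opp_action)]

# Tit-for-tat as a 2-state automaton: each state plays its own action,
# and the transition moves to the state matching the opponent's last action.
tft = Automaton(
    states={'c', 'd'},
    tau={('c', 'C'): 'c', ('c', 'D'): 'd',
         ('d', 'C'): 'c', ('d', 'D'): 'd'},
    alpha={'c': 'C', 'd': 'D'},
    start='c',
)
```

This makes the n1,n2 ≥ 2 condition in the conjecture below natural: two states are the minimum needed to express reactive strategies like tit-for-tat or grim trigger.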

*Conjecture:* If n1,n2≥2 then there are functions T:(0,1)→(0,∞) and δ:(0,1)→(0,∞) s.t. the following conditions hold:

- T(γ) → 0 as γ → 1
- δ(γ) → 0 as γ → 1
- Any thermodynamic equilibrium of FIPD(n1,n2,γ) at temperature T(γ) has payoffs within δ(γ) of the mutual-cooperation (CC) payoffs.
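The conjecture hinges on what "thermodynamic equilibrium of temperature T" means. One common formalization (my reading; the post does not spell it out) is the logit quantal response equilibrium, in which each player's mixed strategy is a Boltzmann distribution over pure strategies given the opponent's strategy:

```latex
\sigma_i(a_i) \;=\; \frac{\exp\!\big(u_i(a_i,\sigma_{-i})/T\big)}{\sum_{a_i'}\exp\!\big(u_i(a_i',\sigma_{-i})/T\big)}
```

As T → 0 such equilibria approach Nash equilibria of the underlying game, which is what makes the limit T(γ) → 0 in the conjecture meaningful.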

**Strategies:** You could take two approaches: theoretical research and experimental research.

For theoretical research, you would try to prove or disprove the conjecture. If the initial conjecture is too hard, you can try to find easier variants (such as n1=n2=2, or adding more constraints on the automaton). If you succeed in proving the conjecture, you can go on to study games other than prisoner's dilemma (for example, do we always converge to Pareto efficiency?). If you succeed in disproving the conjecture, you can go on to look for variants that survive (for example, assume n1=n2, or that the finite state automata must not have irreversible transitions).

To decompose the task I propose: (i) have each person on the team think of ideas for how to approach this; (ii) brainstorm everyone's ideas together and select a subset of promising ideas; (iii) distribute the promising ideas among people, and/or take each promising idea and find multiple lemmas that different people can try proving.

Don't forget to check whether the literature has adjacent results. This also helps with decomposition: the literature survey can be assigned to a subset of the team, and/or different people can search for different keywords or read different papers.

For experimental research, you would code an algorithm that computes the thermodynamic equilibria and see how the payoffs behave as a function of T and γ. Ideally, you would also derive error bounds on your results. To decompose the task, use the same strategy as in the theoretical case to come up with the algorithms and the code design. Afterwards, decompose further by having each person implement a segment of the code (pair programming is also an option).
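As one concrete starting point (a minimal sketch, not the post's method): if "thermodynamic equilibrium" is read as a logit quantal-response equilibrium, it can be approximated by damped fixed-point iteration on softmax responses. The function names and the one-shot payoff numbers are my own choices; in the actual experiment the payoff matrices would range over all n1- and n2-state automata, with entries given by discounted IPD payoffs.

```python
import numpy as np

def softmax(x):
    """Numerically stable Boltzmann distribution over payoffs."""
    z = np.exp(x - x.max())
    return z / z.sum()

def logit_qre(U1, U2, T, iters=2000, damp=0.5):
    """Damped fixed-point iteration for a logit quantal-response equilibrium.

    U1[i, j], U2[i, j]: payoffs to players 1 and 2 when player 1 plays
    pure strategy i and player 2 plays j. Returns mixed strategies (p, q).
    """
    m, n = U1.shape
    p, q = np.full(m, 1 / m), np.full(n, 1 / n)
    for _ in range(iters):
        p_new = softmax(U1 @ q / T)      # player 1's softmax response to q
        q_new = softmax(U2.T @ p / T)    # player 2's softmax response to p
        p = damp * p + (1 - damp) * p_new
        q = damp * q + (1 - damp) * q_new
    return p, q

# Example: one-shot PD with T=5, R=3, P=1, S=0; rows/columns are [C, D].
U1 = np.array([[3.0, 0.0],
               [5.0, 1.0]])
U2 = U1.T
p, q = logit_qre(U1, U2, T=0.1)
# In the one-shot game, defection dominates, so at low temperature the
# equilibrium concentrates on D; the conjecture is that over automaton
# strategies with gamma near 1 the picture flips toward CC payoffs.
```

The experiment would then sweep T and γ, plot the equilibrium payoffs, and check convergence of the fixed-point iteration (e.g. by varying `iters` and `damp`) to support the error-bound analysis.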

It is also possible to go for theoretical *and* experimental simultaneously, by distributing among people and cross-fertilizing along the way.

Microscope AI in general seems like a very decomposition-friendly area. Take a trained neural net, assign each person a chunk to focus on, and everybody tries to figure out what features/algorithms/other neat stuff are embedded in their chunk.

Also should work well with a regular-meetup-group format, since the arrangement would be fairly robust to people missing a meeting, joining up midway, having completely different approaches or backgrounds, etc. Relatively open-ended, room for people to try different approaches based on what interests them and cross-pollinate strategies with the group.

What sort of math background can we assume the group to have?

I don't know (partially because I'm unsure who would stay and leave).

If you didn't take math background into consideration and wrote a proposal (saying "requires background in real analysis" or ...), then that may push out people without that background but also attract people with that background.

As long as pre-reqs are explicit, you should go for it.

Are you looking for an open problem which is sub-dividable into many smaller open problems? Or for one small open problem which is a part of a larger open problem?

The first one. As long as you can decompose the open problem into tractable, bite-sized pieces, it's good.

Vanessa mentioned some strategies that might generalize to other open problems: group decomposition (we decide how to break a problem up), programming to empirically verify X, and literature reviews.

I'm unclear on how to apply this filter. Can you give an example of what you mean by decomposable, and an example of not? (Perhaps not from alignment.)

If you only had access to people who can google, program, and notice confusion, how could you utilize that to make conceptual progress on a topic you care about?

Decomposable: Make a simple first person shooter. Could be decomposed into creating asset models, and various parts of the actual code can be decomposed (input-mapping, getting/dealing damage).

Non-decomposable: Help me write an awesome piano song. Although this can be decomposed, I don't expect anyone to have the skills required (and acquiring the skills requires too much overhead).

Let's operationalize "too much overhead" to mean "takes more than 10 hours to do useful, meaningful tasks".

Am I correct that the real generating rule here is something like "I have a group of people who'd like to work on some alignment open problems, and want a problem that is a) easy to give my group, and b) easy to subdivide once given to my group?"

b) seems right. I'm unsure what (a) could mean (not much overhead?).

I feel confused to think about decomposability w/o considering the capabilities of the people I'm handing the tasks off to. I would only add:

since that makes the capabilities explicit.