I haven't seen much discussion of this, but it seems like an important factor in how well AI systems deployed by actors with different goals manage to avoid conflict (cf. my discussion of equilibrium and prior selection problems here).
For instance, would systems be trained:
- Against copies of agents developed by other labs (possibly with measures to mask private information)?
- Simultaneously with other agents in a simulator that each developer has access to?
- Against copies of themselves?
- Against distributions of counterpart policies engineered to have certain properties (a toy version of this is sketched after the list)? What would those properties be?
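In case it helps to make the last option concrete, here's a toy sketch of my own construction (not anything any lab actually does): the learner trains against counterparts sampled from a weighted distribution of hand-written policies in an iterated Prisoner's Dilemma, where the weights and the counterpart behaviours are the "engineered properties". The payoff matrix, the specific counterpart policies, and the simple value-update rule are all illustrative assumptions.

```python
# Toy sketch: training against a distribution of counterpart policies with
# chosen properties. Everything here is illustrative, not a real training setup.

import random

# Payoff to the learner for (learner_action, counterpart_action); 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

# Counterpart policies hand-written to have certain properties.
def always_cooperate(history):
    return 0

def always_defect(history):
    return 1

def tit_for_tat(history):
    return history[-1][0] if history else 0  # copy the learner's last move

# The "engineered distribution": mostly conditionally cooperative counterparts.
COUNTERPART_DISTRIBUTION = [
    (tit_for_tat, 0.6),
    (always_cooperate, 0.2),
    (always_defect, 0.2),
]

def sample_counterpart():
    policies, weights = zip(*COUNTERPART_DISTRIBUTION)
    return random.choices(policies, weights=weights, k=1)[0]

def train(episodes=5000, rounds_per_episode=10, epsilon=0.1, lr=0.01):
    # The learner keeps a running value estimate for each of its two actions
    # and picks greedily with epsilon-exploration (a bandit-style update).
    values = [0.0, 0.0]
    for _ in range(episodes):
        counterpart = sample_counterpart()  # fresh counterpart each episode
        history = []
        for _ in range(rounds_per_episode):
            if random.random() < epsilon:
                action = random.randrange(2)
            else:
                action = max(range(2), key=lambda a: values[a])
            reply = counterpart(history)
            reward = PAYOFF[(action, reply)]
            values[action] += lr * (reward - values[action])
            history.append((action, reply))
    return values

if __name__ == "__main__":
    print(train())  # which action the learner comes to favour depends on the engineered mix
```

The point of the sketch is just that the designer's choice of mix (here, mostly tit-for-tat) is itself a lever over what equilibrium behaviour the learner ends up with.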
Makes sense. Though you could have deliberate, coordinated training even after deployment. For instance, I'm particularly interested in the question of "how will agents learn to interact in high-stakes circumstances that they will rarely encounter?" One could imagine the overseers of AI systems coordinating to fine-tune their systems on simulations of such encounters even after deployment. Not sure how plausible that is, though.
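To make that slightly more concrete, here's a purely hypothetical sketch of what such post-deployment coordination might look like: overseers agree on a shared pool of rare, high-stakes scenario specifications, oversample them relative to their deployment base rates, and fine-tune their already-deployed agents on jointly simulated episodes. Every name and structure here (Scenario, Agent, simulate_episode, fine_tune) is a stand-in, not an existing API or an actual proposal.

```python
# Hypothetical sketch of coordinated post-deployment fine-tuning on rare,
# high-stakes encounters. All classes and functions are illustrative stand-ins.

import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Scenario:
    name: str
    stakes: float       # how costly a conflict outcome would be
    base_rate: float    # how rarely the scenario arises in ordinary deployment

@dataclass
class Agent:
    lab: str
    policy: Callable[[Scenario], str]

# Shared pool of scenario specifications agreed on by the participating overseers.
SHARED_POOL: List[Scenario] = [
    Scenario("contested-resource standoff", stakes=100.0, base_rate=1e-6),
    Scenario("ambiguous-commitment dispute", stakes=50.0, base_rate=1e-5),
]

def simulate_episode(scenario: Scenario, agents: List[Agent]) -> dict:
    """Run one joint simulation and record each agent's action (toy version)."""
    return {agent.lab: agent.policy(scenario) for agent in agents}

def fine_tune(agent: Agent, scenario: Scenario, outcome: dict) -> None:
    """Placeholder update step; a real setup would do gradient-based training."""
    pass

def coordinated_finetuning_round(agents: List[Agent], episodes: int = 100) -> None:
    # Oversample scenarios in proportion to how high-stakes and how rare they are,
    # so agents get experience they would almost never get from deployment alone.
    weights = [s.stakes / s.base_rate for s in SHARED_POOL]
    for _ in range(episodes):
        scenario = random.choices(SHARED_POOL, weights=weights, k=1)[0]
        outcome = simulate_episode(scenario, agents)
        for agent in agents:
            fine_tune(agent, scenario, outcome)
```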