Problems in AI Alignment that philosophers could potentially contribute to

by Wei Dai, 17th Aug 2019



(This was originally a comment that I wrote as a follow-up to my question for William MacAskill's AMA. I'm moving it since it's perhaps more on-topic here.)

It occurs to me that another reason for the lack of engagement by people with philosophy backgrounds may be that philosophers aren't aware of the many philosophical problems in AI alignment that they could potentially contribute to. So here's a list of philosophical problems that have come up just in my own thinking about AI alignment.

  • Decision theory for AI / AI designers
    • How to resolve standard debates in decision theory?
    • Logical counterfactuals
    • Open source game theory
    • Acausal game theory / reasoning about distant superintelligences
  • Infinite/multiversal/astronomical ethics
    • Should we (or our AI) care much more about a universe that is capable of doing a lot more computations?
    • What kinds of discounting (e.g. spatial-temporal) are necessary and/or desirable?
  • Fair distribution of benefits
    • How should benefits from AGI be distributed?
    • For example, would it be fair to distribute them equally over all humans who currently exist, or according to how many AI services they can afford to buy?
    • What about people who existed or will exist at other times and in other places or universes?
  • Need for "metaphilosophical paternalism"?
    • However we distribute the benefits, if we let the beneficiaries decide what to do with their windfall using their own philosophical faculties, is that likely to lead to a good outcome?
  • Metaphilosophy
    • What is the nature of philosophy?
    • What constitutes correct philosophical reasoning?
    • How can this be specified in an AI design?
  • Philosophical forecasting
    • How are various AI technologies and AI safety proposals likely to affect future philosophical progress (relative to other kinds of progress)?
  • Preference aggregation between AIs and between users
    • How should two AIs that want to merge with each other aggregate their preferences?
    • How should an AI aggregate preferences between its users?
  • Normativity for AI / AI designers
    • What is the nature of normativity? Do we need to make sure an AGI has a sufficient understanding of this?
  • Metaethical policing
    • What are the implicit metaethical assumptions in a given AI alignment proposal (in case the authors didn't spell them out)?
    • What are the implications of an AI design or alignment proposal under different metaethical assumptions?
    • Encouraging designs that make minimal metaethical assumptions or are likely to lead to good outcomes regardless of which metaethical theory turns out to be true.
    • (Nowadays AI alignment researchers seem to be generally good about not placing too much confidence in their own moral theories, but the same can't always be said of their metaethical ideas.)
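
For readers unfamiliar with the "open source game theory" item in the list above, here is a minimal sketch of the basic setup: agents in a one-shot Prisoner's Dilemma that can read each other's source before moving. Everything in it is an assumption made for illustration and not taken from any particular paper or library: the representation of a program as a (source text, decision function) pair, the names `clique_bot`, `defect_bot`, and `play`, and the payoff numbers. The "cooperate only with exact copies of myself" strategy is just one simple example of the kind of agent studied in this area.

```python
from typing import Callable, Dict, Tuple

# Hypothetical representation: a program is (source_text, decision_function).
# The decision function sees the opponent's source text and returns
# "C" (cooperate) or "D" (defect).
Program = Tuple[str, Callable[[str], str]]

CLIQUE_SOURCE = "cooperate iff opponent source == my source"
DEFECT_SOURCE = "always defect"


def clique_bot(opponent_source: str) -> str:
    # Cooperate only with exact textual copies of itself.
    return "C" if opponent_source == CLIQUE_SOURCE else "D"


def defect_bot(opponent_source: str) -> str:
    # Ignore the opponent's source entirely.
    return "D"


# Illustrative Prisoner's Dilemma payoffs: (row player, column player).
PAYOFFS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("C", "C"): (2, 2),
    ("C", "D"): (0, 3),
    ("D", "C"): (3, 0),
    ("D", "D"): (1, 1),
}


def play(a: Program, b: Program) -> Tuple[int, int]:
    # Each program is shown the other's source, then both move simultaneously.
    (source_a, decide_a), (source_b, decide_b) = a, b
    return PAYOFFS[(decide_a(source_b), decide_b(source_a))]


CLIQUE: Program = (CLIQUE_SOURCE, clique_bot)
DEFECT: Program = (DEFECT_SOURCE, defect_bot)

print(play(CLIQUE, CLIQUE))  # mutual cooperation between copies
print(play(CLIQUE, DEFECT))  # clique_bot is not exploited
```

In this toy version, two copies of `clique_bot` cooperate with each other while a defector gains nothing by facing one. The philosophically interesting questions begin where the sketch stops, for example how agents should treat programs that are functionally equivalent but textually different.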