I think that evolution selects for functional decision theory (FDT). More specifically, it selects the best policy over a lifetime, not the best action in a given situation. I don't mean that we actually cognitively calculate FDT, but that there is an evolutionary pressure to act as if we follow FDT.

Example: Revenge
By revenge I mean burning some of your utility just to get back at someone who hurt you.

Revenge is equivalent to the transparent Newcomb's problem. You can see that Omega has predicted that you will two-box, i.e. the box that could have held lots of money is empty. What do you do? You can one-box anyway, which counterfactually makes this situation less likely but means giving up the smaller reward too (take revenge), or you can accept that you can't change the past, cut your losses, and just take the smaller reward (no revenge).
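The difference between judging the action and judging the policy can be made concrete with a small calculation. This is only a sketch under assumptions that are not in the post: the payoffs `BIG` and `SMALL`, a predictor that is right with probability `accuracy`, Omega filling the large box exactly when it predicts you would one-box on seeing it empty, and the agent simply taking the large box whenever it is full.

```python
BIG = 1_000_000  # assumed payoff in the large box (illustrative)
SMALL = 1_000    # assumed payoff in the small box (illustrative)


def expected_value(policy_when_empty: str, accuracy: float) -> float:
    """Expected payoff of a policy, evaluated before Omega makes its prediction.

    policy_when_empty is what the agent does on seeing the large box empty:
    "one_box" (take revenge, walk away with nothing) or
    "two_box" (cut losses, take the small reward).
    """
    if policy_when_empty == "one_box":
        # Correct prediction: box is full, agent collects BIG.
        # Wrong prediction: box is empty, agent takes it anyway and gets 0.
        return accuracy * BIG + (1 - accuracy) * 0
    if policy_when_empty == "two_box":
        # Correct prediction: box is empty, agent settles for SMALL.
        # Wrong prediction: box is full, agent collects BIG.
        return accuracy * SMALL + (1 - accuracy) * BIG
    raise ValueError(f"unknown policy: {policy_when_empty!r}")


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(p, expected_value("one_box", p), expected_value("two_box", p))
```

Once you are already looking at an empty box, two-boxing is always better by `SMALL`. Evaluated as a policy, though, one-boxing wins under these assumptions whenever the predictor is more than slightly better than chance (the break-even accuracy is just above 0.5 for these payoffs).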

The way this is evolutionarily encoded in humans is not a tendency to think about counterfactual situations. Instead we get mad at Omega for withholding the larger reward, and we one-box out of spite, to make Omega's prediction wrong, forcing a lose-lose outcome. But it is still FDT in practice.

Taking revenge can also be justified causally, if it's about upholding your reputation so no one crosses you again. Humans definitely do this calculation too. But it seems like most humans have a revenge drive that is stronger than what CDT would recommend, which is why I think this example backs up my claim that evolution selects for FDT.

My claim is of a similar type to Caspar's "Doing what has worked well in the past leads to evidential decision theory". It's a statement about the resulting policy, not the reasoning steps of the agent. Caspar describes an agent that does the action that has worked well in the past. Evolution is a process that selects the policy that has worked well in the past, which should effectively give you some version of FDT-son-of-EDT.
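As a toy illustration of selection acting on policies rather than on individual actions, here is a minimal replicator-dynamics sketch. Everything in it is an assumption made for illustration, not something from the post or from Caspar's: two heritable policies ("avenger" and "forgiver"), aggressors who guess a target's policy with probability `accuracy` and attack only those they predict to be forgivers, and the made-up parameters `harm`, `revenge_cost`, and `base_fitness`.

```python
def avenger_share_after(
    generations: int,
    accuracy: float,
    harm: float = 1.0,          # assumed fitness cost of being attacked
    revenge_cost: float = 0.5,  # assumed extra cost of retaliating
    base_fitness: float = 10.0,
    initial_share: float = 0.01,
) -> float:
    """Return the population share of avengers after some generations.

    Avengers are attacked only when mispredicted, and then pay for both
    the harm and the revenge. Forgivers are attacked whenever they are
    correctly predicted, and pay only for the harm.
    """
    fit_avenger = base_fitness - (1 - accuracy) * (harm + revenge_cost)
    fit_forgiver = base_fitness - accuracy * harm

    share = initial_share
    for _ in range(generations):
        # Discrete replicator update: a policy grows in proportion to
        # its fitness relative to the population mean.
        mean_fitness = share * fit_avenger + (1 - share) * fit_forgiver
        share = share * fit_avenger / mean_fitness
    return share


if __name__ == "__main__":
    print(avenger_share_after(200, accuracy=0.9))
    print(avenger_share_after(200, accuracy=0.5))
```

In this sketch no individual act of revenge ever pays for itself, yet the avenger policy takes over whenever `(1 - accuracy) * (harm + revenge_cost) < accuracy * harm`, that is, whenever being predictable as an avenger prevents more harm than the occasional revenge costs.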

There are situations where FDT behaves differently depending on when and why it was created (e.g.). I think I could figure out how this would play out in the context of evolution, but it would take some more thinking. If you think I'm on the right track, and can convince me that this is useful, I'll give it a try.

johnswentworth (3y)

This argument sounds roughly right to me, though I'm not sure FDT is exactly the right thing. If two organisms were functionally identical but had totally different genomes, then FDT would decide as though they're one unit, whereas I think evolution would select for deciding as though only the genetically-identical organisms are a unit? I'm not entirely sure about that.

Linda Linsefors (3y)

I agree that it's not exactly FDT. I think I actually meant updateless decision theory (UDT), but I'm not sure, because I have some uncertainty about exactly what others mean by UDT.

I claim that mutations + natural selection (evolution) select for agents that act according to the policy they would have wanted to pre-commit to at the time of their birth (last mutation).

Linda Linsefors (3y)

Yes, there are some details around who I recognize as a copy of me. In classical FDT this would be anyone who is running the same program (whatever that means). In evolution this would be anyone who is carrying the same genes. Both of these concepts are complicated by the fact that "same program" and "same genes" are scalar (or more complicated?) rather than Boolean values.

Edit: I'm not sure I agree with what I just said. I believe something in this direction, but I want to think some more. For example, people with similar genes probably cooperate not because of decision theory (my decision to cooperate with you is correlated with your decision to cooperate with me), but because of shared goals (we both want to spread our shared genes).
