It seems like what is needed is a logical uncertainty model that allows things that PA would consider contradictions (so we can reason about finite agents outputting something they don't actually output), and that is causal. Some of the models in Paul's Non-Omniscience, Probabilistic Inference, and Metamathematics paper allow contradictions in a satisfying way, but don't contain any causation.
It seems that we want "this agent outputs this" to cause "consequences of this decision happen". Suppose we create a Boolean variable for every logical proposition, and we want to arrange these variables in something like a causal Bayesian network. Since the consequences of an action are a logical consequence of the agent's output that can be seen within a small number of steps, perhaps we want to arrange the network so that the premises of an inference rule being true will cause their conclusion to be true (with high probability; we still want some probability of contradiction). But if we naively create causal arrows from the premises of an inference rule (such as modus ponens) to its conclusion (for example, allowing `A` and `A → B` to jointly cause `B`), then we get cycles. I'm not sure if causation is well-defined in cyclic graphs, but if it's not, then maybe there is a way to fix this by deleting some of the causal arrows?
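As a toy illustration, here is a minimal sketch of that naive premises-cause-conclusion construction (the proposition set and the edge encoding are just made up for the example), showing that two propositions plus their mutual implications already give a directed cycle:

```python
from itertools import product

# Toy "inference graph": one causal arrow premise -> conclusion for each instance
# of modus ponens over a small set of propositions. This is only meant to show
# that the naive construction is cyclic, not to be a real causal model.
propositions = ["A", "B", "A->B", "B->A"]

edges = set()
for p, q in product(["A", "B"], repeat=2):
    if p != q and f"{p}->{q}" in propositions:
        # modus ponens: p and (p->q) jointly cause q
        edges.add((p, q))
        edges.add((f"{p}->{q}", q))

def has_cycle(nodes, edges):
    """Depth-first search for a directed cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(n):
        color[n] = GREY
        for src, dst in edges:
            if src == n:
                if color[dst] == GREY:
                    return True
                if color[dst] == WHITE and visit(dst):
                    return True
        color[n] = BLACK
        return False

    return any(visit(n) for n in nodes if color[n] == WHITE)

print(edges)                            # contains both A -> B and B -> A
print(has_cycle(propositions, edges))   # True: the naive causal graph is cyclic
```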
Yeah, causation in logical uncertainty land would be nice. It wouldn't necessarily solve the whole problem, though. Consider the scenario:
```python
# A toy universe: the "agent" A unconditionally picks Low, and the
# environment E looks that choice up in a payoff table.
Hi, Med, Low = "Hi", "Med", "Low"   # action labels (so the snippet actually runs)

outcomes = [3, 2, 1, None]          # possible payoffs, sorted best-first
strategies = {Hi, Med, Low}

A = lambda: Low                     # the agent: always outputs Low
h = lambda: Hi
m = lambda: Med
l = lambda: Low

payoffs = {}
payoffs[h()] = 3
payoffs[m()] = 2
payoffs[l()] = 1                    # payoffs == {"Hi": 3, "Med": 2, "Low": 1}

E = lambda: payoffs.get(A())        # the environment: E() == 1 as written
```
Now it's pretty unclear that `(lambda: Low)()==Hi` should logically cause `E()=3`. When considering `(lambda: Low)()==Hi`, do we want to change `l` without `A`, `A` without `l`, or both? These correspond to answers `None`, `3`, and `1` respectively.
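To make the three readings concrete, here is a small sketch that performs each of those surgeries by hand on the toy universe above and evaluates `E()` under it (the `build_universe` helper is just a convenience introduced for this example):

```python
def build_universe(agent_action, l_action):
    """Rebuild the toy universe with the given return values for A and l."""
    payoffs = {}
    payoffs["Hi"] = 3                    # h()
    payoffs["Med"] = 2                   # m()
    payoffs[l_action] = 1                # l() -- may overwrite an entry above
    return payoffs.get(agent_action)     # E()

# Factual world: both A and l return Low.
print(build_universe("Low", "Low"))   # 1

# Counterfactual (lambda: Low)() == Hi, three readings:
print(build_universe("Low", "Hi"))    # change l without A  -> None
print(build_universe("Hi", "Low"))    # change A without l  -> 3
print(build_universe("Hi", "Hi"))     # change both         -> 1
```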
Ideally, a causal-logic graph would be able to identify all three answers, depending on which question you're asking. (This actually gives an interesting perspective on whether CDT should cooperate with itself in a one-shot PD: it depends on whether you think you "could" change one copy but not the other; the answer turns on which "could" you mean.) I don't think there's an objective sense in which any one of these is "correct," though.
Instead of directly asking "what decision theory should an agent run," consider the question, "given a description of an environment and a description of an agent, identify the best action available to that agent, regardless of its algorithm."
These are pretty much the same question, with slightly different framings. (If you can answer the latter, then the answer to the former is "approximate that computation on your world model.") The latter question is more general, though, and it highlights different parts of the problem.
For example, it highlights questions about what counts as an agent, how the agent is identified in the environment, how the set of things the agent "could have done" is determined, how the agent's "observations" are identified, and so on. These parts of the question are where decision theory intersects with naturalized induction.
This angle puts the spurious proof problem in a new light. Instead of considering UDT as an algorithm which computes what its best available strategy is, consider a "global" version of UDT which takes any description of an environment and an agent within that environment, and computes the best strategy available to that agent.
Assume away the naturalized problems and say that there is a sorted list of `outcomes` (best first) and a set of available `strategies` (observation-to-action mappings). Now we can define `GlobalUDT(E, A)` as a function which takes a description `E` of the environment, a description `A` of the agent, and computes the best strategy available to that agent:
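A minimal sketch of what such a definition could look like, going off the description in this thread (the `provable` oracle, the exact sentence format, and the use of `str.format` to stand in for the braces-as-dequotes convention from the note below are all assumptions):

```python
def GlobalUDT(E, A):
    # E and A are quoted descriptions (source code) of the environment and the agent.
    # Walk the outcomes best-first and return the first strategy s for which the
    # sentence "A() == s implies E() == o" has a proof. provable() stands in for
    # some bounded proof search in PA and is not defined here.
    for o in outcomes:
        for s in strategies:
            sentence = "({A})() == {s}  implies  ({E})() == {o}".format(A=A, E=E, s=s, o=o)
            if provable(sentence):
                return s
```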
(I don't know how to use LaTeX in code blocks, so braces are used as dequotes in strings: if `x = "ell"` then `"h{x}o"` is the string `"hello"`.)

For a given universe `U`, Vladimir's proof-based UDT can be recovered by defining the following via quining:
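Roughly, the idea seems to be to point `GlobalUDT` at quoted descriptions of the universe and of the agent itself. A sketch, written with the braces-as-dequotes convention (the exact rendering is an assumption; the self-reference in `A` is what the quining construction supplies):

```python
# Quoted definitions, using braces as dequotes. The occurrence of {A} inside A's own
# definition is the self-reference that quining supplies (e.g. via the diagonal
# lemma); it is not something ordinary evaluation resolves.
E = 'lambda: U(({A})())'                # the universe U, applied to the agent's output
A = 'lambda: GlobalUDT("{E}", "{A}")'   # the agent: GlobalUDT pointed at E and at itself
UDT = A                                 # this quined agent is the UDT referred to below
```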
And, as it happens, `GlobalUDT` will always identify the actual best strategy available to `UDT`. (`GlobalUDT` never finds bad spurious proofs, because it iterates through outcomes in order; if there are any spurious proofs, then they are spurious proofs that the best strategy is the best strategy.) But `GlobalUDT` does not identify the best strategy for agents in general!

Consider the following universe:
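A condensed sketch of such a universe (essentially the same setup as the earlier code block):

```python
Hi, Med, Low = "Hi", "Med", "Low"

outcomes = [3, 2, 1, None]
strategies = {Hi, Med, Low}

A = lambda: Low                                 # the "agent": constantly Low
E = lambda: {Hi: 3, Med: 2, Low: 1}.get(A())    # the environment: pays out the agent's choice
```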
That is, there are three options `Hi`, `Med`, `Low` corresponding to payoffs `3`, `2`, `1`. The "agent" always selects the action `Low`. `GlobalUDT("E", "A")` either returns `Hi` or `Med`, whichever it considers first, because both `A()=Hi` and `A()=Med` are contradictory (and so imply `E()=3`).
This isn't all that surprising (what does it mean to ask what would have happened if `(lambda: Low)() == Hi`?), but it does distill the problem with using logical implications as counterfactuals a bit. `GlobalUDT` only identifies the right strategy on `UDT` because it happens that `UDT` is something in the environment where knowing what it does requires knowing that PA is consistent, and `GlobalUDT` doesn't know that. (But a version of `GlobalUDT` using ZFC would prescribe bad strategies for a PA-based version of `UDT` in the environment.)

Moving beyond proof-based UDT requires a better theory of logical counterfactuals, and `GlobalUDT` provides a slightly different angle of approach than the one I'm used to. Intuitively, a satisfactory theory of logical counterfactuals should give us an algorithm that identifies `Hi` as the best strategy available to `const Low`. I'm actually not too convinced that such a thing is possible/meaningful (at least, not without context and better answers to questions like "what's an agent" and "what's an embedding"), but this makes it a bit clearer that when we talk about wanting a full theory of logical counterfactuals, we're talking about defining a working version of `GlobalUDT`.

Food for thought.