I feel like MIRI perhaps mispositioned FDT (their variant of UDT) as a clear advancement in decision theory, whereas they might have attracted more attention/interest from academic philosophy if the framing had instead been that the UDT line of thinking shows decision theory to be more deeply puzzling than anyone had previously realized. Instead of one major open problem (Newcomb's, or EDT vs. CDT), we now have a whole bunch more. I'm really not sure at this point whether UDT is even on the right track, but it does seem clear that there are some thorny issues in decision theory that not many people were previously thinking about:
I do think there's a sense in which CDT behavior is evolutionarily selected for in environments where agents can't see each other's decision theories.
I don't see this as a big problem with UDT. If a UDT agent wants to be evolutionarily fit relative to the other agents in its environment, it can simply adopt CDT behavior and do just as well as any CDT agent.
It's just that, by virtue of their decision theory (according to themselves), UDT agents have the option of giving up evolutionary fitness in exchange for higher utility in the short run. If they care more about s...
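Stepping back to the Newcomb's-problem framing at the top of this thread: here's a minimal expected-utility sketch of why CDT and EDT come apart. The predictor accuracy and payoff values are illustrative assumptions, not numbers from anywhere in particular:

```python
# A minimal sketch of the CDT-vs-EDT split on Newcomb's problem.
# Predictor accuracy and payoffs are illustrative assumptions.

ACCURACY = 0.99       # assumed accuracy of the predictor
BOX_B = 1_000_000     # opaque box: filled iff one-boxing was predicted
BOX_A = 1_000         # transparent box: always gained by two-boxing

def edt_eu(action):
    """EDT conditions on the action: choosing it is evidence about the prediction."""
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_full * BOX_B + (BOX_A if action == "two-box" else 0)

def cdt_eu(action, p_full):
    """CDT holds the prediction fixed: the action can't cause the box contents."""
    return p_full * BOX_B + (BOX_A if action == "two-box" else 0)

for a in ("one-box", "two-box"):
    print(a, "EDT:", edt_eu(a), "CDT (any fixed p_full, e.g. 0.5):", cdt_eu(a, 0.5))
# EDT favors one-boxing (990000 > 11000); CDT favors two-boxing for any
# fixed p_full, since the extra BOX_A dominates once the prediction is held fixed.
```

UDT-style reasoning goes further by scoring whole policies against the prior rather than actions against updated beliefs, which is where the newer puzzles come in.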
This is an experiment in short-form content on LW2.0. I'll be using the comment section of this post as a repository of short, sometimes-half-baked posts that either:
I ask people not to create top-level comments here, but feel free to reply to comments like you would a FB post.
What process are you using right now to look into this? (It seemed like a higher-effort thing than I was expecting, but I don't know exactly what projects you're referencing here.)
ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development.
We're working to make this happen through what we believe is the most natural and promising approach: helping decision-makers in governments and the public understand the risks and take action.
We believe that ControlAI can achieve an international prohibition on ASI development if scaled sufficiently. We estimate that a yearly budget of approximately $50 million would give us a concrete chance of achieving this in the next few years.
In this post, we lay out some of the reasoning behind this estimate, and explain how additional funding past that threshold, up to and beyond $500 million, would continue...
What fraction of ControlAI’s growth do you think should be in the US? I think at least 90%, and maybe over 100%!
Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of instructions). [2] I disagree.
Current AI systems seem pretty misaligned to me in a mundane behavioral sense: they oversell their work, downplay or fail to mention problems, stop working early and claim to have finished when they clearly haven't, and often seem to "try" to make their outputs look good while actually doing something sloppy or incomplete. These issues mostly occur on more difficult/larger tasks, tasks that aren't straightforward SWE tasks, and tasks that aren't...
I don’t see this happening now.
I agree that it's very unlikely that current AIs are scheming/egregiously misaligned/serious adversaries. (I assume that's what you meant by "happening now".)
I think it is important to invest research in this, but IMO the focus should be on measuring and monitoring rather than mitigating, since at the moment there is not much to mitigate and it's unclear whether this will change. The priority should be making sure that we will find out if it does.
This is a long-standing disagreement, but:
If everyone in our universe doing acausal trade coordinates, we can sell "cosmic real estate" for monopoly prices
Let's assume that there are many different universes (or Everett branches) that acausally trade.
Some traders won't value "resources in our civ's future lightcone" linearly. As a toy example, the leader of a distant alien civilisation might want to get a statue of themselves in as many different other universes as possible.
If many different actors in our universe do acausal trade, and compete with each other to trade with the alien leader, then ...
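As a toy illustration of the pricing point, here's a made-up numerical sketch (all values are invented for illustration): coordinated sellers of "cosmic real estate" can price close to the buyer's full willingness to pay, while competing sellers undercut each other toward cost, Bertrand-style:

```python
# A toy sketch of monopoly vs. competitive pricing in acausal trade.
# All numbers are made up for illustration.

statue_value = 100.0   # what the alien leader will pay (in trade goods)
                       # for a statue hosted in our universe
build_cost = 1.0       # resource cost to us of hosting the statue

# Coordinated ("monopoly") sellers: price right up to the buyer's value.
monopoly_price = statue_value
monopoly_surplus_to_us = monopoly_price - build_cost        # 99.0

# Competing sellers: undercutting drives the price down to cost.
competitive_price = build_cost
competitive_surplus_to_us = competitive_price - build_cost  # 0.0

print(monopoly_surplus_to_us, competitive_surplus_to_us)
```

The gap between those two surpluses is the value that coordination among this universe's traders would capture.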
A collection of examples of AI systems "gaming" their specifications - finding ways to achieve their stated objectives that don't actually solve the intended problem. These illustrate the challenge of properly specifying goals for AI systems.
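Here's a hypothetical, minimal instance of the pattern those entries share: an optimizer is scored by a proxy metric (sortedness of its output) that never checks the output is a permutation of the input, so random search happily "solves" the task through the loophole:

```python
# A minimal, hypothetical instance of specification gaming: the proxy
# score rewards sorted-looking output without requiring a permutation.

import random

DATA = [5, 2, 9, 1, 7]

def proxy_score(output):
    """Intended to measure sortedness; never checks that the output
    is actually a permutation of DATA -- the loophole."""
    return sum(output[i] <= output[i + 1] for i in range(len(output) - 1))

def random_search(steps=2000):
    best = list(DATA)
    for _ in range(steps):
        cand = [random.choice(DATA) for _ in range(len(DATA))]
        if proxy_score(cand) > proxy_score(best):
            best = cand
    return best

result = random_search()
print(result, proxy_score(result))
# Typically lands on a non-decreasing sequence with repeats, e.g.
# [1, 1, 2, 7, 9]: a perfect proxy score without actually sorting DATA.
```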
Here's my current picture of EDT and UDT.
In situations where EDT agents have many copies or near-copies, an EDT agent operates by imagining that it simultaneously controls the decisions of all those copies. This works very elegantly as long as it optimizes with respect to its prior and (upon learning new information) just changes its beliefs about which agents in the prior it controls the actions of. (I.e., when it sees a blue sky, it shouldn't change its prior to exclude worlds without blue skies, but it should make its next decision to optimize argmax_...
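Here's a minimal sketch of that argmax-over-policies picture, with made-up worlds, observations, and payoffs (all illustrative assumptions): the agent scores entire observation-to-action policies against its unchanged prior, and an observation only selects which branch of the prior-optimal policy to execute:

```python
# A toy sketch of "optimize against the prior; observations only change
# which copies you control." Worlds, observations, and payoffs are made up.

from itertools import product

# Prior: two equally likely worlds, each tagged with what the agent sees there.
WORLDS = [
    {"p": 0.5, "obs": "blue-sky", "payoff": {"A": 10, "B": 0}},
    {"p": 0.5, "obs": "grey-sky", "payoff": {"A": 0,  "B": 6}},
]

OBS = ["blue-sky", "grey-sky"]
ACTS = ["A", "B"]

def policy_eu(policy):
    """Expected utility over the *prior* of a full observation-to-action policy."""
    return sum(w["p"] * w["payoff"][policy[w["obs"]]] for w in WORLDS)

# Enumerate every policy (a mapping from each observation to an action).
policies = [dict(zip(OBS, acts)) for acts in product(ACTS, repeat=len(OBS))]
best = max(policies, key=policy_eu)
print(best, policy_eu(best))
# {'blue-sky': 'A', 'grey-sky': 'B'} 8.0
# On seeing a blue sky, the agent doesn't drop grey-sky worlds from its
# prior; it just executes the blue-sky branch of the prior-optimal policy.
```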