I've noticed that "stipulation" is a pretty good word for the category of claims that become true when we decide they are true. It's probably better to broaden its connotations to encompass self-fulfilling prophecies than to coin some other word, or to name this category "prophecy" or something.

It's clear that the category does deserve a name.

Abram Demski (2y): I guess 'self-fulfilling prophecy' is a bit long and awkward. Sometimes 'basilisk' is thrown around, but specifically for negative cases (self-fulfilling-and-bad). But are you trying to name something slightly different (perhaps broader or narrower) than what 'self-fulfilling prophecy' points at? I find I don't like 'stipulation'; for me, it has the connotation of command (like, if I tell you to do something).
He thinks that as AI systems get more powerful, they will actually become more interpretable, because they will use features that humans also tend to use.

I find this fairly persuasive, I think. One way of putting it is that in order for an agent to be recursively self-improving in any remotely intelligent way, it needs to be legible to itself. Even if we can't immediately understand its components in the same way that it does, it must necessarily provide us with descriptions of its own ways of understanding them, which we could then potentially co-opt. ...

Hmm. I don't think I can answer the question, but if you're interested in finding fairly realistic ways to Dutch-book CDT agents, I'm curious: would the following be a good method? Death in Damascus would be very hard to do IRL, because you'd need a mind reader, and most CDT agents will not allow you to read their minds, for obvious reasons.

A game with a large set of CDT agents. They can each output Sensible or Exceptional. If they output Sensible, they receive $1. Those who output Exceptional don't get anything in that stage.

