AI ALIGNMENT FORUM
AVoropaev
More information about the dangerous capability evaluations we did with GPT-4 and Claude.
AVoropaev · 3y

What improvements do you suggest?

Extracting Money from Causal Decision Theorists
AVoropaev · 5y

I've skimmed over the beginning of your paper, and I think there might be several problems with it.
 

  1. I don't see where it is explicitly stated, but I think the information "seller's prediction is accurate with probability 0.75" is supposed to be common knowledge. Is it even possible for a non-trivial probabilistic prediction to be common knowledge? Not in the sense of some real-life situation, but in the sense of this condition not being a logical contradiction. I am not a specialist on this subject, but it looks like a logical contradiction to me. And you can prove absolutely anything if your premise contains a contradiction.
  2. A minor nitpick compared to the previous one, but you don't specify what you mean by "prediction is accurate with probability 0.75". What kinds of mistakes does the seller make? For example, if the buyer is going to buy B1, then with probability 0.75 the prediction will be "B1". What about the remaining 0.25? Will it be 0.125 for "none" and 0.125 for "B2"? Will it be 0.25 for "none" and 0 for "B2"? (And does the buyer know about that? What about the seller knowing about the buyer knowing...)

    When you write "$1 − P(money in Bi | buyer chooses Bi) · $3 = $1 − 0.25 · $3 = $0.25", you assume that P(money in Bi | buyer chooses Bi) = 0.25 exactly. That is, if the buyer chooses the first box, the seller can't possibly have predicted that the buyer would choose none of the boxes, and the same goes for the case of the buyer choosing the second box. You can easily fix it by writing "$1 − P(money in Bi | buyer chooses Bi) · $3 >= $1 − 0.25 · $3 = $0.25" instead. It is possible that you make some other implicit assumptions about the mistakes the seller can make, so you might want to check that.
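
    To make the dependence on the error split concrete, here is a minimal Python sketch. It only reuses the numbers from the quoted formula ($1 price, $3 prize, 0.25 total error). Everything else is a hypothetical illustration and not something taken from the paper: the function name, the dictionary layout, and in particular the rule for which wrong predictions leave money in the chosen box.

```python
PRICE = 1.0  # price of one box, as in the quoted formula
PRIZE = 3.0  # amount that may be inside a box, as in the quoted formula


def seller_expected_profit(error_split, money_when_predicted):
    """Seller's expected profit per sale when the buyer takes B1.

    error_split: probability of each *wrong* prediction, given that the
        buyer takes B1 (the values should sum to 0.25).
    money_when_predicted: the wrong predictions after which B1 contains
        money. This rule is an assumption made for illustration only.
    """
    p_money = sum(
        p for prediction, p in error_split.items()
        if prediction in money_when_predicted
    )
    return PRICE - p_money * PRIZE


# Every mistake leaves money in B1: P(money in B1 | buyer takes B1) = 0.25.
all_errors_pay = seller_expected_profit({"none": 0.125, "B2": 0.125}, {"none", "B2"})

# Only a "B2" prediction leaves money in B1, with the error split evenly.
only_b2_pays_even = seller_expected_profit({"none": 0.125, "B2": 0.125}, {"B2"})

# Only a "B2" prediction leaves money in B1, but the seller never predicts "B2".
only_b2_pays_never = seller_expected_profit({"none": 0.25, "B2": 0.0}, {"B2"})

print(all_errors_pay, only_b2_pays_even, only_b2_pays_never)
```

    Under these assumptions the profit equals $0.25 only in the first case and is larger in the other two, which is why the ">=" version of the formula is the safer statement when the error model is left unspecified.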

     