This looks really interesting to me. I remember when the Safety via Debate paper originally came out; I was quite curious to see more work on modeling debate environments and getting a better sense of how well we should expect debate to perform in which kinds of situations. From what I can tell, this makes a rigorous attempt at 1-2 models.

I noticed that this is more mathematically intense than most other papers I'm used to in this area. I started going through it but was a bit intimidated. I was wondering if you could suggest tips for reading through it and und

I guess on first reading, you can cheat by reading the introduction, Section 2 right after that, and the conclusion. One level above that is reading the text but skipping the more technical sections (4 and 5). Or possibly reading 4 and 5 as well, but only focusing on the informal meaning of the formal results.
Regarding the background knowledge required for the paper: it uses some game theory (Nash equilibria, extensive-form games) and probability theory (expectations, probability measures, conditional probability). Strictly speaking, you can get all of this by looking up the relevant keywords on Wikipedia. I think that all of the concepts used there are basic in the corresponding fields, and in particular no special knowledge of measure theory is required. However, I studied both game theory and measure theory, so I am biased, and you shouldn't trust me. (Moreover, there is a difference between "strictly speaking, only this is needed" and "my intuitions are informed by X, Y, and Z".)
Another thing is that the AAAI workshop where this will appear has a page limit, which means that some explanations might have gotten less space than they would deserve. In particular, the arguments in Section 4 are much easier to digest if you can draw the functions that the text talks about. To understand the formal results, I think I visualized two-dimensional slices of the "world space" (i.e., squares), and assumed that the value of the function f is 0 by default, except for being 1 at some selected subset of the square. This allows you to compute all the expectations and conditionals visually.
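If drawing by hand is inconvenient, the same picture can be approximated numerically. The following is only a rough sketch of the visualization trick, not anything taken from the paper: it assumes a uniform prior over the unit square, a made-up region where f equals 1 (the rectangle [0, 0.5) x [0, 0.4)), and that "conditioning" means fixing the first coordinate. All names in it (`f`, `expectation`, `conditional_on_x`) are hypothetical.

```python
# Hypothetical illustration of one 2D slice of the "world space".
# Assumptions (not from the paper): uniform prior on the unit square,
# and f is 1 on an arbitrarily chosen rectangle, 0 elsewhere.

N = 200  # grid resolution per axis


def f(x, y):
    """Indicator of a made-up region: the rectangle [0, 0.5) x [0, 0.4)."""
    return 1.0 if (x < 0.5 and y < 0.4) else 0.0


def cell_centers(n):
    """Midpoints of n equal cells covering [0, 1]."""
    return [(i + 0.5) / n for i in range(n)]


def expectation(func, n=N):
    """E[f] under the uniform prior, i.e. the area of the region where f is 1."""
    pts = cell_centers(n)
    return sum(func(x, y) for x in pts for y in pts) / (n * n)


def conditional_on_x(func, x, n=N):
    """E[f | first coordinate = x]: the average of f along the vertical line at x."""
    pts = cell_centers(n)
    return sum(func(x, y) for y in pts) / n


if __name__ == "__main__":
    print(expectation(f))             # area of the shaded rectangle
    print(conditional_on_x(f, 0.25))  # fraction of the line x = 0.25 that is shaded
    print(conditional_on_x(f, 0.75))  # the line x = 0.75 misses the region
```

With an indicator function like this, the expectation is just the shaded area and each conditional is the shaded fraction of a line through the square, which is why the picture is enough to do the computations by eye.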
