I think the discussion of the supernovas addresses this: a central point is that the “AI” must be capable of generating an arbitrary world state within some bounds.
"(It makes sense that) A proof-based agent can't cross a bridge whose safety is dependent on the agent's own logic being consistent, since proof-based agents can't know whether their logic is consistent."
If the agent crosses the bridge, then the agent knows itself to be consistent.
But the agent cannot know whether it is consistent (by Gödel's second incompleteness theorem, a consistent system cannot prove its own consistency).
Therefore, crossing the bridge implies an inconsistency: the agent would know itself to be consistent, even though that is impossible.
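The syllogism can be written out more formally. As a sketch (the notation is mine, not the original's): write $T$ for the agent's proof system and $\mathrm{Con}(T)$ for the statement that $T$ is consistent.

```latex
% T = the agent's theory; Con(T) = "T is consistent".
\text{Cross} \;\Rightarrow\; T \vdash \mathrm{Con}(T)
  \quad\text{(premise: crossing requires establishing one's own consistency)} \\
T \text{ consistent} \;\Rightarrow\; T \nvdash \mathrm{Con}(T)
  \quad\text{(G\"odel's second incompleteness theorem)} \\
\therefore\; \text{Cross} \;\Rightarrow\; T \text{ is inconsistent}
```

So on this reading, a consistent proof-based agent simply never reaches a proof that crossing is safe, and it stays off the bridge.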
The counterfactual reasoning seems quite reasonable to me.