There's a decent amount of literature on using multiple rewards, though often it's framed as learning about multiple goals. Here are some off the top of my head:
The Horde (classic): http://www.ifaamas.org/Proceedings/aamas2011/papers/A6_R70.pdf
Universal Value Function Approximators: http://proceedings.mlr.press/v37/schaul15.html
Learning to Act By Predicting: https://arxiv.org/abs/1611.01779
Temporal Difference Models: https://arxiv.org/abs/1802.09081
Successor Features: https://papers.nips.cc/paper/2017/hash/350db081a661525235354dd3e19b8c05-Abstract.html
Also see the discussion in Appendix D about prediction heads in OpenAI Five, which were used mostly for interpretability/diagnostics: https://cdn.openai.com/dota-2.pdf
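To make the "multiple rewards" idea concrete, here's a minimal sketch of my own (not taken from any of the papers above): tabular TD(0) that learns one value estimate per reward signal from the same stream of experience, which is roughly the flavor of the Horde setup. The chain environment, the two reward signals, and the function name `td0_multi_reward` are all made up for illustration.

```python
import numpy as np

def td0_multi_reward(n_states=5, gamma=0.9, alpha=0.1, episodes=2000, seed=0):
    """Tabular TD(0) on a random-walk chain, one value column per reward signal.

    Hypothetical setup: the agent moves left or right uniformly at random and
    the episode ends when it steps off either end of the chain.
    """
    rng = np.random.default_rng(seed)
    V = np.zeros((n_states, 2))  # V[s, k] estimates the return for reward k
    for _ in range(episodes):
        s = n_states // 2
        while True:
            s_next = s + rng.choice([-1, 1])
            done = s_next < 0 or s_next >= n_states
            # Assumed reward vector: signal 0 pays 1 for exiting on the right,
            # signal 1 pays 1 for exiting on the left.
            r = np.array([float(s_next >= n_states), float(s_next < 0)])
            # The same transition updates every value estimate at once.
            target = r if done else r + gamma * V[s_next]
            V[s] += alpha * (target - V[s])
            if done:
                break
            s = s_next
    return V

V = td0_multi_reward()
```

Each column of `V` answers a different predictive question about the same behavior, so adding a new reward signal only adds a column rather than a new data-collection run.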
The results in "Neural Networks Are Fundamentally Bayesian" are pretty cool; it's clever how they were able to estimate the densities.
A couple thoughts on the limitations:
I might be missing some context here, but I didn't understand the section "No Indescribable Hellworlds Hypothesis" and what hellworlds have to do with debate.
OK, I guess I'm a bit unclear on the problem setup and how it involves a training phase and deployment phase.
Wonderful writeup!
I'm sure you've thought about this, but I'm curious why the following approach fails. Suppose we require the debaters to each initially write up a detailed argument in judge-understandable language and read each other's argument. Then, during the debate, each debater is allowed to quote short passages from their opponent's writeup. Honest will be able to find either a contradiction or an unsupported statement in Dishonest's initial writeup. If Honest quotes a passage and says it's unsupported, then Dishonest has to respond with the supporting sentences.
I think this is a good idea. If you go ahead with it, here's a suggestion.
Reviewers often procrastinate for weeks or months. This is partly because doing a review takes an unbounded amount of time, especially for articles that are long or confusing. So instead of sending the reviewers a manuscript with a due date, book a 2-hour calendar event with the reviewers. The reviewers join a call or group chat, read the paper, and discuss it. They can also help clear up each other's confusion. They aim to complete the review by the end of the time window.