I may not be fully understanding this, but one issue I see is that without requiring "perfect prediction", one could potentially Goodhart on the proposal. I could imagine something like:
In training GPT-5, add a term that upweights very basic bigram statistics. In "evaluation", use your bigram statistics table to "predict" most topk outputs just well enough to pass.
This would probably have a negative impact on performance, but it could possibly be tuned to be just sufficient to pass. Alternatively, one could use a toy model trained on the side that is easy to ...
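To make the bigram idea concrete, here is a minimal toy sketch of what "predicting" top-k outputs from a bigram table could look like. Everything in it is an assumption for illustration: the function names are hypothetical, the tokens are plain words, and a real evaluation would compare against the model's own top-k outputs rather than against the text itself.

```python
from collections import Counter, defaultdict


def build_bigram_table(tokens):
    """Count how often each token is followed by each other token."""
    table = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def predict_topk(table, prev, k=3):
    """Return up to k of the most frequent successors of `prev`."""
    # .get avoids creating an empty entry for tokens never seen as a prefix.
    successors = table.get(prev, Counter())
    return [tok for tok, _ in successors.most_common(k)]


def topk_hit_rate(table, tokens, k=3):
    """Fraction of positions whose true next token is in the bigram top-k."""
    pairs = list(zip(tokens, tokens[1:]))
    if not pairs:
        return 0.0
    hits = sum(nxt in predict_topk(table, prev, k) for prev, nxt in pairs)
    return hits / len(pairs)
```

The Goodharting worry is then that if "passing" only means `topk_hit_rate` clears some threshold below 1.0, a table like this (or a model nudged toward it during training) could clear that threshold without the predictions reflecting any real understanding of the model.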
I suspect that with a tuned initial prompt, ChatGPT would do much better. For example, something like:
Simulate an assistant on the other end of a phone call, who is helping me to cook a turmeric latte in my kitchen. I have never cooked before and need extremely specific instructions. Only speak one sentence at a time. Only explain one instruction at a time. Never say "and". Please ask clarifying questions if necessary. Only speak one sentence at a time, and await a response. Be explicit about:
- where I need to go.
- what I need to get
- where I need to bring thi