(Explanation. Also I have no reason to think they hate me.)
Do not use the original TruthfulQA multiple-choice or the HaluEval benchmark. We show that a simple decision tree can theoretically game multiple-choice TruthfulQA to 79.6% accuracy—even while hiding the question being asked! In response, the TruthfulQA authors created a new multiple-choice condition which avoids the vulnerabilities we highlight.
https://turntrout.com/original-truthfulqa-weaknesses