AI ALIGNMENT FORUM
Benchmark Study
4
Benchmark Study #1: MMLU (Pile, MCQ)
Bruce W. Lee
4mo
0
1
Benchmark Study #2: TruthfulQA (Task, MCQ)
Bruce W. Lee
4mo
0
-2
Benchmark Study #3: HellaSwag (Task, MCQ)
Bruce W. Lee
4mo
0
1
Benchmark Study #4: AI2 Reasoning Challenge (Task(s), MCQ)
Bruce W. Lee
4mo
0
0
Benchmark Study #5: Social Intelligence QA (Task, MCQ)
Bruce W. Lee
3mo
0