AI ALIGNMENT FORUM

AI Benchmarking

This page is a stub.
Posts tagged AI Benchmarking
21 · Introducing BenchBench: An Industry Standard Benchmark for AI Strength
Jozdien · 7mo · 0 comments

16 · Improving Model-Written Evals for AI Safety Benchmarking
Sunishchal Dev, Marius Hobbhahn · 1y · 0 comments

5 · Auto-Enhance: Developing a meta-benchmark to measure LLM agents’ ability to improve other agents
Sam F. Brown, BasilLabib, Codruta (Coco) Lugoj, Sai Sasank Y · 1y · 0 comments

5 · MMLU’s Moral Scenarios Benchmark Doesn’t Measure What You Think it Measures
corey morris · 2y · 2 comments