AI ALIGNMENT FORUM

Sunishchal Dev

58

Ω

21

2

3

2y

Improving Model-Written Evals for AI Safety Benchmarking

This post was written as part of the summer 2024 cohort of the ML Alignment & Theory Scholars program, under the mentorship of Marius Hobbhahn.

Abstract

As model-written evals (MWEs) become more widely used in AI benchmarking, we need scalable approaches to assessing the quality of the eval questions. Upon...

Oct 15, 2024 • 30