EuanMcLean

2023 Alignment Research Updates from FAR AI

TL;DR: FAR AI's science of robustness agenda has found vulnerabilities in superhuman Go systems; our value alignment research has developed more sample-efficient value learning algorithms; and our model evaluation direction has developed a variety of new black-box and white-box evaluation methods. FAR AI is a non-profit AI safety research institute,...

Dec 4, 2023

Even Superhuman Go AIs Have Surprising Failure Modes

AI Safety in a World of Vulnerable Machine Learning Systems

What's new at FAR AI
