Adversarial Examples

Adversarial examples are inputs with unusual features that cause an AI system to make choices that seem obviously wrong to a human. For example, an image of a panda can be altered so subtly that a human sees no difference, yet an image classifier labels it as a gibbon.
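A minimal sketch of how a "subtle" change can flip a classifier's decision is shown here. It uses one standard technique, the fast gradient sign method (FGSM), on a hypothetical linear "image" classifier. The weights, pixel values, the 1000-pixel size and the helper names (`score`, `classify`, `fgsm_toward_gibbon`) are all made up for illustration; this is not a real vision model and not necessarily how the panda image was produced.

```python
# Toy illustration of an adversarial perturbation using the fast gradient
# sign method (FGSM) on a hypothetical linear "image" classifier.
# Nothing here is a real vision model; weights and pixels are made up.

def score(weights, pixels, bias=0.0):
    """Linear score: positive means 'panda', non-positive means 'gibbon'."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias

def classify(weights, pixels):
    return "panda" if score(weights, pixels) > 0 else "gibbon"

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_toward_gibbon(weights, pixels, eps):
    """Shift each pixel by at most eps in the direction that lowers the score.

    For a linear model the gradient of the score with respect to the pixels
    is just the weight vector, so the FGSM step is x - eps * sign(w),
    clipped to the valid pixel range [0, 1].
    """
    return [min(1.0, max(0.0, p - eps * sign(w))) for w, p in zip(weights, pixels)]

n = 1000  # number of "pixels" (hypothetical)
weights = [0.01 if i % 2 == 0 else -0.01 for i in range(n)]
image = [0.51 if i % 2 == 0 else 0.49 for i in range(n)]

adversarial = fgsm_toward_gibbon(weights, image, eps=0.02)
max_change = max(abs(a - b) for a, b in zip(adversarial, image))

print(classify(weights, image))        # panda
print(classify(weights, adversarial))  # gibbon
print(round(max_change, 4))            # 0.02
```

No single pixel moves by more than 0.02, but the tiny changes all push the score in the same direction. Across 1000 pixels they add up to a shift of 0.02 × 1000 × 0.01 = 0.2, which is enough to carry the score from about +0.1 to about -0.1. This is one commonly cited intuition for why high-dimensional models can be fooled by changes a human would not notice.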

Created by John Maxwell at 2y