It is quite possible that CLIP “knows” that the image contains a Granny Smith apple with a piece of paper saying “iPod”, but when asked to complete the caption with a single class from the ImageNet classes, it ends up choosing “iPod” instead of “Granny Smith”. I’d caution against saying things like “CLIP thinks it is looking at an iPod”; this seems like too strong a claim given the evidence that we have right now.

Yes, it's already been solved. These are 'attacks' only in the most generous interpretation possible (since it does know the difference), and the fact that CLIP can read text in images to, arguably, correctly note the semantic similarity in embeddings, is to its considerable credit. As the CLIP authors note, some queries benefit from ensembling, more context than a single word class name such as prefixing "A photograph of a ", and class names can be highly ambiguous: in ImageNet, the class name "crane" could refer to the bird or construction equipment; and the Oxford-IIIT Pet dataset labels one class "boxer".

Reply

[-]Rohin Shah5y50

Ah excellent, thanks for the links. I'll send the Twitter thread in the next newsletter with the following summary:

Last week I speculated that CLIP might "know" that a textual adversarial example is a "picture of an apple with a piece of paper saying an iPod on it" and the zero-shot classification prompt is preventing it from demonstrating this knowledge. Gwern Branwen [commented](https://www.alignmentforum.org/posts/JGByt8TrxREo4twaw/an-142-the-quest-to-understand-a-network-well-enough-to?commentId=keW4DuE7G4SZn9h2r) to link me to this Twitter thread as well as this [YouTube video](https://youtu.be/Rk3MBx20z24) better prompt engineering significantly reduces these textual adversarial examples, demonstrating that CLIP does "know" that it's looking at an apple with a piece of paper on it.

Reply

[-]Rohin Shah5y50

Related: Interpretability vs Neuroscience: Six major advantages which make artificial neural networks much easier to study than biological ones. Probably not a major surprise to readers here.

Reply

Moderation Log

AI ALIGNMENT FORUM
AF

AI ALIGNMENT FORUM
AF

21

[AN #142]: The quest to understand a network well enough to reimplement it by hand

21

FEEDBACK

PODCAST