1y02

Interesting. This dataset could be a good basis for a hackathon.

Is there an online list of datasets like this that are of interest for alignment? I am trying to organize a hackathon and am looking for ideas.

21y

A related dataset is Waterbirds, described in Sagawa et al. (2020)
[https://arxiv.org/pdf/1911.08731.pdf], where you want to classify birds as
landbirds or waterbirds regardless of whether they happen to be on a water or
land background.
The main difference from HappyFaces is that in Waterbirds the correlation
between bird type and background is imperfect, although strong. By contrast,
HappyFaces has a perfect spurious correlation on the training set. Of course,
you could filter Waterbirds to make the spurious correlation perfect and get an
equally challenging but more natural dataset.
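
The filtering idea could be sketched roughly as follows. This is only an illustration under assumptions: the column names `y`, `place`, and `split`, the encoding (1 = waterbird / water background, 0 = landbird / land background, split 0 = train), and the helper `filter_perfect_correlation` are hypothetical and not taken from the paper.

```python
def filter_perfect_correlation(rows, label_key="y", background_key="place",
                               split_key="split", train_split=0):
    """Drop training rows whose bird type disagrees with the background.

    Rows outside the training split are kept unchanged, so evaluation can
    still include the mismatched (minority) groups.
    """
    kept = []
    for row in rows:
        is_train = row[split_key] == train_split
        if is_train and row[label_key] != row[background_key]:
            continue  # training example where the spurious cue fails: drop it
        kept.append(row)
    return kept


# Tiny made-up metadata table (assumed encoding, see above).
rows = [
    {"y": 0, "place": 0, "split": 0},  # landbird on land, train
    {"y": 0, "place": 1, "split": 0},  # landbird on water, train (dropped)
    {"y": 1, "place": 1, "split": 0},  # waterbird on water, train
    {"y": 1, "place": 0, "split": 0},  # waterbird on land, train (dropped)
    {"y": 0, "place": 1, "split": 2},  # landbird on water, test (kept)
    {"y": 1, "place": 1, "split": 2},  # waterbird on water, test
]
filtered = filter_perfect_correlation(rows)
```

After this step, background predicts the label perfectly on the training split, while the test split still contains examples where it does not.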

Ah, "The tool-assisted attack went from taking 13 minutes to taking 26 minutes per example."

Interesting. The in-distribution change (3 OOM) does not have much influence out of distribution (×2).
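
To put the two figures on the same scale, here is a quick sketch of the arithmetic. The 13 and 26 minutes come from the quote; the 1 and 1000 inputs are placeholders that only stand for "a factor of 1000", not measured values, and `orders_of_magnitude` is a hypothetical helper.

```python
import math


def orders_of_magnitude(before, after):
    """Return the change from `before` to `after` in base-10 orders of magnitude."""
    return math.log10(after / before)


# Tool-assisted attack time per example, in minutes (from the quote).
tool_assisted_oom = orders_of_magnitude(13, 26)

# A 3 OOM change expressed as a ratio (placeholder inputs, not measurements).
three_oom = orders_of_magnitude(1, 1000)
```

Doubling corresponds to about 0.3 orders of magnitude, against 3 for the other figure.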

31y

I think that 3 orders of magnitude is the comparison between "time taken to find
a failure by randomly sampling" and "time taken to find a failure if you are
deliberately looking using tools."

I do not understand "we only doubled the amount of effort necessary to generate an adversarial counterexample." Aren't we talking about 3 OOM?

41y

This model is a proof of concept of a powerful implicit mesa-optimizer, which is evidence for "current architectures could easily be inner misaligned".