This is a thought-provoking journal article discussing cases where:

  1. You have something with two possible states, A and B, and you can perform fairly accurate, mostly independent tests which distinguish between A and B.
  2. However, there is some failure case with a low prior probability, in which the actual state is B but something confounds all your tests and makes them look like they favor A regardless.
  3. If you then perform many tests and they all favor A, the first few results will make you update towards A; but as the run of A-favoring results grows, you will eventually start updating towards B instead, because an unbroken run is increasingly strong evidence that you are in the failure case (see the sketch after this list).
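
Concretely, here is a minimal sketch of that updating pattern in Python, with made-up numbers for the prior, the test accuracy, and the probability of the confounded failure case (none of these figures come from the article):

```python
# Two states A and B. Each test independently reports the true state with
# probability `acc`, except in a low-prior "confounded" case in which the
# true state is B but every test reports A anyway. All numbers are
# illustrative, not taken from the article.

prior_b = 0.5        # prior probability that the true state is B
acc = 0.9            # chance an unconfounded test reports the true state
p_confound = 0.01    # P(tests are confounded | true state is B)

def posterior_b(n: int) -> float:
    """P(true state is B | n independent tests, all of them favoring A)."""
    like_a = acc ** n                                        # all-A run under state A
    like_b = (1 - p_confound) * (1 - acc) ** n + p_confound  # all-A run under state B
    return prior_b * like_b / ((1 - prior_b) * like_a + prior_b * like_b)

for n in (0, 1, 3, 10, 30, 100):
    print(f"{n:3d} tests favoring A -> P(B) = {posterior_b(n):.4f}")
```

With these numbers the posterior on B drops at first (the tests behave like ordinary evidence for A) and then climbs back up, because a long enough unbroken run of A results is itself far more likely under "confounded B" than under an honest A.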

An interesting anecdote which I haven't verified:

Under ancient Jewish law one could not be unanimously convicted of a capital crime—it was held that the absence of even one dissenting opinion among the judges indicated that there must remain some form of undiscovered exculpatory evidence.

Via Hacker News.

Comments:

The classic example fitting the title, which I learned from a Martin Gardner article (I think he cited it from some 1800s person), is: "Hypothesis: No man is 100 feet tall.  Evidence: You see a man who is 99 feet tall.  Technically, that evidence does fit the hypothesis, but probably after seeing that evidence you would become much less confident in the hypothesis."

Well, basically, it can all be interpreted as weighing multiple competing theories (e.g. "the tallest human ever was slightly under 9 feet, human heights follow a certain distribution, your heart probably wouldn't even be able to circulate blood that far, etc." vs. "something very, very weird and new that breaks my understanding is happening") and treating the evidence in a Bayesian way.
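
For instance, a rough calculation with made-up numbers (nothing here comes from the comment or from Gardner):

```python
# The comment's point in miniature: seeing a 99-foot man is vastly more
# likely under "something very weird is going on" than under the ordinary
# model of human heights, so the observation shifts weight onto the model
# in which a 100-foot man is also possible. All numbers are made up.

p_weird = 1e-6       # prior on the "something weird" theory
p_normal = 1 - p_weird

like_normal = 1e-30  # P(observe a 99-ft man | ordinary model): effectively zero
like_weird = 1e-3    # P(observe a 99-ft man | weird model): small but real

post_weird = like_weird * p_weird / (like_weird * p_weird + like_normal * p_normal)
print(f"P(weird model | 99-ft man) = {post_weird:.6f}")  # ~= 1.0

# Once the weird model dominates, "no man is 100 feet tall" is no longer a
# safe bet, so confidence in it falls despite the apparently confirming case.
```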

From the HN comments:

If my test suite never ever goes red, then I don't feel as confident in my code as when I have a small number of red tests.

That seems like an example of this that I have definitely experienced, where A is "my code is correct", B is "my code is not correct", and the failure case is "my tests appear to be exercising the code but actually aren't."

A comment at Hacker News suggests that advertisers have already figured this out, which is why things are always recommended by "4 out of 5 experts".

Or the newer version, "one weird trick", where the purpose of the negative-sounding adjective "weird" is to explain why, if the trick is so great, you haven't heard of it before.

When apparently positive evidence can be negative evidence

It's when my outgroup talks about the positive evidence, of course.

(just kidding)