Introduction:

Leech and Rogers-Smith replied to my critique of their paper, which claimed a 25% reduction in R0. I believe that my points remain valid, and I will explain why their rebuttal does not convince me.

I would like to commend them on their work, and particularly on making their data and code available. I was able to replicate their results on my own system with relatively little difficulty, and their code was clear enough that I could follow how those results were produced.

Measuring Mask Effectiveness:

When studies try to measure the effectiveness of masks, they tend to measure one of two things. Some studies try to measure a change in the number of infections. The Bangladesh RCT on mask effectiveness showed that a 30% increase in the number of people wearing masks led to a 5-10% decrease in the number of cases, and extrapolated that a 100% increase would lead to a 15-30%[1] decrease in the number of infections.

Other studies, like the one being analyzed here, try to measure a change in the rate of transmission due to masks. Leech and Rogers-Smith find an effectiveness of ~25% for a 100% increase in mask wearing. At first glance these numbers seem approximately equivalent and consistent with each other. However, a reduction in transmission should compound: if infections are reduced by 10% after one cycle of transmission, then after two cycles they should be reduced by roughly 20% (19% exactly, since 0.9 × 0.9 = 0.81), and so on.

Using conservative estimates from this paper, a 30% increase in mask wearing should lead to an ~8% reduction in transmission[2].

If that were the case, then over the course of 3 months the treated villages should have had 50% fewer infections, rather than the 5-10% reduction actually observed.
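A rough sketch of the compounding arithmetic may help here. Everything beyond the ~8% per-cycle reduction is an assumption made for illustration: the number of transmission cycles in 3 months (13, i.e. roughly one per week), the control-group reproduction number (1.0), and both function names, which are hypothetical.

```python
def relative_new_infections(per_cycle_reduction: float, cycles: int) -> float:
    """Ratio of new infections (treated / control) in a given cycle,
    if transmission is cut by a constant fraction every cycle."""
    return (1.0 - per_cycle_reduction) ** cycles


def cumulative_reduction(per_cycle_reduction: float, cycles: int,
                         control_r: float = 1.0) -> float:
    """Fractional reduction in total infections over all cycles,
    starting both groups from the same single seed infection.

    control_r is an ASSUMED reproduction number for the control group.
    """
    treated_r = control_r * (1.0 - per_cycle_reduction)
    control_total = sum(control_r ** c for c in range(1, cycles + 1))
    treated_total = sum(treated_r ** c for c in range(1, cycles + 1))
    return 1.0 - treated_total / control_total


PER_CYCLE_REDUCTION = 0.08  # the ~8% figure from the text
ASSUMED_CYCLES = 13         # assumption: about one cycle per week for 3 months

for cycles in (1, 2, 4, 8, ASSUMED_CYCLES):
    ratio = relative_new_infections(PER_CYCLE_REDUCTION, cycles)
    print(f"cycle {cycles:2d}: new infections at {ratio:.0%} of control")

total = cumulative_reduction(PER_CYCLE_REDUCTION, ASSUMED_CYCLES)
print(f"cumulative reduction over {ASSUMED_CYCLES} cycles: {total:.0%}")
```

The exact figure depends heavily on the assumed cycle length and baseline reproduction number, but under any plausible choice the cumulative reduction is several times larger than the per-cycle reduction.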

Leech and Rogers-Smith Model:

Leech and Rogers-Smith use a sophisticated Bayesian model to estimate the effect of masks, mobility and NPIs on transmission. This model is powerful and allows for a computation of confidence intervals and p values for the various effects. However, it is not easy to understand why it produces the results it does.

We can instead think of the model as taking place in two stages. First, we estimate a transmission factor based on the observed cases and deaths. For the purposes of this analysis we will assume that this estimate is perfect. Second, we fit a linear model relating the known mask wearing, mobility and NPIs for each region and day to that estimated transmissibility.

All of the graphs and data below are tracked in this spreadsheet.

Simple Linear Model:

The most basic model that we can learn is to simply measure the correlation between the mask wearing rate on a given day and the transmissibility.

Plotting this shows that there is a minimal relationship: going from 0% to 100% mask wearing would result in a mean 5% reduction in transmissibility.

This mirrors the critique from my original article: regions with higher mask wearing have higher regional factors, which cancel out most of the claimed benefits of mask wearing.
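As a minimal sketch of what this simple model does, here is an ordinary least squares fit of transmissibility on the absolute mask wearing rate. The five data points are invented purely for illustration (they are not taken from the paper or the spreadsheet), and `ols_slope_intercept` is a hypothetical helper name.

```python
from statistics import mean


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# HYPOTHETICAL region-day observations, made up for this example:
# fraction of people wearing masks and the estimated transmissibility.
mask_wearing = [0.10, 0.30, 0.50, 0.70, 0.90]
transmissibility = [1.20, 1.22, 1.17, 1.19, 1.15]

slope, intercept = ols_slope_intercept(mask_wearing, transmissibility)

# Relative change in transmissibility implied by going from 0% to 100% masks.
implied_reduction = -slope / intercept
print(f"slope={slope:.3f}, implied reduction at 100% masks={implied_reduction:.1%}")
```

The point of the sketch is only the mechanics: the headline "reduction at 100% mask wearing" is the fitted slope expressed relative to the fitted intercept.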

Relative Linear Model:

Leech and Rogers-Smith use a different analysis to try to deal with the issue of regional confounders. Rather than measuring the absolute level of mask wearing and the absolute transmission rate, we can instead measure the change in mask wearing and see how it correlates with the change in transmission.

This shows a much stronger relationship: going from 0% to 100% mask wearing would result in a mean 30% reduction in transmissibility. Doing the same for mobility and NPIs shows that the reductions estimated by this simple model closely match those produced in the original paper.

This explains why normalizing the mask wearing relative to the study start, as Leech and Rogers-Smith suggest, shows a similarly high effectiveness. Since normalization does not change the relative measurements, the claimed effectiveness is just as high (slightly higher, in fact, as the Bayesian model no longer needs to explain the negative evidence from the start of the study).
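The relative model, and the reason normalization leaves it unchanged, can be sketched as follows. The weekly values are invented for illustration and the function names are hypothetical; the only thing being demonstrated is that subtracting a region's starting mask level does not alter week-over-week changes.

```python
def weekly_changes(series):
    """Week-over-week differences of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def slope_of_changes(x, y):
    """OLS slope of the change in y on the change in x (with intercept)."""
    dx, dy = weekly_changes(x), weekly_changes(y)
    dx_bar = sum(dx) / len(dx)
    dy_bar = sum(dy) / len(dy)
    sxx = sum((a - dx_bar) ** 2 for a in dx)
    sxy = sum((a - dx_bar) * (b - dy_bar) for a, b in zip(dx, dy))
    return sxy / sxx


# HYPOTHETICAL weekly series for a single region, made up for this example.
mask = [0.20, 0.30, 0.45, 0.50, 0.70]
transmission = [0.10, 0.08, 0.03, 0.02, -0.04]

# Normalize mask wearing relative to the start of the study period.
normalized_mask = [m - mask[0] for m in mask]

raw_slope = slope_of_changes(mask, transmission)
normalized_slope = slope_of_changes(normalized_mask, transmission)
print(raw_slope, normalized_slope)  # identical up to floating point error
```

Because a constant offset cancels in every difference, any model that is driven by changes rather than levels gives the same answer before and after this normalization.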

Why We Should Conclude Mask Wearing Protects:

Since this is a strictly observational study, it is impossible to prove causation. However, a strong and robust enough correlation gives us Bayesian evidence in favor of a causal link existing. Leech and Rogers-Smith argue that their paper gives such strong evidence. There are two major lines of argument which they believe strengthen the causal claim.

First, they have demonstrated that their correlation is robust to a wide variety of variations on the model. The null hypothesis of there being no relationship between masks and growth is implausible. Therefore it is more reasonable to assume a causal relationship than to speculate on a non-causal relationship which is similarly robust.

Secondly, we have other sources of evidence that this causal relationship exists (both clinical and mechanical). Given that the correlation from this paper fits well with these other sources of data, why should we try and assume a more complex causal mechanism?

Why We Should Not Conclude Mask Wearing Protects:

I am still unconvinced by this reasoning. There are multiple reasons to think that this study should not significantly raise our confidence in the effectiveness of masks. I am not scientifically qualified to assess the full literature, so I will not comment on what the prior should be apart from this paper.

The Correlation is Weak:

Although looking at the weekly changes shows a mean 30% reduction in growth rate associated with masks, this only explains 2% of the weekly variation in growth rates.

In fact, most of the variation in growth rates is left unexplained by the model. On average the unexplained change in transmission from week to week is equivalent to a 33% change in mask wearing.

Factor        R^2
Masks         .018
NPIs          .040
Mobility      .127
Unexplained   .844
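To illustrate how a large estimated effect can coexist with a low R^2, here is a small sketch of the variance-explained calculation for a single predictor. The data are invented for the example and `r_squared` is a hypothetical helper; the figures in the table above come from the actual analysis, not from this code.

```python
def r_squared(x, y):
    """Share of the variance in y explained by a one-predictor OLS fit on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# HYPOTHETICAL weekly changes, made up for this example: mask wearing moves
# by a couple of points per week while growth rates swing much more widely.
mask_change = [-0.02, -0.01, 0.00, 0.01, 0.02]
growth_change = [0.10, -0.08, 0.02, -0.09, 0.05]

share_explained = r_squared(mask_change, growth_change)
print(f"R^2 = {share_explained:.3f}")
```

In this toy data the fitted slope is steep, yet the weekly swings in growth are so much larger than anything mask changes could produce that very little of the variance is accounted for.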

If we take the global average of transmission over time, we can see that almost all of the change is explained by these regional factors.

Failure of Correlation at Absolute Levels:

Leech and Rogers-Smith are not concerned by the failure of absolute levels of mask wearing to correlate with growth rates.

According to them, it is plausible that in places with a high propensity for spread (indicated by a high regional factor) mask wearing was more common. This serves to mask the underlying true causation (High Region Spread -> High Mask -> Lower Spread).

I am not convinced by this claim for two major reasons.

First, there does not appear to be a strong correlation between infections at the start of the studied period and the mask wearing.

Second, the regional factors are much more variable than this explanation would credit.

Within a month the regional factors have already diverged significantly from their initial value. This undercuts the claim that the regional factor represents a true inherent susceptibility to infection for a region. This means that the fact that regional factors cancel out differences in mask wearing is surprising and should be evidence against their hypothesis.

Failure to Account For Natural Immunity:

Leech and Rogers-Smith are explicitly trying to model the effect of masks on transmission. We should have an extremely strong prior that the prior infection rate strongly impacts transmission. However, their model does not account for it in any way. I don't have evidence that this would impact the estimate for mask effectiveness, but it does indicate that the model may be overly simplified.

Unexplained Control System:

One of the ongoing mysteries has been the observed negative feedback loop between present and future growth rates. This has had the result that both spikes and declines are mysteriously shorter than expected.

This model similarly leaves this feedback loop unexplained: almost all of the effect is absorbed by the change in regional factor (indicating unmodelled factors). Note that this feedback loop can explain 5x more of the variance than masks can (comparable to what is explained by mobility).

Similarly, I don't have evidence that accounting for this would impact the estimate for mask effectiveness. But it does indicate that there is an unmodelled causal pathway between the level of the growth rate and the next week's change. If this pathway also impacts mask wearing, that could explain the correlation without there being a direct link.
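One simple way to look for this kind of feedback loop is to regress next week's change in growth rate on this week's level: a negative slope means high growth tends to be followed by a decline. The sketch below assumes that framing; the series is invented and `feedback_slope` is a hypothetical name, not something from the paper's code.

```python
def feedback_slope(growth):
    """OLS slope of next week's change in growth on this week's growth level.

    A negative value indicates mean reversion (negative feedback).
    """
    level = growth[:-1]
    change = [nxt - cur for cur, nxt in zip(growth, growth[1:])]
    level_bar = sum(level) / len(level)
    change_bar = sum(change) / len(change)
    sxx = sum((a - level_bar) ** 2 for a in level)
    sxy = sum((a - level_bar) * (b - change_bar)
              for a, b in zip(level, change))
    return sxy / sxx


# HYPOTHETICAL weekly growth rates for one region, made up for this example.
weekly_growth = [0.10, 0.04, 0.06, -0.02, 0.01, 0.05, 0.00]

slope = feedback_slope(weekly_growth)
print(f"feedback slope = {slope:.2f}")
```

A steadily trending series would give a slope of zero here, so a clearly negative slope across many regions is what the "control system" observation amounts to in regression terms.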

Model Quibbles:

In this section are minor issues I had with the model specification. I don't believe any of them would make a major difference to the results, but might be worth exploring.

Random Walk vs. Random Variation:

The model treats the growth rate during a week as the sum of the known factors (mask, mobility, NPI) and a regional factor. If growth during a particular week is higher than expected, the model reacts by adjusting the regional baseline for the future.

However, if we observe that the regional factor for a week has changed, we should infer that some of that change is due to transient factors, which are unlikely to impact future weeks.
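The distinction can be sketched with a single update rule. This is plain exponential smoothing, chosen by me for illustration and not the paper's specification; `update_baseline` and its `persistence` parameter are hypothetical names.

```python
def update_baseline(baseline: float, observed_factor: float,
                    persistence: float) -> float:
    """Carry a fraction of this week's surprise into next week's baseline.

    persistence = 1.0: pure random walk, the whole surprise is treated as
                       a lasting change in the regional factor.
    persistence < 1.0: part of the surprise is treated as transient noise
                       that should not affect future weeks.
    """
    if not 0.0 <= persistence <= 1.0:
        raise ValueError("persistence must be between 0 and 1")
    surprise = observed_factor - baseline
    return baseline + persistence * surprise


# ASSUMED numbers for illustration: baseline of 1.0, observed factor of 1.2.
random_walk_next = update_baseline(1.0, 1.2, persistence=1.0)
partly_transient_next = update_baseline(1.0, 1.2, persistence=0.5)
print(random_walk_next, partly_transient_next)
```

Under the random walk reading, one unusual week permanently shifts the baseline; with a lower persistence, the same week moves it only part of the way.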

Weekend Mobility Variation:

As we would expect, there is higher mobility during weekends than weekdays.

We can see from the growth rate chart of a randomly chosen country that the model infers higher transmission on weekends than weekdays.

I believe that the case and death data do not admit enough precision to make this inference. Depending on the specification, this could cause the estimate of the effectiveness of mobility reductions to be inaccurate. I would suggest smoothing the mobility data to avoid these spikes.
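A minimal sketch of the kind of smoothing I have in mind is a trailing 7-day average, which removes any pattern that repeats weekly. The mobility values below are invented for the example, and `rolling_mean` is a hypothetical helper rather than part of the paper's code.

```python
def rolling_mean(values, window=7):
    """Trailing moving average.

    The first (window - 1) entries average over however many days are
    available so far, so the output has the same length as the input.
    """
    smoothed = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the day that has just fallen out of the window.
            running_total -= values[i - window]
        smoothed.append(running_total / min(i + 1, window))
    return smoothed


# HYPOTHETICAL daily mobility index, made up for this example:
# five weekdays at 1.0 followed by a weekend at 1.3, repeated for two weeks.
daily_mobility = ([1.0] * 5 + [1.3] * 2) * 2

smoothed_mobility = rolling_mean(daily_mobility)
print([round(v, 3) for v in smoothed_mobility])
```

Once a full week is inside the window, the weekend spikes disappear and only genuine week-to-week changes in mobility remain for the model to explain.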

Start of Data Effects:

For the first two weeks of model specification the region effects are fixed to the base value. As soon as they are able to adjust, we see a massive jump in modelled growth rate.

When I looked at the code I was unsure whether the data from that initial period was being used to compute the effectiveness of the various interventions. If it is being used, it probably needs more examination.


  1. For the sake of honesty, I did bet with the authors on the eventual results of the study and lost. However, I also predicted that the median estimate for cloth masks would be 15%. I was overly pessimistic on surgical masks, however, so the study did move my priors slightly in the direction of mask effectiveness. ↩︎

  2. The two studies measure mask wearing in slightly different ways. Leech and Rogers-Smith measure surveyed compliance with mask wearing, which in Bangladesh was ~80%, while the RCT directly observed actual mask wearing rates, which went from 10% in the control villages to 40% in the treated villages. It seems reasonable, however, that a 30% increase in actual mask wearing should have at least as large an effect as a 30% increase in claimed mask wearing. ↩︎


Within a month the regional factors have already diverged significantly from their initial value. This undercuts the claim that the regional factor represents a true inherent susceptibility to infection for a region. This means that the fact that regional factors cancel out differences in mask wearing is surprising and should be evidence against their hypothesis.

What does this divergence have to do with whether the "High Region Spread -> High Mask -> Lower Spread" model is true? Couldn't that model be equally well true for any exogenous "region spread" factor, regardless of whether it's changing month-to-month?

They are trying to explain the surprising fact that countries with high levels of mask wearing have correspondingly high "region spread" factors which cancel it out.

Their explanation is that this is because the regions most inherently susceptible to COVID-19 rationally respond by taking more protective measures (such as higher levels of mask wearing).

My point with the variance of the regional factor is that this makes it more likely that "region spread factor" is another term for "prediction error" rather than "inherent susceptibility".

(Just responding to the title...) This post isn't making any claims about the effectiveness of respirators, right? (e.g. N95/FFP2/FFP3)

I don't have specific knowledge about N95 quality masks. The Bangladesh RCT found that surgical masks were about 2x as effective as cloth masks (although that difference was on the edge of statistical significance). If I had to guess an equivalent RCT with N95s would find them to be ~2x as effective as surgical masks. But this post is mainly talking about masks = what average people wear and call masks.

If I had to guess an equivalent RCT with N95s would find them to be ~2x as effective as surgical masks.

(My best guess: The difference is much more dramatic than that because non-sealing masks allow unfiltered air to leak from the edges; so the amount of aerosols that gets inhaled while wearing a regular mask can easily be 2 orders of magnitude larger than when wearing an N95/FFP2/FFP3 respirator.)

Over here in Germany, average people are wearing N95/KN95/FFP2's and I know a few people who somehow got their hands on some N99/FFP3's. If cloth masks and surgical masks are not good enough, we should acknowledge that and get better ones. Many people would try to claim that as proof that masking doesn't work and shouldn't be done.

Getting better masks isn't enough. You actually need to wear them correctly to get the full benefits and even professionals often fail fit testing.