[Epistemic status: Three different data sets pointing to something similar is at least interesting, make your own mind up as to how interesting!]

Follow-up to: Fight Me, Psychologists, Birth Order Effects are Real and Very Strong, 2012 Survey Results, Historical mathematicians exhibit a birth order effect too

In Eli Tyre’s analysis of birth order in historical mathematicians, he mentioned analysing other STEM subjects for similar effects. In the comments I kinda–sorta preregistered a study into this. Following his comments I dropped the age requirement I mentioned as it no longer seemed necessary.

I found that Nobel Laureates in Physics are more likely to be firstborn than would be expected by chance. This effect (10 percentage points) is smaller than the effect found in the rationalist community or historical mathematicians (22 and 16.7 percentage points respectively) but is significant (p=0.044).

More brothers than sisters were found in the study (125:92, or 58% brothers). Compared against the correct expected ratio (~52%), this was not significant (p=0.11).

I was unable to find sufficient data on Fields medal, Abel prize and Turing award winners.

My data and analysis are documented here. With Eli's kind permission I used his spreadsheet as a template. I have kept Eli’s data in the same table: rows 4-153 are his.

Methodology

My methods matched Eli’s closely except for the data sets I looked at, see his post for more information.

Initially I attempted to replicate Eli’s results in other mathematicians by analysing Fields medal and Abel prize winners. Unfortunately I was unable to gather sufficient additional data. This is partly due to crossover in names between these mathematicians and the list from which Eli was working.

It also seems to be the case that less biographical information is available for people born after ~1950. This might be partly due to these people and their siblings being more likely to be still alive so data protection rules prevent e.g. geni from listing their full details (siblings’ details are often set to “private”) but there could be other reasons. For Fields medals awarded before 1986 I found data on 12/30 recipients, after that only 3/30.

I had a brief look at Turing award winners, as this seemed a relevant field to compare with the results from the rationalist community that inspired these studies, but came across the same problem.

Finally, I looked at Nobel laureates in Physics. A massive help in data collection here was the fact that since the 1970s Nobel laureates have been asked to supply an autobiography, which is published on the Nobel website. Even before then there are biographies of each laureate although these seldom mention birth order.

Between the Nobel site, Wikipedia and geni I was able to find useful data on 100/207 Physics laureates. The other 107 either had no siblings or I couldn’t find sufficient data on them – either way they weren’t included in the analysis.

As a comment on data sources, I found geni to be somewhat unreliable. It contradicted the autobiographies or sometimes even contradicted itself. At other times, the list of siblings was incomplete or missing completely.

Results

Categorising by family size shows that for all family sizes with ≥10 data points there are more firstborns than would be expected by chance.

Due to the small sample size I have grouped all families of 6+ siblings into a single bucket, and even then n=14. The expected number of laureates at each birth position falls for higher positions, as there are fewer families in the sample with at least that many children.
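To make the "expected by chance" baseline concrete, here is a minimal sketch of how it can be computed, assuming the null hypothesis that a laureate is equally likely to hold any birth position in their family. The function name `expected_counts` and the family sizes in the example are hypothetical, chosen only for illustration; they are not taken from the spreadsheet, and the sketch does not apply the 6+ bucketing described above.

```python
from fractions import Fraction


def expected_counts(family_sizes):
    """Expected number of sample members at each birth position.

    Assumes every position within a family is equally likely, so a
    family of size n contributes 1/n to each of positions 1..n.
    """
    expected = [Fraction(0)] * max(family_sizes)
    for size in family_sizes:
        for position in range(size):
            expected[position] += Fraction(1, size)
    return expected


# Hypothetical sample: four 2-child, three 3-child and two 4-child families.
example_sizes = [2] * 4 + [3] * 3 + [4] * 2
example_expected = expected_counts(example_sizes)

for position, count in enumerate(example_expected, start=1):
    print(f"position {position}: expected {float(count):.2f}")
```

Only families with at least k children contribute to position k, which is why the expected count shrinks for later positions.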

Analysing the data as a whole gives a 10 percentage point effect (0.2 to 19.8 percentage points, 95% confidence). This is less than both the SSC / Less Wrong surveys and Eli’s historical mathematicians analysis (22 and 17 percentage points respectively). I haven’t got a number for the overall confidence level for the SSC data, but given the large data set and the very low p quoted for the 2 sibling example, it is unlikely that its 95% confidence interval overlaps with this new data, suggesting that the effect is truly a different size and not due to chance.
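For readers who want to see how an interval of this kind can arise, here is a minimal sketch using a normal approximation for the difference between an observed and an expected firstborn share. This is an assumption about the method, not a record of what the spreadsheet does. The function name `effect_ci` is hypothetical, and the shares 0.50 and 0.40 are placeholder inputs for illustration only, not the figures from the data; only n=100 comes from the post.

```python
from math import sqrt
from statistics import NormalDist


def effect_ci(observed_share, expected_share, n, confidence=0.95):
    """Normal-approximation interval for (observed - expected) share.

    Treats the expected share as fixed and uses the binomial standard
    error of the observed share. This is an illustrative assumption.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    std_err = sqrt(observed_share * (1 - observed_share) / n)
    effect = observed_share - expected_share
    return effect - z * std_err, effect + z * std_err


# Hypothetical inputs: 50% observed vs 40% expected, n = 100 laureates.
low, high = effect_ci(0.50, 0.40, 100)
print(f"effect interval: {low * 100:.1f} to {high * 100:.1f} percentage points")
```

With 100 data points the half-width of such an interval is just under 10 percentage points whenever the observed share is near one half, so an effect of this size sits close to the edge of significance.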

Discussion

Autobiographies as source material

Using autobiographies as the source for a significant number of the data points should have helped with the reliability of the data. It is possible that when writing an autobiography one would be more likely to mention siblings and birth order if one was the eldest but this doesn't seem likely.

Gender imbalance

Eli discussed under-reporting of females as a potential source of bias. However, he found that the brothers:sisters ratio in his data was not unreasonable.

Running the same analysis on the physics Nobel laureate data I get a ratio of 125:92 brothers:sisters. This makes the siblings 58% male, with p=0.03 (binomial distribution, two tailed). This effect is actually more significant than the birth order effect.

I looked at the SSC data and Eli’s data and found that there were 52% brothers in both. I did a little research and found that 51-52% is actually roughly the expected brother:sister ratio. I feel like this is something I should have already known but didn’t.

Another effect which might increase the proportion of brothers among Nobel laureates’ siblings is that men can have a disposition to have boys or a disposition to have girls. As almost all of the laureates are male, it would be reasonable to think more of their dads were predisposed to having boys. However, as this isn’t seen in the SSC or historical mathematicians data (both also male dominated), this doesn’t really get us much further.

Using 52% as the expected ratio (instead of 50%) means that the 58% result from Nobel laureates no longer rises to significance (p=0.11) and should instead be labelled as “hey, look at this interesting subgroup analysis” or possibly “slightly odd but not implausible”.
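The effect of changing the baseline can be checked with an exact binomial test on the 125 brothers out of 217 siblings of known gender. The sketch below is a minimal standard-library version; the function name `binom_two_sided` is hypothetical, and the two-tailed p-value is computed by doubling the smaller tail, which is an assumption and may differ slightly from the convention used in the spreadsheet.

```python
from math import comb


def binom_two_sided(k, n, p):
    """Two-tailed exact binomial p-value for k successes in n trials.

    Doubles the smaller of the two tail probabilities (capped at 1).
    Other two-sided conventions exist and give slightly different values.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    lower_tail = sum(pmf[: k + 1])  # P(X <= k)
    upper_tail = sum(pmf[k:])       # P(X >= k)
    return min(1.0, 2 * min(lower_tail, upper_tail))


brothers, sisters = 125, 92
total = brothers + sisters

p_vs_50 = binom_two_sided(brothers, total, 0.50)
p_vs_52 = binom_two_sided(brothers, total, 0.52)
print(f"against 50%: p = {p_vs_50:.3f}")
print(f"against 52%: p = {p_vs_52:.3f}")
```

Under these assumptions the two values should land close to the p=0.03 and p=0.11 quoted in the text, with the 52% baseline moving the result to the other side of the 0.05 threshold.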

As I mentioned previously, most of the data since the 1970s Nobels is based on autobiographies. Looking at only data since then, the brother:sister ratio is 51:35 (59%). It seems unlikely that Nobel laureates forgot about some of their sisters, making it less likely that the gender imbalance is due to incorrect data.

One potential source of error in the gender balance may be in the siblings whose gender I was unable to determine. There were 50 of these. Most (41) of these came from families where I had no data except the number of siblings and the position of the laureate within the family (e.g. “I was the fourth of five children.”). It is possible that some of the missing sisters are in this category.

However, this would imply that someone with more brothers is more likely to list the genders of their siblings than someone with more sisters. Perhaps, as most of the laureates were male, they had more in common with their brothers and spent more time with them, making them statistically more likely to mention their brothers’ gender. This seems plausible but unlikely to cause a big effect even if it were true.

For the moment, I am working with the assumption that the sample is accurate and that the gender imbalance is just an outlier. Any other thoughts on causes of bias are welcome. These would have to explain how this effect was seen in data from both geni and the laureates’ autobiographies.

Conclusion

Nobel laureates in physics exhibit a birth order effect such that they are 10 percentage points more likely to be the eldest child than would be expected (p=0.044). This effect is smaller than that found in both SSC readers and historical mathematicians (22 and 17 percentage points respectively).

There was a gender imbalance between brothers and sisters (58% brothers) but, taking into account the expected ratio of 52%, this was not significant (p=0.11). This imbalance is not seen in SSC readers or historical mathematicians (52% brothers in both).

I would recommend that anyone who wishes to collate additional historical data consider Nobel laureates in other prize categories, due to the availability of accurate data from the autobiographies. My analysis took perhaps 12 hours, but a lot of that was spent on wild goose chases looking for data on Fields medal and Abel prize recipients. I saved a lot of time by reusing Eli’s spreadsheet (thanks for the permission). I would estimate that getting data on the entire history of another Nobel prize category and analysing it would take ~6-8 hours, so it shouldn’t be too daunting for someone to take on.