scientific reproducibility police

At coffee this morning, Christopher Stumm (Etsy), Dan Foreman-Mackey (NYU), and I worked up the following idea of Stumm's: every week, on a blog or (I prefer) in a short arXiv-only white paper, one refereed paper is taken from the scientific literature and its results are reproduced, as well as possible, given the content of the paper and the available data. I expect almost every paper to fail (that is, not be reproducible), of course, because almost every paper relies on proprietary code or data or else is too vague to specify what was done. The astronomical literature is particularly interesting for this because many papers are based on public data; for those, reproducibility comes down only to code and procedures. Indeed, I remember Bob Hanisch (STScI) giving a talk at ADASS showing that it is very hard to reproduce the results of typical papers based on HST data, even though all the data and almost all the code people use on them are public.

Stumm, Foreman-Mackey, and I discussed economic and incentive models that could make this happen. I think whoever did this would succeed scientifically, if he or she did it well, both because it would have huge impact and because it would create many new insights. On the other hand, it would take significant guts and a hell of a lot of time. If you want to do it, sign me up as one of your reproducibility agents! I think anyone involved would learn a huge amount about the science (more than they would learn about reproducibility). In the end, though, it is the community that would benefit most. Radical!


  1. I'm glad Gus pointed out the Reproducibility Initiative! I started the initiative. I am actually going to be at NYU giving a seminar on Friday, 9/21 at 4:30PM (Smilow Seminar Room, Langone Medical Center, 550 First Avenue, New York NY); it would be great to meet if you have time.

  2. How do you think it would create new insights? Most papers make incremental contributions to the literature. In fact, most papers are at least a little bit wrong. If we tried reproducing results, we might learn what fraction of papers are completely wrong, but I don't know if that fraction is incredibly interesting; it's a nuisance parameter, like the outlier fraction. The march of science [tm], or alternatively Kuhnian "normal science," doesn't really depend on individual papers being correct, as long as the global average tends to converge (insofar as there is an average). Naturally, there are individual papers whose wrongness would have greater consequences.

    I think it's more interesting to know which papers cannot be reproduced using different data and similar, but not line-for-line identical, techniques. This tells you something about systematics. For example, in the time-varying fine-structure-constant papers, they repeatedly got consistent answers with the Keck spectra. When they analyzed a bunch of VLT spectra, they got a different answer. They published this as evidence for an N/S dipole, but of course the sane interpretation (if you haven't staked your career on it) is that the method has unknown systematics and the result is null.