Friday, March 8, 2013

Rethinking Retractions

This time last year I had to retract a paper I had recently published, as a result of a coding error that invalidated the analysis. Now, one year later, a revised version of the same paper is about to be re-published, complete with an analysis of ALL THE DATA. Broadly speaking the results are the same as before, the methodology is still pretty novel, and my career may just about have been salvaged from last year's wreckage. With the whole episode (hopefully!) behind me, I wanted to write an article reflecting on the experience. While my view is obviously coloured by my own difficulties, I hope it will have some relevance to other scientists as well.

UPDATE: The relevant re-published paper is now available here.


Almost exactly one year ago I started this blog. I was also down in Australia, about to give a seminar entitled 'Prawns and Probability: Model Selection in Collective Behaviour'. The centrepiece of the presentation was going to be the work I had recently published on identifying interaction rules in groups of prawns. As the name of this blog suggests, it was also going to be the subject of one of the early posts here too. Having spent about 18 months on the research and writing before getting it published, I was feeling pretty pleased with myself...

Back in the UK my friend and colleague from my PhD days, Michael Osborne, was also playing around with the prawn data, after I had passed it on to him as a nice example data set for him to test his numerical integration methods on. There was hope we might get a nice conference paper out of it. All was well with the world.

One night, while I was finishing up my presentation slides, a message from Mike popped up on my computer. The conversation that followed is still starred in my Gmail:

                    Michael: hey rich
 Michael: it's going ok, I hope
  hey are you ready for some news
11:46 me: bring it
 Michael: dave reckons you only used 1/100th of the data in the .m files you sent us
  rather than 1/2 as it seems you intended
  basically just data from a single trial
 me: ...
  um, ok
11:47 what leads you/him to this conclusion?
 Michael: well, looking at the code
  our evidences approximately match yours on the 1/100th dataset
11:48 it's actually good news for us, because running on the whole dataset is crazy slow
  which allows us to make the argument that choosing samples is important
 me: ok, but how did i manage to only pull out 1/100th?
11:49 is it just 1:100:end?
  or are you only goijg on the evidences?
11:50 Michael: David: so there is something weird about the scripts we got

it seems like there is a bug that means that only 1/100th of the data is used
instead of 1/2 like they meant

me: ha

David: lines 12-15 of

so prawn_MC_results_script just hands it these cell arrays

and then it divides them in half 100 times

I think the code is supposed to just take every second row

but it takes every second cell, and does this over and over


In the 5 minutes it took to have that conversation my mood went from buoyant to despairing. Mike and his colleague David Duvenaud had found an error in the code I had used to analyse the data in our paper which had in a stroke invalidated all our results. This had a number of extremely unpleasant repercussions.

  • I was now due to give an hour-long seminar in ~3 days that focused on some completely false results.
  • The paper I had been writing with Mike and David was now floundering without a data set, and my contribution had been wiped out.
  • The blog I had started had nowhere to go (hence the lack of posts over the last year!).
  • Worst of all: I had to tell my co-authors on the original paper that our results were invalid, that we would have to retract the paper, and that it was ALL MY FAULT for not checking the code well enough.

I won't bore you with the exact details of the next few weeks. Suffice it to say that I had a very drunk Skype conversation with my boss, who was very good about the whole thing; I somehow gave a successful seminar despite having "CAUTION, POSSIBLY INVALID" stamped over my most important results; and after crafting an extremely apologetic statement the paper was retracted. Mike and David found other data to play with. I wrote about some older topics on my blog. I didn't sleep very much for a few months.

The general unpleasantness of the whole experience led me to reflect on the nature of retractions and mistakes in the scientific literature. My conclusions:

  • Mistakes like this must be relatively common. I may not be the most thorough person in the world, but I am far from the most careless. I made a similar mistake during my PhD but caught it shortly before publication (another few sleepless months there...). Any work that involves substantial computational analysis of data by one or a few people carries a small but significant chance of containing a coding error. Many of these doubtless have little impact on the results, but some will.
  • My mistake was only caught because I gave my code to someone else. While this now makes it terrifying to do so ever again, it also shows the value of journals insisting that code be made available. Given how involved the analysis of large data sets is becoming, it is implausible to expect anyone to replicate your results without seeing your code. The chances of peer review catching this sort of error are somewhere between very small and non-existent.
  • The business of retracting a paper is far too stigmatising. To be fair to PLoS, who published the paper, they were extremely good about the retraction and certainly didn't accuse me of anything underhand. Nonetheless, most people's first reaction on hearing I was retracting a paper was similar to Andrew King's: "Retracted? What did you do?" (actually, Andrew was very nice about it too, but his was the only reaction I had in writing!). Many other people gave me a there-but-for-the-grace-of-God-go-I look and said how awful it must be. Most of the stigma I felt came from the wider community, who did not know me personally but knew of websites like Retraction Watch, which, while aiming to shame fraudulent scientists, also give all retractions a bad name.
Now, I understand that retractions have often been associated with gross malpractice. Some successful scientists have been made to retract whole careers' worth of publications after it was discovered they had been intentionally falsifying data. Of course this sort of thing needs to be stamped out as vigorously as possible.


If mistakes are common (my assertion), and retracting a paper is awful (my experience), that seems like a recipe for encouraging cover-ups and quietly ignored errors in the literature. I am not ashamed to say that the night I found out about my mistake I was initially tempted to ignore it. I'm glad now that I didn't, but at the time the little devil on my shoulder was trying to persuade me it wasn't worth the ensuing misery to correct an error in one paper among thousands, that the methodology was still sound, that it wasn't that big a deal. And therein lies the problem: a mistake in the literature seems like a small thing, while retracting a paper seems like a huge thing, especially as it wiped from the record some of the work I was most proud of, at a time when my publication list was already a little sparse.

In conclusion, based on my experience I think there should be an easier, less painful and less stigmatising way to admit to serious but non-fraudulent errors in published work. The type of mistake I made will, I believe, only become more common. It doesn't mean everything the authors did was wrong, nor does it necessarily imply any foul play. We should also advocate for making data and code publicly available alongside published work, so that more mistakes can be picked up before they have a chance to become accepted results. More should be done to create a system that distinguishes between mistakes and fraud.

Meanwhile, my advice to other scientists doing similar work to me is:

a) Don't trust the results in papers as being revealed truth just because they are peer-reviewed. 
b) I'm not going to tell you to 'do the right thing' if you find a mistake in your own (published) paper. Just be aware you might not be able to sleep until you do!
c) Go and check your code again now!

Please get in touch if you have had any similar experience, or if you disagree about anything I've written here. I'd be glad to know how people in the scientific community feel about these issues.


  1. The bug was extremely subtle, since all the downstream plots looked just fine. There was no way to spot it just from looking at the code of any one file - I only spotted it by accident when I noticed that the cached data was much smaller than the original.

    I've been working on a set of sanity checks to catch errors of this sort automatically, but I haven't been able to come up with one that would have caught this kind of error.

    At the time it was obvious that this was going to be a painful experience, and hard to handle as gracefully as you did. Kudos for bringing to light such a sensitive aspect of research.

  2. It was a lovely paper, regardless. And I'm glad it turned out well! Maybe selfishly I'm glad other people have similar problems with code. I've had sleepless nights and days with false alarms, and not-false alarms, with an ABC analysis we've been working on.

  3. Incidentally, inspired by your prawn paper, we're applying a machine learning algorithm with a Markov chain model to clean up errors in a Drosophila group-choice video tracking experiment. One broad class of bug-checks we're rigorously implementing, which might or might not have been useful in your particular case, is to simulate data using a model, then run the algorithm on the simulated data to see if we recover the same model at the end. This kind of sanity check has uncovered more bugs than I care to admit.

  4. Thanks for your comment Brad. You're right, making sure you can infer back from simulations is definitely a good sanity check - I've caught many other bugs that way! Turns out you can even write a whole paper just doing that!

    One problem that might come up though is if you share some code between sims and inference - for example I have code which calculates which animals are in each others `neighbourhood', which is often used in both.
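The simulate-then-recover sanity check described in the comments above can be sketched in miniature. Here a toy coin-flip model stands in for the real tracking model; this is hypothetical Python for illustration, not anyone's actual analysis code:

```python
import random

random.seed(0)

# Step 1: simulate data from a model with a KNOWN parameter.
true_p = 0.3
data = [1 if random.random() < true_p else 0 for _ in range(100_000)]

# Step 2: run the inference code on the simulated data. Here "inference"
# is just the maximum-likelihood estimate of p; in practice it would be
# the same pipeline used on the real data.
p_hat = sum(data) / len(data)

# Step 3: check that the known parameter is recovered. A bug in either
# the simulator or the estimator would typically make this fail.
assert abs(p_hat - true_p) < 0.01, f"recovered {p_hat:.3f}, expected {true_p}"
print(f"recovered p = {p_hat:.3f}")
```

As the reply above notes, the check is weaker when the simulator and the inference share code: a bug in a shared helper (such as a neighbourhood calculation) can cancel out on both sides and go undetected.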