Follow (Stata user) Nate Silver's Example: Make your analysis reproducible

Nate Silver's FiveThirtyEight blog has featured the most-read Stata output in the world this election season. (Before he moved to the New York Times, it was not unusual to see raw Stata output in his blog, though that seems to have changed now that he has professional graphic designers available.) So how does he do it? Unfortunately, to the best of our knowledge, he has never described his use of Stata in detail, but we can make some inferences. The procedure he uses is quite complex, yet he has been carrying it out at least daily as new data become available. It therefore seems safe to assume that his model is implemented as one or more do files, and that the only change he has to make to the code each day is to tell it where to find the new data (and maybe not even that).

Few SSCC members need to rerun their analysis every day. But it's worth considering: how long would it take you to rerun your analysis if you were given new data? Even more common: how long would it take you to rerun your analysis if a reviewer/advisor/committee member asked you to add a new variable to a model or make some other relatively small methodological change? If your analysis is done using code (Stata do files, SAS programs, SPSS syntax files, R scripts, etc.) and those programs are well-organized and structured such that they can be run again at any time, then changes can be made in hours or even minutes. On the other hand, we often see researchers take days or weeks to make changes to their analysis because they struggle to understand and reproduce what they did before. Even if you never have to make changes, writing clear code that minimizes opportunities for human error can help you avoid mistakes.
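As a rough illustration of what "can be run again at any time" might look like, here is a minimal sketch in Python; the same structure applies to a Stata do file, SAS program, or R script. Everything in it is hypothetical: the function names, the column names `x` and `y`, and the one-predictor least-squares fit are stand-ins for a real analysis, not a description of anyone's actual model. The point is only that the data location and the model specification are inputs, so new data or a changed variable means rerunning the script rather than redoing steps by hand.

```python
import csv
import sys


def load_rows(data_path):
    """Read the raw data file as-is; nothing is edited by hand."""
    with open(data_path, newline="") as f:
        return list(csv.DictReader(f))


def fit_line(rows, outcome, predictor):
    """Least-squares fit with one predictor; returns (intercept, slope)."""
    xs = [float(r[predictor]) for r in rows]
    ys = [float(r[outcome]) for r in rows]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def run_analysis(data_path, outcome="y", predictor="x"):
    """The whole analysis, from raw file to results, as one repeatable step.

    The column names "y" and "x" are hypothetical placeholders.
    """
    rows = load_rows(data_path)
    intercept, slope = fit_line(rows, outcome, predictor)
    return {"n": len(rows), "intercept": intercept, "slope": slope}


if __name__ == "__main__" and len(sys.argv) > 1:
    # New data? Rerun with a different path, e.g.: python analysis.py new_data.csv
    print(run_analysis(sys.argv[1]))
```

With this layout, a request to use a different predictor becomes a change to one argument of `run_analysis`, and everything downstream is regenerated by running the script again.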

Writing good code probably won't bring you an audience the size of Nate Silver's. But it will make your research more efficient and less stressful.