Replication and Robustness

The meaning of the word “replication” hardly seems like the sort of thing that would land a person in court. Yet it did. In Lott v. Levitt (2009), the U.S. District Court for the Northern District of Illinois ruled on that very question, in a dispute between two academic authors of bestselling books.

In his book Freakonomics, Steven Levitt argued that “other researchers have failed to replicate” John Lott’s work, published in the latter’s book More Guns, Less Crime. In fact, as Lott later pointed out, he is willing to share his data, and when other researchers run the same statistical models using the same data, they successfully replicate the results. Lott did not falsify his data, fabricate results, or make errors in his reporting. This is what replication is meant to check, and Lott’s research passed the test.

Levitt countered by arguing that the word “replicate” can have a broader meaning. The courts agreed, and also noted that, generally, the judicial system should stay out of such disputes unless the use of a word is particularly egregious and serves to defame the target, which was not the case here. (A second part of the suit was settled in Lott’s favor, but this was unrelated to the argument over the exact meaning of replication.)

I recently blundered into this whole controversy. I published a newspaper column referencing Lott’s research, and I, too, suggested that others could not replicate it. Lott spotted the column and wrote me a friendly note offering to share some of his more recent research on Australia’s gun laws. I read it and found it fascinating. However, he also asked for a retraction of the replication charge. I demurred: while I would not use the word “replication” again in this particular context, I think the way I used it is defensible, per the Lott v. Levitt ruling and the interpretation offered in the Scientific American article hyperlinked above. What I had in mind were other studies, using different data, methods, and time periods, that reach different conclusions than Lott’s book does: replication in a broader sense.

Lott and Levitt are both economists, but we political scientists have had our own struggles with replication. In the 1990s, Harvard Professor Gary King started a movement pushing for replication in quantitative political science, but the standards he advocated have never become universal. King did have some success in persuading participating political scientists to make their data more accessible, but not everyone is game. Enormous amounts of time, and at times money, go into data collection, and many researchers consider their datasets proprietary.

Furthermore, academic journals rely on unpaid volunteer reviewers, who layer these responsibilities on top of their other duties as professors and researchers. With few exceptions (including the AJPS, which has a third-party replication process), journals must rely on reviewers to download massive datasets into statistical programs like SPSS and R and then replicate exactly what other researchers have done. This approach is probably not realistic; editors have a hard time just getting reviewers to complete their reviews, which are sometimes months late and only a few sentences long.

By contrast, book reviewers are often paid, but book publishers increasingly look not for the kind of detailed, technical scrutiny that replication involves, but for books that will reach a broader audience. Selling books only to other political science professors is not a very lucrative market. While Lott and Levitt both have their critics (including one another), both wrote bestselling books, and that is what publishers want. Publishers are not likely to get involved in a lengthy replication project when what they are seeking is readability and a larger audience: the next More Guns, Less Crime or Freakonomics, and they do not much care which, as long as it sells. In short, publishers cannot be relied upon to enforce standards of replication, nor can editors, and the courts would prefer to stay out of it.

One way out of this mess is to invoke another statistical concept: robustness. Just as replication can have a narrow or a broad meaning, so can robustness. While the word always makes me think of my morning coffee, or perhaps a good merlot, robustness in the statistical sense refers to a relationship between two variables that is not driven by just a few cases or assumptions. At the risk of oversimplifying: if a few seemingly minor alterations to a data analysis change the results, then those results were not robust in the first place. For a better, more technical explanation, visit http://www.rci.rutgers.edu/~dtyler/ShortCourse.pdf.
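To make the idea concrete, here is a minimal sketch of one common robustness check, written in R (one of the statistical programs mentioned above). The data, and the variables x and y, are simulated and invented purely for illustration; the point is only the mechanics: fit a model, drop the handful of most influential cases, and see whether the estimate survives.

```r
# A minimal sketch of a within-dataset robustness check.
# All data here are simulated purely for illustration.
set.seed(42)

n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 0.1 * dat$x + rnorm(n)   # a weak underlying relationship...

# ...plus three extreme, high-leverage cases that inflate it
dat$x[1:3] <- 6
dat$y[1:3] <- 6

full_fit <- lm(y ~ x, data = dat)

# Refit after dropping the three most influential observations,
# as flagged by Cook's distance.
influential <- order(cooks.distance(full_fit), decreasing = TRUE)[1:3]
trimmed_fit <- lm(y ~ x, data = dat[-influential, ])

# If the slope shrinks dramatically once a handful of cases are
# removed, the original result was not robust in the first place.
coef(full_fit)["x"]
coef(trimmed_fit)["x"]
```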

Like replication, robustness can also be defined more broadly. Much as results from a single dataset are robust if they hold across all (or at least many) of the cases and not just a few, so research results can be said to be robust if the same finding keeps popping up in multiple studies, using different data and different ways of modeling it. Findings such as the relationship between education and political participation (those with more education are more likely to vote and to participate in other ways) hold up no matter how you slice the data. Old data, new data, crude models, highly sophisticated analyses: again and again, the relationship appears. There is just no way around the fact that more education tends to pair with more political involvement. Of course, a few individuals do not fit this pattern, but these exceptions do not debunk the claim. It is solid. Within a single dataset, robustness refers to the relationship holding across a broad swath of cases. Considered more broadly, a finding is robust if it holds up across a broad swath of studies.
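To illustrate this broader sense, here is another small R sketch, again on simulated data with invented variable names (education, age, income, participate). It builds in a genuine education effect and then re-estimates it under several specifications and subsamples; a finding that is robust in the broader sense keeps its sign across all of them.

```r
# Broader-sense robustness: one (simulated) education effect,
# re-estimated under several specifications. All variables invented.
set.seed(7)

n <- 1000
dat <- data.frame(education = rnorm(n), age = rnorm(n))
dat$income <- 0.4 * dat$education + rnorm(n)
# Build in a genuine education effect on participation
dat$participate <- rbinom(n, 1, plogis(0.8 * dat$education + 0.3 * dat$age))

specs <- list(
  bivariate  = glm(participate ~ education, family = binomial, data = dat),
  with_age   = glm(participate ~ education + age, family = binomial, data = dat),
  full_model = glm(participate ~ education + age + income,
                   family = binomial, data = dat),
  first_half = glm(participate ~ education + age, family = binomial,
                   data = dat[1:(n / 2), ])
)

# A robust finding keeps its sign (and rough size) no matter how
# we slice the data or specify the model.
sapply(specs, function(m) coef(m)["education"])
```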

Conversely, the research on concealed carry, gun ownership, and crime is not robust in the broad sense that I am using. Lott’s research finds that concealed-carry laws deter crime. The research by Aneja, Donohue, and Zhang, referenced in the piece from The Washington Post above, finds that such laws increase one crime, aggravated assault, while having no impact on other crime rates. For his part, Levitt believes there is little relationship at all between gun ownership and crime rates. In short, the research on this topic is highly sensitive to the model specification, time periods, and data used. No clear, robust relationship has emerged across different research by different researchers using different data and different modeling techniques. The most likely explanation is that the relationship between gun ownership and crime is ambiguous: whether positive or negative, the effects are small compared to the big drivers, such as the share of poor, unemployed, young males (who commit the vast majority of street crimes, regardless of race) in the population at any given time…

…which is exactly what I wrote in that newspaper column. But by using the word “replication” loosely, I wandered into a whole new set of questions: ones that are not easily resolved and that ultimately go to the soul of social science itself.

About the author: Michael A. Smith is a Professor of Political Science at Emporia State University, where he teaches classes on state and local politics, campaigns and elections, political philosophy, legislative politics, and nonprofit management. Read more from Smith on the MPSA blog and follow him on Twitter.