18 October 2014: An important update is below the original post.
— Emilio M. Bruna (@BrunaLab) October 16, 2014
You don’t need me to rehash the arguments for why scientists should archive and make publicly available the data and code used in their papers; you can read up on those here, here, and here, for starters. As noted by Roche et al., many scientists fail to see the personal benefits to archiving given the potential costs, not the least of which are the time, effort, and money required to do so. To counter these concerns, proponents of public data archiving (including me) are quick to point out the many advantages of doing so: the opportunity for novel collaborations, the public good, meeting funding agency mandates, and, what often counts most for scientists in terms of professional advancement and recognition, citations of the archived data and code.
To put it bluntly, in our annual evaluations and portfolios for tenure or promotion all of us are asked to document the “impact of our research program on the field” (e.g., flip to the bottom of p. 16). Citations of datasets and code are an easy and unambiguous way to do so, which is why data citation has been put forward as an important mechanism by which to increase the number of scientists engaged in open science (see also Whitlock 2011 [sorry, paywalled]). There are plenty of handy guides on how to cite data and code: the DCC has a comprehensive one, as does The Dryad Digital Repository. It’s pretty simple; here is Dryad’s suggestion for how to cite data (emphasis theirs):
When citing data found in Dryad, please cite both the original article as well as the Dryad data package. It is recommended that the data package be cited in the bibliography of the original publication so that the link between the publication and data is indexed by third party services. Dryad provides a generic citation string that includes authors, year, title, repository name and the Digital Object Identifier (DOI) of the data package, e.g. Westbrook JW, Kitajima K, Burleigh JG, Kress WJ, Erickson DL, Wright SJ (2011) Data from: What makes a leaf tough? Patterns of correlated evolution between leaf toughness traits and demographic rates among 197 shade-tolerant woody species in a neotropical forest. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.8525
Where am I going with this? I recently had a paper accepted as a Report in Ecology, and I decided to put my data where my mouth was and archive all our data with Dryad, post the code at GitHub, and get a DOI for the code with Zenodo so it would be citable. This paper also used some of my lab’s data from a prior publication that we had also archived at Dryad, so I followed Dryad’s guidelines and included citations to both the paper and dataset in the Literature Cited. Open Science FTW! Imagine my surprise when I got this email from the Copy Editor preparing to send my manuscript to the printers:
Regarding your inclusion of references to code and data in Literature Cited and elsewhere in the text, we have made the following changes to fit our style:
1) Bruna et al. 2011b has been deleted from Literature Cited. In the one place where it was cited in the text, it now says “see data associated with Bruna et al. 2011”. [EB note: Bruna et al. 2011b is the citation of the dataset archived at Dryad]
2) Bruna 2014 has been deleted from Literature Cited and the URL appears as a footnote where this R code was referenced in the text [EB note: Bruna 2014 is the citation to the code with the DOI]
3) The URL for the data associated with this manuscript now appears in a Data Availability statement at the end of the paper. A reference to that appears where the URL was formerly in the paper. [EB note: this refers to Dryad archive of the data collected for this paper]
Indexing services don’t scrape the appendices, data availability statements, or footnotes of papers. In other words: Sorry, Emilio…no citation credit for you! But thanks for being a mensch and archiving your data and code, which by the way we also think is really important. ESA, this isn’t right. You can’t encourage authors to archive data/code (or, in the case of Ecological Applications, another journal you publish, actually require it) but take away a major incentive and reward for doing so. But don’t take my word for it: Liza Lester said it best in the February announcement that Ecological Applications would require data archiving:
UPDATE (18 October 2014): Resolution! This flew around the twitterverse for 48 hours and has been resolved thanks to Todd Vision from Dryad and J. David Baldwin, the Managing Editor of ESA Publications. The tweet from Ecology EIC Don Strong says it all:
Thanks to @tjvision, @cboetting, @recology, @mfenner, @ethanwhite, @hormiga, @AntLabUCF, and the others retweeting and commenting for their help in resolving this.
PS: Of course, leave it to Don Strong to use this as an opportunity to mock us for the rough couple of years the Gator football team has had…
— StrDon (@ElDon78) October 18, 2014
Thanks, Don. Thanks a lot.
Photo credit: Facepalm, by Brandon Grasley (CC BY 2.0)