The Availability of Research Data Declines Rapidly with Article Age

A new peer-reviewed paper in ‘Current Biology’ (December 2013), based on 516 studies, shows a strong link between the age of a scientific article and the availability of its data.

Timothy H. Vines, lead author of ‘The Availability of Research Data Declines Rapidly with Article Age,’ describes how the major cause of the reduced data availability for older papers is the rapid increase in the proportion of data sets reported as either lost or on inaccessible storage media.

Main points:

  • We examined the availability of data from 516 studies between 2 and 22 years old

  • The odds of a data set being reported as extant fell by 17% per year

  • Broken e-mails and obsolete storage devices were the main obstacles to data sharing

  • Policies mandating data archiving at publication are clearly needed
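To get a feel for what a 17% yearly fall in odds means, the decline can be compounded over time. The sketch below is only an illustration: the starting odds (`BASELINE_ODDS`) and the function names are assumptions made for this example and are not figures or methods from the paper. Only the 17% annual decline comes from the study.

```python
def odds_after(initial_odds: float, annual_decline: float, years: int) -> float:
    """Compound a fixed fractional yearly decline in odds over a number of years."""
    return initial_odds * (1.0 - annual_decline) ** years


def odds_to_probability(odds: float) -> float:
    """Convert odds (p / (1 - p)) back to a probability p."""
    return odds / (1.0 + odds)


# Hypothetical starting point, NOT taken from the paper:
# odds of 4.0 correspond to an 80% chance that the data still exist.
BASELINE_ODDS = 4.0
ANNUAL_DECLINE = 0.17  # the 17% per-year fall in odds reported in the study

for years in (0, 5, 10, 20):
    odds = odds_after(BASELINE_ODDS, ANNUAL_DECLINE, years)
    probability = odds_to_probability(odds)
    print(f"{years:>2} years: odds {odds:.2f}, probability {probability:.0%}")
```

Note that a 17% fall in odds is not the same as a 17 percentage point fall in probability, which is why the example converts odds back to a probability before printing.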

Vines shows that author responses included admissions that their data were lost, stolen, stored in some distant location, or held on outmoded technology (e.g. floppy disks). Authors reported reluctance to retrieve such data. In two cases, authors complained that they would have to devote hours or days to retrieving the data.

Vines reports, “Our reason for needing the data (a reproducibility study) was not especially compelling for authors, and we may have received more of these inaccessible data sets if we had offered authorship on the subsequent paper or said that the data were needed for an important medical or conservation project. The odds that we were able to find an apparently working e-mail address (either in the paper or by searching online) for any of the contacted authors did decrease by about 7% per year.”

Vines continues, “the proportion of e-mails from the paper that appeared to work declined with article age between 2 and 14 years of age and then rose to around 80% for articles from 1991, 1993, and 1995.”

This revealing study adds:

Many data sets produced in scientific research are unique to their time and location, and once lost they cannot be replaced [14]. Since it is impossible to know what uses would have been found for these data or when they would become important, leaving their preservation to authors denies future researchers any chance of reusing them.

Fortunately, one effective solution is to require that authors share their data on a public archive at publication: the data will be preserved in perpetuity and can no longer be withheld or lost by authors. Some journals have already enacted policies to this effect (e.g., [5, 6]), and we hope that the worrying magnitude of the issues reported here will encourage others to draft similar policies in due course.

Read more at: Current Biology.
