Research data archiving

Research data archiving is the long-term storage of scholarly research data, including the natural sciences, social sciences, and life sciences. The various academic journals have differing policies regarding how much of their data and methods researchers are required to store in a public archive, and what is actually archived varies widely between different disciplines. Similarly, the major grant-giving institutions have varying attitudes towards public archival of data. In general, the tradition of science has been for publications to contain sufficient information to allow fellow researchers to replicate and therefore test the research. In recent years this approach has become increasingly strained as research in some areas depends on large datasets which cannot easily be replicated independently.

Data archiving is more important in some fields than others. In a few fields, all of the data necessary to replicate the work is already available in the journal article. In drug development, a great deal of data is generated and must be archived so researchers can verify that the reports the drug companies publish accurately reflect the data.

The requirement of data archiving is a recent development in the history of science. It was made possible by advances in information technology that allow large amounts of data to be stored and accessed from central locations. For example, the American Geophysical Union (AGU) adopted its first policy on data archiving in 1993, about three years after the beginning of the World Wide Web.^[1] This policy mandates that datasets cited in AGU papers be archived by a recognised data center; it permits the creation of "data papers"; and it establishes AGU's role in maintaining data archives. However, it places no requirement on the authors of papers to archive their own data.

Prior to organized data archiving, researchers wanting to evaluate or replicate a paper had to request data and methods information from the author. The academic community expects authors to share supplemental data, but this process was recognized as wasteful of time and energy and produced mixed results. Information could become lost or corrupted over the years, and in some cases authors simply refused to provide it.

The need for data archiving and due diligence is greatly increased when the research deals with health issues or public policy formation.^[2]^[3]

Selected policies by journals

The American Naturalist

The American Naturalist requires authors to deposit the data associated with accepted papers in a public archive. For gene sequence data and phylogenetic trees, deposition in GenBank or TreeBASE, respectively, is required. There are many possible archives that may suit a particular data set, including the Dryad repository for ecological and evolutionary biology data. All accession numbers for GenBank, TreeBASE, and Dryad must be included in accepted manuscripts before they go to Production. If the data is deposited somewhere else, please provide a link. If the data is culled from published literature, please deposit the collated data in Dryad for the convenience of your readers. Any impediments to data sharing should be brought to the attention of the editors at the time of submission so that appropriate arrangements can be worked out.

— JSTOR^[4]

Journal of Heredity

The primary data underlying the conclusions of an article are critical to the verifiability and transparency of the scientific enterprise, and should be preserved in usable form for decades in the future. For this reason, Journal of Heredity requires that newly reported nucleotide or amino acid sequences, and structural coordinates, be submitted to appropriate public databases (e.g., GenBank; the EMBL Nucleotide Sequence Database; DNA Database of Japan; the Protein Data Bank; and Swiss-Prot). Accession numbers must be included in the final version of the manuscript. For other forms of data (e.g., microsatellite genotypes, linkage maps, images), the Journal endorses the principles of the Joint Data Archiving Policy (JDAP) in encouraging all authors to archive primary datasets in an appropriate public archive, such as Dryad, TreeBASE, or the Knowledge Network for Biocomplexity. Authors are encouraged to make data publicly available at time of publication or, if the technology of the archive allows, opt to embargo access to the data for a period up to a year after publication. The American Genetic Association also recognizes the vast investment of individual researchers in generating and curating large datasets. Consequently, we recommend that this investment be respected in secondary analyses or meta-analyses in a gracious collaborative spirit.

— oxfordjournals.org^[5]

Molecular Ecology

Molecular Ecology expects that data supporting the results in the paper should be archived in an appropriate public archive, such as GenBank, Gene Expression Omnibus, TreeBASE, Dryad, the Knowledge Network for Biocomplexity, your own institutional or funder repository, or as Supporting Information on the Molecular Ecology web site. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

— Wiley^[6]

Nature

Such material must be hosted on an accredited independent site (URL and accession numbers to be provided by the author), or sent to the Nature journal at submission, either uploaded via the journal's online submission service, or if the files are too large or in an unsuitable format for this purpose, on CD/DVD (five copies). Such material cannot solely be hosted on an author's personal or institutional web site.^[7] Nature requires the reviewer to determine if all of the supplementary data and methods have been archived. The policy advises reviewers to consider several questions, including: "Should the authors be asked to provide supplementary methods or data to accompany the paper online? (Such data might include source code for modelling studies, detailed experimental protocols or mathematical derivations.)"

— Nature^[8]

Science

Science supports the efforts of databases that aggregate published data for the use of the scientific community. Therefore, before publication, large data sets (including microarray data, protein or DNA sequences, and atomic coordinates or electron microscopy maps for macromolecular structures) must be deposited in an approved database and an accession number provided for inclusion in the published paper.^[9] "Materials and methods" – Science now requests that, in general, authors place the bulk of their description of materials and methods online as supporting material, providing only as much methods description in the print manuscript as is necessary to follow the logic of the text. (Obviously, this restriction will not apply if the paper is fundamentally a study of a new method or technique.)

— Science^[10]

Royal Society Publishing

As a condition of acceptance authors agree to honour any reasonable request by other researchers for materials, methods, or data necessary to verify the conclusion of the article. Supplementary data up to 10Mb is placed on the Society's website free of charge and is publicly accessible. Large datasets must be deposited in a recognised public domain database by the author prior to submission. The accession number should be provided for inclusion in the published article.

— ^{[citation needed]}

Policies by funding agencies

In the United States, the National Science Foundation (NSF) has tightened requirements on data archiving. Researchers seeking funding from NSF are now required to file a data management plan as a two-page supplement to the grant application.^[11]

The NSF Datanet initiative has resulted in funding of the Data Observation Network for Earth (DataONE) project, which will provide scientific data archiving for ecological and environmental data produced by scientists worldwide. DataONE's stated goal is to preserve and provide access to multi-scale, multi-discipline, and multi-national data. The community of users for DataONE includes scientists, ecosystem managers, policy makers, students, educators, and the public.

Data archives

Natural sciences

The following list refers to scientific data archives.

Social sciences

Data archives are professional institutions for the acquisition, preparation, preservation, and dissemination of social and behavioral data. The term is also sometimes used for natural science institutions (e.g., CISL Research Data Archive, see Scientific data archiving and Borgman, 2007, p. 18^[12]), but in that context "data centers" seems to be the more commonly used term. Data archives in the social sciences evolved in the 1950s and have been perceived as an international movement:

By 1964 the International Social Science Council (ISSC) had sponsored a second conference on Social Science Data Archives and had a standing Committee on Social Science Data, both of which stimulated the data archives movement. By the beginning of the twenty-first century, most developed countries and some developing countries had organized formal and well-functioning national data archives. In addition, college and university campuses often have 'data libraries' that make data available to their faculty, staff, and students; most of these bear minimal archival responsibility, relying for that function on a national institution (Rockwell, 2001, p. 3227).^[13]

re3data.org is a global registry of research data repositories that indexes data archives from all disciplines: http://www.re3data.org
CESSDA Members are data archives and other organisations that archive social science data and provide data for secondary use: http://www.cessda.net/about/members.html
Consortium of European Social Science Data Archives: http://www.cessda.org/
The Danish Data Archives: http://www.sa.dk/content/us/about_us ; specific page (only in Danish): http://www.sa.dk/dda/default.htm
Inter-university Consortium for Political and Social Research: http://www.icpsr.umich.edu/
The Roper Center for Public Opinion Research: http://www.ropercenter.uconn.edu
The Social Science Data Archive: http://dataarchives.ss.ucla.edu/
The NCAR Research Data Archive: http://rda.ucar.edu

Life sciences

References

[1] "Policy on Referencing Data in and Archiving Data for AGU Publications"
[2] "The Case for Due Diligence When Empirical Research is Used in Policy Formation" by Bruce McCullough and Ross McKitrick.
[3] "Data Sharing and Replication", a website by Gary King
[4] Supporting Data and Material
[5] Data archiving policy
[6] Policy on data archiving
[7] "Availability of Data and Materials: The Policy of Nature Magazine"
[8] Nature (citation text missing from the source)
[9] "General Policies of Science Magazine"
[10] "Preparing Your Supporting Online Material"
[11] "NSF to Ask Every Grant Applicant for Data Management Plan"
[12] Borgman, Christine L. (2007). Scholarship in the digital age: information, infrastructure and the internet. Cambridge, MA: The MIT Press.
[13] Rockwell, R. C. (2001). Data Archives: International. In: Smelser, N. J. & Baltes, P. B. (eds.) International Encyclopedia of the Social and Behavioral Sciences (vol. 5, pp. 3225-3230). Amsterdam: Elsevier.

External links

Registry of Research Data Repositories re3data.org
Statistical checklist required by Nature
Policies of Proceedings of the National Academy of Sciences (U.S.)
The US National Committee for CODATA
The Role of Data and Program Code Archives in the Future of Economic Research
Data sharing and replication, Gary King website
The Case for Due Diligence When Empirical Research is Used in Policy Formation by McCullough and McKitrick
Thoughts on Refereed Journal Publication by Chuck Doswell
"How to encourage the right behaviour", an opinion piece published in Nature, March 2002
NASA Astrophysics Data System
Panton Principles for Open Data in Science, at Citizendium
Inter-university Consortium for Political and Social Research
