Data as Scholarly Communication
Data is a valuable intellectual product that can enrich the global research community while also benefiting your career and your connections to other scholars. When you publish data as a form of open scholarly communication, we encourage you to keep the following considerations in mind:
Data should be FAIR (Findable, Accessible, Interoperable, Reusable)
The FAIR Principles (that data should be Findable, Accessible, Interoperable and Reusable) were developed to provide flexible guidance on best practices for digital assets. They were set out in the 2016 article ‘The FAIR Guiding Principles for scientific data management and stewardship’, originally published in Scientific Data.
- Findable: data and supplementary materials should have sufficiently detailed descriptive metadata as well as a unique and persistent identifier such as a digital object identifier (DOI).
- Accessible: the metadata and data should be understandable to both humans and machines, and data should be stored in a trusted repository.
- Interoperable: metadata should use a formal, shared language such as a disciplinary taxonomy or broadly agreed-upon controlled vocabularies.
- Reusable: data and collections should have a clear usage license, such as a Creative Commons license, and should provide accurate information on provenance.
FAIR principles are particularly useful for ensuring that data and metadata are machine readable. As computational systems increasingly help us to find, store, and analyze data, following the principles means that we as researchers create datasets that support that ecosystem. Find resources on how to GO FAIR online, or contact UMD Libraries GIS and Data Services for more guidance.
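As a rough illustration of what "machine readable" can mean in practice, the sketch below assembles a small descriptive record and writes it out as JSON. The field names, the `build_metadata` helper, and the placeholder DOI are assumptions made for this example; they do not come from any particular metadata standard or repository, so check what your chosen repository actually requires.

```python
import json

# Hypothetical minimum set of fields, loosely following the FAIR bullets above:
# an identifier (Findable), a license and provenance (Reusable), and so on.
REQUIRED_FIELDS = ("title", "creators", "identifier", "license", "provenance")


def build_metadata(**fields):
    """Return a metadata record, refusing to build one with required fields missing."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return dict(fields)


record = build_metadata(
    title="Example stream temperature readings",
    creators=["A. Researcher"],
    identifier="https://doi.org/10.1234/example",  # placeholder, not a real DOI
    license="CC-BY-4.0",
    keywords=["hydrology", "temperature"],
    provenance="Collected with handheld sensors; cleaned with a script.",
)

# JSON is one common machine-readable serialization; repositories may expect others.
metadata_json = json.dumps(record, indent=2, sort_keys=True)
print(metadata_json)
```

Checking for required fields before deposit is a small design choice that catches a missing license or identifier early, when it is still easy to fix.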
Datasets should be properly described
Creating appropriate documentation and metadata to accompany your datasets is an important part of ensuring that they can be understood and interpreted by future users. Documentation, including "readme" files, should explain how your data was created, its context, the structure of the data and its contents, and any manipulations you have made. In addition to describing the data, metadata facilitates search and retrieval once the data is deposited in a data repository. Metadata is often captured using controlled standards or vocabularies. These may vary by repository or publishing venue, or may be set by discipline.
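One way to make sure a readme covers the points listed above is to generate it from a fixed set of headings. The following is a minimal sketch only: the `render_readme` helper, the heading wording, and the sample text are all hypothetical, and the guides listed below describe fuller templates.

```python
def render_readme(title, sections):
    """Build plain-text readme content from (heading, body) pairs."""
    lines = [title, "=" * len(title), ""]
    for heading, body in sections:
        lines.extend([heading, "-" * len(heading), body.strip(), ""])
    return "\n".join(lines)


# Headings mirror the items a readme is expected to explain; the bodies are invented.
sections = [
    ("How the data was created", "Readings were taken hourly with handheld sensors."),
    ("Context", "Collected for a study of seasonal stream temperature."),
    ("Structure and contents", "One CSV file; columns are site, timestamp, temp_c."),
    ("Manipulations", "Rows with sensor errors were removed before analysis."),
]

readme_text = render_readme("Example dataset", sections)
print(readme_text)
```

The result can be saved as a plain-text file alongside the data, so that the documentation travels with the dataset when it is deposited.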
Resources for data documentation and metadata creation:
- DataONE Best Practices: Provides examples of research and data documentation
- ICPSR Guide to Social Science Data Preparation and Archiving: Explains the Data Documentation Initiative (DDI) and provides a list of important metadata elements for the social sciences
- Guide to Writing "readme" Style Metadata (Cornell University): Best practice for creating readme files for data. Includes links to examples.
- Disciplinary Metadata (Digital Curation Center): Provides information and links to discipline-specific metadata standards, including biology, social sciences, physical sciences, and general research data.
Data needs to be preserved to provide long-term access
Like all scholarly communications in a digital research ecosystem, data can be lost if file formats become obsolete, if platforms or digital storage locations are decommissioned or poorly maintained, or if links and other documentation recording the locations of research objects are not updated. Choosing a trusted repository or stable storage solution is essential to ensuring that data remains available over the long term.
- Never assume that work published online is permanent or safe.
- Preserve copies of your work both locally and in cloud storage (a replicated storage model) so that your personal archiving is as effective as possible.
- Repositories with a CoreTrustSeal, including those of which the University of Maryland is a member and our own institutional repository, DRUM, follow practices that ensure platforms are well constructed and maintained for long-term access.
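If you keep replicated copies yourself, one common way to confirm that a copy is still identical to the original is to compare checksums. The sketch below is an illustration under assumptions, not a practice prescribed by this guide: the helper names and file names are hypothetical, and temporary files stand in for a local file and its cloud copy.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True when both files have the same checksum, i.e. the same bytes."""
    return sha256_of(original) == sha256_of(copy)


# Demonstration with temporary files standing in for real local and cloud copies.
with tempfile.TemporaryDirectory() as folder:
    local = Path(folder) / "local.csv"
    cloud = Path(folder) / "cloud_copy.csv"
    damaged = Path(folder) / "damaged_copy.csv"

    local.write_bytes(b"site,temp_c\nA,12.5\n")
    cloud.write_bytes(local.read_bytes())         # faithful copy
    damaged.write_bytes(b"site,temp_c\nA,12.\n")  # copy with a lost character

    intact_ok = copies_match(local, cloud)
    damaged_ok = copies_match(local, damaged)

print(intact_ok, damaged_ok)
```

Recording the checksum in your readme at the time of deposit gives future users a way to verify that the file they downloaded is the one you published.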