Guide to Data Citation
Data are an accepted citable resource in academic research. You should cite any data you use, whether as a primary or a secondary source. You may also be encouraged to cite specialized software, instruments, or tools used to collect original data. Reference entries are not usually necessary for standard software and programming languages. Some software will provide a recommended reference entry, especially when the software has an accompanying publication.
Citing Data: Rationale
- Ensures provenance, attribution, and credit
- Supports persistence and stability of data
- Connects publications to the specific version of data used
- Promotes access, re-use, and collaboration
- Encourages use of descriptive metadata standards for data
When preparing your own data for deposit into a repository, it is good practice to include a recommended data citation. Some repositories, however, will ask for metadata elements instead, and will derive a recommended citation from those elements.
Data Citation Elements
The International Association for Social Science Information Services & Technology (IASSIST) has identified the following core elements of a data citation:
- Author: Names of each entity responsible for the creation of the dataset
- Date of Publication: Publication or dissemination date
- Title: Complete title, including edition or version number
- Publisher/Distributor: Organizational entity that makes the dataset available
- Electronic location: Web address or unique/persistent/global identifier
Tip: Whenever a persistent identifier such as a DOI or Handle is available, use it in place of a URL.
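Because some repositories derive a recommended citation from these elements, it can help to confirm that a dataset's metadata covers all five before deposit. The Python sketch below is one way to do that, under stated assumptions: the `DatasetRecord` class and its field names are hypothetical (not any repository's schema), and the `https://doi.org/` and `https://hdl.handle.net/` resolver prefixes are used to turn a DOI or Handle into a link, preferring them over a plain URL as the tip recommends.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical record for the five IASSIST core elements; the field names are
# illustrative assumptions, not taken from any repository's metadata schema.
@dataclass
class DatasetRecord:
    authors: list[str]
    publication_date: str
    title: str
    publisher: str
    doi: Optional[str] = None
    handle: Optional[str] = None
    url: Optional[str] = None

    def electronic_location(self) -> Optional[str]:
        """Prefer a persistent identifier (DOI, then Handle) over a plain URL."""
        if self.doi:
            return f"https://doi.org/{self.doi}"
        if self.handle:
            return f"https://hdl.handle.net/{self.handle}"
        return self.url

    def missing_elements(self) -> list[str]:
        """Return the names of core elements that are still empty."""
        missing = []
        if not self.authors:
            missing.append("author")
        if not self.publication_date:
            missing.append("date of publication")
        if not self.title:
            missing.append("title")
        if not self.publisher:
            missing.append("publisher/distributor")
        if not self.electronic_location():
            missing.append("electronic location")
        return missing


gss = DatasetRecord(
    authors=["Smith, Tom W.", "Marsden, Peter V.", "Hout, Michael"],
    publication_date="2011",
    title="General Social Survey, 1972-2010 Cumulative File",
    publisher="Inter-university Consortium for Political and Social Research",
    doi="10.3886/ICPSR31521.v1",
)
print(gss.electronic_location())  # https://doi.org/10.3886/ICPSR31521.v1
print(gss.missing_elements())     # []
```

An empty list from `missing_elements()` only means each element has some value; it does not check that the values are accurate.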
The APA Publication Manual (6th ed.) recommends the following templates for dataset, software, instrument, and apparatus citations:
Author/Rightsholder. (Year). Title of program/application (Version number) [Description of form]. Location: Name of producer.
Author/Rightsholder. (Year). Title of program/application [Description of form]. Retrieved from http://xxxx
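The APA templates share a common head (author, year, title, an optional version number, and a bracketed description of form) and differ in how they end: with the producer's location and name, or with a "Retrieved from" URL. The Python helper below is a hypothetical sketch of filling either variant; its name and parameters are illustrative assumptions, not part of any citation tool. Given the Pew Hispanic Center details, it should reproduce that example entry.

```python
from typing import Optional


def apa6_data_reference(author: str, year: str, title: str, form: str,
                        version: Optional[str] = None,
                        url: Optional[str] = None,
                        location: Optional[str] = None,
                        producer: Optional[str] = None) -> str:
    """Fill an APA (6th ed.) data/software template (hypothetical helper)."""
    # Avoid a doubled period when the author string already ends in an initial.
    author_part = author if author.endswith(".") else author + "."
    head = f"{author_part} ({year}). {title}"
    if version:
        head += f" ({version})"  # version sits between title and description
    head += f" [{form}]."
    if url:
        # Online variant: end with the retrieval statement, no final period.
        return f"{head} Retrieved from {url}"
    if location and producer:
        # Producer variant: "Location: Name of producer."
        return f"{head} {location}: {producer}."
    raise ValueError("provide a URL, or both a location and a producer")


print(apa6_data_reference(
    "Pew Hispanic Center", "2004",
    "Changing channels and crisscrossing cultures: "
    "A survey of Latinos on the news media",
    "Data file and code book",
    url="http://pewhispanic.org/datasets/",
))
```

Entries with extra elements, such as the "Unpublished instrument." note in the E-SOFTA example, fall outside this simple template and would need to be added by hand.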
Data Citation Examples
Here are a few examples from several style guides to get you started. The first group of entries follows the APA templates above.
Pew Hispanic Center. (2004). Changing channels and crisscrossing cultures: A survey of Latinos on the news media [Data file and code book]. Retrieved from http://pewhispanic.org/datasets/
Friedlander, M. L., Escudero, V., & Heatherington, L. (2002). E-SOFTA: System for observing family therapy alliances [Software and training videos]. Unpublished instrument. Retrieved from http://www.softa-soatif.com/
Comprehensive Meta-Analysis (Version 2) [Computer software]. Englewood, NJ: Biostat.
Eyelink II [Apparatus and software]. (2004). Mississauga, Canada: SR Research.
Machine-readable data files:
American Institute of Public Opinion. 1976. Gallup Public Opinion Poll # 965 [MRDF]. Princeton, NJ: American Institute of Public Opinion [producer]. New Haven, CT: Roper Public Opinion Research Center, Yale University [distributor].
U.S. Bureau of the Census. 1970. Census of Population and Housing 1970, Summary Statistic File 4H: U.S. [MRDF]. DUALabs ed. Washington, DC: U.S. Bureau of the Census [producer]. Rosslyn, VA: Data Use and Access Laboratories (DUALabs) [distributor].
Scientists and Engineers Statistical Data System (SESTAT). 2006. "Table B-1: U.S. Scientists and Engineers, by Detailed Field and Level of Highest Degree Attained: 1999." Retrieved December 12, 2006 (http://srsstats.sbe.nsf.gov/preformatted-tables/1999/tables/TableB1.pdf).
U.S. Bureau of the Census. 1999. "1999 Survey of Doctorate Recipients." Washington, DC: U.S. Department of Commerce. Retrieved December 12, 2006 (http://srsstats.sbe.nsf.gov/docs/sdr99.pdf).
The Chicago Manual of Style (16th ed.) does not have formal reference guidelines for datasets. However, if you can identify the core elements of a data citation (author, title, publisher/distributor, and electronic location/unique resource identifier), you can construct a sufficient reference entry.
Author Last, First. Year. Title of Dataset. File name or identifier. Location of Publisher: Publisher. Location of Distributor: Distributor. DOI/URI.
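Once the elements are identified, the Chicago template can be filled mechanically. The Python sketch below uses hypothetical function names and takes author names as (first, last) pairs; it inverts only the first author's name, uses a serial comma before "and", and adds "Distributed by" before the distributor, matching the punctuation of the General Social Survey example entry rather than any rule quoted from the Chicago manual.

```python
def with_period(text: str) -> str:
    """Append a period unless the text already ends with one."""
    return text if text.endswith(".") else text + "."


def chicago_author_list(names: list[tuple[str, str]]) -> str:
    """Join (first, last) name pairs, inverting only the first author."""
    given, family = names[0]
    parts = [f"{family}, {given}"]
    parts += [f"{g} {f}" for g, f in names[1:]]
    if len(parts) == 1:
        return parts[0]
    # Serial comma before "and", as in the example entry.
    return ", ".join(parts[:-1]) + ", and " + parts[-1]


def chicago_dataset_reference(authors, year, title, identifier,
                              publisher_place, publisher,
                              distributor_place, distributor, doi) -> str:
    """Fill the Chicago-style dataset template (hypothetical helper)."""
    return (
        f"{with_period(chicago_author_list(authors))} {year}. {title}. "
        f"{identifier}. {publisher_place}: {publisher}. "
        f"Distributed by {distributor_place}: {distributor}. doi:{doi}."
    )


print(chicago_dataset_reference(
    [("Tom W.", "Smith"), ("Peter V.", "Marsden"), ("Michael", "Hout")],
    "2011", "General Social Survey, 1972-2010 Cumulative File",
    "ICPSR31521-v1", "Chicago, IL", "National Opinion Research Center",
    "Ann Arbor, MI",
    "Inter-university Consortium for Political and Social Research",
    "10.3886/ICPSR31521.v1",
))
```

The `with_period` check keeps a single-author entry ending in an initial, such as "Smith, Tom W.", from picking up a doubled period.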
Smith, Tom W., Peter V. Marsden, and Michael Hout. 2011. General Social Survey, 1972-2010 Cumulative File. ICPSR31521-v1. Chicago, IL: National Opinion Research Center. Distributed by Ann Arbor, MI: Inter-university Consortium for Political and Social Research. doi:10.3886/ICPSR31521.v1.
The MLA Handbook (8th ed.) does not have formal reference guidelines for datasets, but an appropriate reference entry can be constructed using the MLA core elements.
Datasets (constructed using the MLA core elements):
Smith, Tom W., Peter V. Marsden, and Michael Hout. General Social Survey, 1972-2010 Cumulative File. ICPSR31521-v1. Chicago, IL: National Opinion Research Center [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011. Web. 23 Jan 2012. doi:10.3886/ICPSR31521.v1.
MLA does, however, have formal guidelines for citing graphically represented data.
Goldberg, David, et al. Enrollments in Languages Other Than English in United States Institutions of Higher Education, Fall 2013. Modern Language Association, Feb. 2015, www.mla.org/enrollments_census.