Data publishing (also data publication) is the act of releasing data in published form for (re)use by others. The practice consists of preparing data or data sets for public use and making them available to everyone to use as they wish. It is an integral part of the open science movement, and there is a broad, multidisciplinary consensus on its benefits.
However, publishers have supported data publication either as an integral part of the paper or as supplemental material published jointly with the paper. From a data publication perspective, these approaches suffer from a number of drawbacks, including the difficulty of separating the data from the rest of the publication.
Data publishing is also a practice in its own right:
- A number of data journals have been developed to support data publication.
- A number of repositories have been developed to support data publication, e.g. figshare, Dryad, Dataverse. A survey on how generalist repositories are supporting data publishing is available.
Data papers are “scholarly publication of a searchable metadata document describing a particular on-line accessible dataset, or a group of datasets, published in accordance to the standard academic practices”. Their final aim is to provide “information on the what, where, why, how and who of the data”. A data paper is intended to offer descriptive information on the related dataset(s), focusing on data collection, distinguishing features, access and potential reuse rather than on data processing and analysis. Because data papers are considered academic publications no different from other types of papers, they allow scientists who share data to receive credit in a currency recognizable within the academic system, thus "making data sharing count". This not only provides an additional incentive to share data but also, through the peer review process, increases the quality of the metadata and thus the reusability of the shared data.
Despite their potential, data papers are not a complete solution to all data sharing and reuse issues and, in some cases, they are considered to induce false expectations in the research community.
Data papers are supported by a rich array of journals. Some are "pure", i.e. dedicated to publishing data papers only, while others (the majority) are "mixed", i.e. they publish a number of article types including data papers.
Examples of "pure" data journals are: Earth System Science Data, Scientific Data, Journal of Open Archaeology Data, and Open Health Data.
Data citation is the provision of accurate, consistent and standardised referencing for datasets, just as bibliographic citations are provided for other published sources such as research articles or monographs. Typically, the well-established Digital Object Identifier (DOI) approach is used: a DOI takes users to a website that contains the metadata on the dataset and the dataset itself.
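As an illustration of the DOI approach, the following minimal Python sketch turns a bare DOI into a resolver link and assembles a simple citation string. The function names, the creator/year/title/publisher/identifier layout and the example DOI under the `10.5555` prefix are assumptions made for illustration only; they are not a citation format prescribed by any of the organizations or sources mentioned in this article.

```python
from urllib.parse import quote

# Public DOI resolver; a DOI appended to this base redirects to the landing page.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI such as '10.5555/example-dataset'."""
    doi = doi.strip()
    # Accept the 'doi:' prefix that often appears in reference lists.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Very light sanity check: DOIs start with '10.' and contain a '/'.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    return DOI_RESOLVER + quote(doi, safe="/")


def format_data_citation(creators, year, title, publisher, doi):
    """Build a human-readable dataset citation (hypothetical layout)."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. {doi_to_url(doi)}"


if __name__ == "__main__":
    # All values below are placeholders, not a real dataset.
    print(format_data_citation(
        ["Example Consortium"], 2016, "Example Survey Dataset",
        "Example Repository", "doi:10.5555/example-dataset",
    ))
```

The sketch only builds the link; what the landing page shows (metadata, download options) is decided by the repository that registered the DOI.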
Several organizations have been established with the aim of driving the data citation agenda. These include the following:
- CODATA Data Citation Standards and Practices Task Group
- Data Preservation Alliance for the Social Sciences (Data-PASS)
- Data Citation Working Group of the Research Data Alliance
Data citation is an emerging topic in computer science and has been defined as a computational problem. Citing data poses significant challenges to computer scientists; the main problems to address are related to:
- the use of heterogeneous data models and formats – e.g., relational databases, Comma-Separated Values (CSV), eXtensible Markup Language (XML), Resource Description Framework (RDF);
- the transience of data;
- the necessity to cite data at different levels of coarseness – i.e., deep citations;
- the necessity to automatically generate citations to data with variable granularity.
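The last two points can be made concrete with a small Python sketch that generates a citation for any node of a hierarchical dataset, from the whole collection down to a single subset. It is a deliberately simple illustration and not an implementation of any rule-based or learning-based method from the literature; the `Node` class, the `cite` function, the "most specific creators win" rule and all names and identifiers in the example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """One level of a hierarchical dataset (hypothetical model)."""
    name: str
    creators: List[str] = field(default_factory=list)
    children: Dict[str, "Node"] = field(default_factory=dict)


def cite(root: Node, path: List[str], year: int, doi_url: str) -> str:
    """Generate a citation for the node reached by following `path` from `root`.

    An empty path cites the whole dataset; a longer path cites a finer subset.
    """
    node = root
    trail = [root.name]
    creators = list(root.creators)
    for key in path:
        node = node.children[key]  # raises KeyError if the path does not exist
        trail.append(node.name)
        if node.creators:
            # Assumption: credit goes to the most specific level that names creators.
            creators = list(node.creators)
    return f"{'; '.join(creators)} ({year}). {' / '.join(trail)}. {doi_url}"


if __name__ == "__main__":
    # Placeholder dataset: a database with one station holding one yearly subset.
    station = Node("Station A", ["Curator A"])
    station.children["2015"] = Node("Observations 2015")
    root = Node("Example Survey Database", ["Example Consortium"],
                {"station-a": station})

    url = "https://doi.org/10.5555/example-db"
    print(cite(root, [], 2016, url))                     # coarse: whole database
    print(cite(root, ["station-a", "2015"], 2016, url))  # deep: one subset
```

A production system would also need to address the other two problems in the list, for example by recording a version or timestamp so that a citation still identifies the same data after the dataset changes.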
- Costello MJ (2009). "Motivating online publication of data". BioScience. 59 (5): 418–427. doi:10.1525/bio.2009.59.5.9.
- Smith VS (2009). "Data publication: towards a database of everything". BMC Research Notes. 2 (113). doi:10.1186/1756-0500-2-113. PMC 2702265. PMID 19552813.
- Lawrence, B; Jones, C.; Matthews, B.; Pepler, S.; Callaghan, S. (2011). "Citation and Peer Review of Data: Moving Towards Formal Data Publication". International Journal of Digital Curation. 6 (2): 4–37. doi:10.2218/ijdc.v6i2.205.
- Callaghan, S., Donegan, S., Pepler, S., Thorley, M., Cunningham, N., Kirsch, P., Ault, L., Bell, P., Bowie, R., Leadbetter, A., Lowry, R., Moncoiffé, G., Harrison, K., Smith-Haddon, B., Weatherby, A., & Wright, D. (2012). "Making data a first class scientific output: Data citation and publication by NERCs environmental data centres". International Journal of Digital Curation. 7 (1): 107–113. doi:10.2218/ijdc.v7i1.218.
- Kratz J, Strasser C (2014). "Data publication consensus and controversies". F1000Research. 3 (94). doi:10.12688/f1000research.4518.
- Assante, M.; Candela, L.; Castelli, D.; Tani, A. (2016). "Are Scientific Data Repositories Coping with Research Data Publishing?". Data Science Journal. 15. doi:10.5334/dsj-2016-006.
- Chavan, V. & Penev, L. (2011). "The data paper: a mechanism to incentivize data publishing in biodiversity science". BMC Bioinformatics. 12 (15). doi:10.1186/1471-2105-12-S15-S2.
- Newman, Paul; Corke, Peter (2009). "Data papers: peer reviewed publication of high quality data sets". International Journal of Robotics Research. 28 (5): 587. doi:10.1177/0278364909104283.
- Gorgolewski KJ, Margulies DS, Milham MP (2013). "Making data sharing count: a publication-based solution". Frontiers in Neuroscience. 7. doi:10.3389/fnins.2013.00009.
- Parsons, M.A.; Fox, P.A. (2013). "Is data publication the right metaphor?". Data Science Journal. 12: WDS31–WDS46.
- Candela, L., Castelli, D., Manghi, P. and Tani, A. (2015). "Data Journals: A Survey". Journal of the Association for Information Science and Technology. 66 (1): 1747–1762. doi:10.1002/asi.23358.
- Australian National Data Service: Data Citation Awareness (Accessed 20 March 2012)
- Ball, A., Duke, M. (2011). ‘Data Citation and Linking’. DCC Briefing Papers. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/briefing-papers/
- Data Citation Principles Workshop, May 16 - May 17, 2011, IQSS at Harvard University: Links (Accessed 20 March 2012)
- Buneman, P., Davidson, S. and Frey, J. (2016). ‘Why data citation is a computational problem’. Communications of the ACM, to appear in September 2016. Available online: http://frew.eri.ucsb.edu/private/preprints/bdf-cacm-data-citation.pdf
- Silvello, G. and Ferro, N. (2016). ‘Data Citation is Coming. Introduction to the Special Issue on Data Citation’. Bulletin of IEEE Technical Committee on Digital Libraries, Volume 12 Issue 1, May 2016. Available online: http://www.ieee-tcdl.org/Bulletin/current/papers/intro.pdf
- Buneman, P. and Silvello, G. (2010). ‘A Rule-Based Citation System for Structured and Evolving Datasets’. IEEE Bulletin of the Technical Committee on Data Engineering , Vol. 3, No. 3. IEEE Computer Society, pp. 33-41, September 2010. Available online: http://sites.computer.org/debull/A10sept/buneman.pdf
- Silvello, G. (2016). ‘Learning to Cite Framework: How to Automatically Construct Citations for Hierarchical Data’. Journal of the Association for Information Science and Technology (JASIST), to appear, 2016. Pre-print available online: http://www.dei.unipd.it/~silvello/papers/2016-DataCitation-JASIST-Silvello.pdf
- Silvello, G. (2015). ‘A Methodology for Citing Linked Open Data Subsets’. D-Lib Magazine 21 (1/2), 2015. Available online: http://www.dlib.org/dlib/january15/silvello/01silvello.html
- Buneman, P. (2006). ‘How to Cite Curated Databases and how to Make Them Citable’. In Proc. of the 18th International Conference on Scientific and Statistical Database Management, SSDBM 2006, pages 195–203, 2006.