Publish data
Open data
Research data made openly available through data repositories becomes searchable and citable, and gives the authors more opportunities for impact, merit and future collaborations. Publishing research data openly also increases the transparency and reliability of the presented results, and gives others the opportunity to find and reuse existing data in new research.
Restricted access
There may be legal and/or ethical reasons not to share or publish research data. This may apply to data containing personal data and other information that may be classified as confidential under the Public Access to Information and Secrecy Act (SFS 2009:400), for example business secrets and copyrighted material.
Although some data cannot be directly shared, published or made publicly available, it can still be registered in data repositories and thereby be findable. You can also specify contact points for requesting access to the data. Projects that deal with sensitive personal data and plan to deposit data in repositories with controlled disclosure procedures should include that information in the application for ethical review.
Sharing and publishing data with personal information
On anonymisation and the risk of re-identification
Repositories and Identifiers
If possible, choose an established subject-based data repository when publishing data. In the re3data.org registry (Registry of Research Data Repositories), you can search for data repositories by subject area and country.
Examples of subject specific repositories
- CESSDA – social science.
- DARIAH – humanities.
- ELIXIR – life science. Here you will find Elixir’s list of recommended databases, e.g. BioStudies.
- FEGA Sweden - a repository with controlled disclosure procedures for sharing genomic data.
- HEPData – high-energy physics.
- NOMAD – materials science.
- PANGAEA – geosciences and biological sciences.
- Swedish Biodiversity Data Infrastructure – biodiversity.
- SciLifeLab Data Repository – Life Sciences.
Examples of interdisciplinary repositories
- Swedish National Data Service (SND) - a certified repository
Data sets published in the SND Repository are reviewed by staff at the Uppsala University Research Data Support in dialogue with the responsible researchers. The aim is to make published data more FAIR.
- Zenodo - a repository operated by OpenAIRE and CERN, funded by the European Commission. Zenodo can be used for storing and publishing data sets, code and publications. Code created in GitHub can be published via Zenodo and be assigned a DOI.
- Figshare - a data repository run by the company Digital Science.
- Dryad - repository run by a network of universities, scholarly societies and publishers.
- Dataverse - a data repository at Harvard University.
More detailed descriptions of datasets can be made in a "data paper".
When a dataset is published in a repository, it is usually assigned a unique and permanent identifier, a PID (persistent identifier). PIDs are machine-readable and, unlike URL links, durable over time. A PID can lead to a landing page where the data is described, or to individual data files or documents.
Please note that when publishing a data set, the creators and roles specified may differ from those specified in the publications whose analysis and results are based on the data. For examples of roles, see CRediT – Contributor Roles Taxonomy. FORCE11 and the Committee on Publication Ethics (COPE) provide recommendations on issues relating to copyright for data sets.
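As an illustration of how a PID is used in practice, the sketch below takes a DOI (one common type of PID) written in any of a few everyday notations and turns it into a link through the doi.org resolver. The function name `doi_to_url` and the DOI in the usage example are made up for illustration and do not refer to a real dataset.

```python
from urllib.parse import quote

# Public resolver for DOIs; it redirects to the current landing page.
DOI_RESOLVER = "https://doi.org/"

# Notations people commonly paste a DOI in (an illustrative, not exhaustive, list).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolvable https link for a DOI given in a common notation."""
    cleaned = doi.strip()
    lowered = cleaned.lower()
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # A DOI has the shape "10.<registrant>/<suffix>".
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL, keeping the slash.
    return DOI_RESOLVER + quote(cleaned, safe="/")


if __name__ == "__main__":
    # Hypothetical DOI, used only to show the output format.
    print(doi_to_url("doi:10.1234/example-dataset"))
```

Citing the resolver link rather than the repository's own page address is what keeps a reference working if the landing page later moves.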
Licenses
To facilitate the reuse of published data, one can choose a license and specify conditions for reuse, for example using Creative Commons licenses or Open Data Commons. The chosen license can state whether modifications and commercial use are allowed, and how the data can be shared further.
DIGG, the Agency for Digital Government, has developed guidelines for open licenses and intellectual property (in Swedish). They recommend that data not subject to copyright or other intellectual property protection should be marked with PDM (Public Domain Mark) or CC0. Data that is subject to copyright should, according to DIGG, be assigned the CC BY 4.0 license.
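The recommendation above can be read as a simple two-way rule. The sketch below encodes only what is stated in that paragraph; the function and parameter names are hypothetical, and deciding whether a given data set is actually protected by copyright or other intellectual property rights remains a legal judgement that the code does not make.

```python
from typing import Tuple


def recommended_marks(is_ip_protected: bool) -> Tuple[str, ...]:
    """Return the license or mark suggested by the recommendation summarised above.

    is_ip_protected: True if the data is subject to copyright
    (or other intellectual property protection), False otherwise.
    """
    if is_ip_protected:
        # Copyrighted data: the recommended license is CC BY 4.0.
        return ("CC BY 4.0",)
    # Unprotected data: mark it with the Public Domain Mark or CC0.
    return ("PDM", "CC0")


if __name__ == "__main__":
    print(recommended_marks(False))
    print(recommended_marks(True))
```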
More about licensing data at Researchdata.se
Some links about licenses for code, computer programs, and databases:
- To choose a license for open source software, GitHub has created the tool Choose a license.
- Licensing Assistant – a tool to find and compare software licenses.
- Open Source Initiative lists licenses for software and has a FAQ on the subject.
- European Union Public Licence (EUPL) – the European Union's open source license.
- Open Data Commons Open Database License (ODbL) – An open license for databases, from the Open Knowledge Foundation.