Document and organise data
Research data can be of different types and the purpose of the storage may vary, depending on which phase of the research process you are in. But all data needs to be structured and organised in a consistent way, and data needs to be supplemented with various types of descriptions (metadata). In addition, the storage needs to meet certain requirements, such as those applying to technology, ethics and accessibility.
Storing data in a system that meets the University’s requirements and providing them with metadata makes the use and publication of data easier and is a prerequisite for being able to manage research data according to what are known as the four FAIR principles (Findable, Accessible, Interoperable, Reusable).
What should be documented?
What is relevant to document may vary depending on the research subject and methods. But documentation should cover how data has been generated and also describe samples, methods, processes, source code and other tools so that the research process can be reproduced. Try to follow principles and standards within your discipline and existing documentation procedures, if any, at your department or in the research team. Think about what someone else (or yourself) needs to know to be able to find, understand, validate and analyze data.
Among other things, it is important to describe:
- how data has been collected, created or modeled
- how different data files and versions are organized
- what changes are made between different versions of data
- the meaning of different codes, abbreviations, variable names etc.
- what legal, ethical and possible other restrictions limit how data is reused
Much of the overall information can be given in a project description with an associated data management plan. During the project, the research process is then documented, and any changes in methods and data management are described.
In an early phase of the project, the team should agree on principles for how files are named and organized, and on procedures for versioning data and associated components. Document the logic behind structures and naming. Well-defined procedures for organizing and documenting data also make it easier to regularly review and dispose of materials that are no longer needed and should not be preserved in the long term.
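A naming convention is easier to follow when it can be checked automatically. The following is a minimal sketch, assuming a hypothetical pattern of project, content, date and two-digit version; the pattern, the function names and the example values are illustrative and are not a University standard.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<content>_<YYYY-MM-DD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<content>[a-z0-9-]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project, content, day, version, ext):
    """Compose a file name that follows the convention above."""
    return f"{project}_{content}_{day.isoformat()}_v{version:02d}.{ext}"


def parse_name(name):
    """Return the parts of a conforming name, or None if it does not conform."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None


name = build_name("soilph", "field-samples", date(2024, 3, 15), 2, "csv")
print(name)
print(parse_name(name))
print(parse_name("final version NEW.csv"))  # does not follow the convention
```

Whatever pattern the team chooses, recording it in the project documentation and checking new files against it keeps the structure consistent as the project grows.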
Variables and column headers should be transparent and understandable. They should also be documented, for example in a README file or in a separate code book. Use units and designations that are standard in your discipline.
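A code book can be kept as a small table next to the data and compared with the column headers of the data files. The sketch below assumes hypothetical variable names, units and codes; the helper functions are illustrative only.

```python
import csv
import io

# Hypothetical variables; names, units and codes are illustrative only.
CODEBOOK = [
    {"variable": "site_id", "description": "Sampling site identifier",
     "unit": "", "codes": ""},
    {"variable": "depth_cm", "description": "Sampling depth",
     "unit": "cm", "codes": ""},
    {"variable": "land_use", "description": "Land use at the site",
     "unit": "", "codes": "1=forest; 2=arable; 9=missing"},
]


def undocumented_columns(header, codebook):
    """Return the column names that the code book does not describe."""
    documented = {row["variable"] for row in codebook}
    return [column for column in header if column not in documented]


def codebook_as_csv(codebook):
    """Serialise the code book as CSV text so it can be stored with the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["variable", "description", "unit", "codes"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(codebook)
    return buffer.getvalue()


print(codebook_as_csv(CODEBOOK))
# Columns found in a data file but missing from the code book
print(undocumented_columns(["site_id", "depth_cm", "ph"], CODEBOOK))
```

Running such a check before sharing or archiving data helps catch variables that were added during the project but never documented.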
Uppsala University has a recommended directory structure that is adapted to the archiving and public access requirements that apply to universities as public authorities. The directory structure is designed so that the material can be sorted into what is to be archived and preserved permanently, what is to be disposed of at a certain time after the end of the project, and working material that can be disposed of immediately after the project ends.
Metadata
In order to make research data understandable and reusable, they need to be described. Use, if possible, the metadata standards available in the relevant research area. Metadata and metadata standards are central to the so-called FAIR principles.
What are the different types of metadata?
- Descriptive: Information that makes it possible to find the data, such as the dataset’s content, how the data were produced and who conducted the study. Examples of descriptive metadata are subject area, keywords, method, author/person responsible and a persistent identifier. Specifying authors is a prerequisite if sharing data is to lead to new collaborations and academic credit.
- Administrative: Administrative metadata provides information about how data can be used. File formats, rights, licenses, copyrights and preservation requirements are examples of administrative metadata.
- Structural: Structural metadata describes how the data are organised so they can be used by others.
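The three types can be kept together in one machine-readable record. The following is a minimal sketch, assuming a simple JSON layout; all field names and values are placeholders and are not taken from any particular metadata standard, so they should be replaced by the standard used in your research area where one exists.

```python
import json

# Placeholder values; field names are illustrative, not from a specific standard.
metadata = {
    "descriptive": {
        "title": "Example dataset",
        "subject_area": "Soil science",
        "keywords": ["pH", "field sampling"],
        "method": "Field survey",
        "authors": ["A. Researcher"],
        "identifier": "placeholder-persistent-identifier",
    },
    "administrative": {
        "file_formats": ["csv"],
        "license": "CC BY 4.0",
        "rights_holder": "Example University",
        "preservation": "to be decided",
    },
    "structural": {
        "files": [
            {"name": "samples.csv", "role": "data"},
            {"name": "codebook.csv", "role": "documentation"},
        ],
    },
}

# Hypothetical minimum set of fields to fill in before data are shared.
REQUIRED = {
    "descriptive": ["title", "authors", "identifier"],
    "administrative": ["file_formats", "license"],
    "structural": ["files"],
}


def missing_fields(record):
    """List required fields that are absent or empty, as 'group.field'."""
    missing = []
    for group, fields in REQUIRED.items():
        for field in fields:
            if not record.get(group, {}).get(field):
                missing.append(f"{group}.{field}")
    return missing


print(json.dumps(metadata, indent=2))
print(missing_fields(metadata))
```

Keeping the record in a plain, open format such as JSON means it can be stored alongside the data and later mapped to a discipline-specific standard.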
Selecting the format of the data
When you plan a project, you need to think through which data format or formats you are going to use. Since research data are considered public documents, you also need to keep in mind the archiving requirements to which the University is subject as a public authority.
When selecting a data format, consider the following:
- What format or formats have you and your colleagues used before?
- Are there any domain-specific standards?
- Is the software compatible with the systems provided by the University?
- How is the data to be analysed?
- How is the data to be archived and stored?
- Can you add metadata?
- Is the format suitable for sharing of data?
- Will the format also work with future systems? Does it work in all parts of the process with minimal need for conversion to other formats?