Guideline on the distinction between personal data and anonymised data – Disciplinary Domain of Medicine and Pharmacy
Sensitive personal data are personal data revealing ethnic origin, political opinions, religious or philosophical beliefs or trade union membership, as well as genetic data, biometric data that uniquely identify a natural person, data concerning health, and data concerning a natural person's sex life or sexual orientation. Ethical clearance is required for processing sensitive personal data for research purposes. Data relating to deceased individuals are not considered personal data.
Anonymised data are data from which it is no longer possible to identify an individual, for example because identifying information has been removed or altered.
Anonymisation must be irreversible, meaning that no key that could link the data back to individuals may be retained. The data must be processed in such a way that identifying the natural person is impossible or extremely impractical. It must not be possible to identify individuals by combining different variables within the data set or by linking the data with other data sources. Once data have been anonymised, they no longer constitute personal data and can be published openly.¹
The Disciplinary Domain of Medicine and Pharmacy aims to make research data openly available in line with open science, but at what point can personal data be considered irreversibly non-identifiable and thus anonymised? The line between personal data, including sensitive personal data, and anonymised data is not clearly defined, and guidance is therefore needed.
For data to be considered anonymised, the Disciplinary Domain of Medicine and Pharmacy recommends that:
- Data must not include a personal identity number, serial number, name, address, exact date of birth, exact date of diagnosis or exact date of prescription
- Dates must be rounded off to the nearest whole year or
- When data are organised with one individual per row, there must be at least 10 individuals for each combination of values when all variables are combined with each other.
- The data must not be coded, i.e. there must be no code that could link records back to individuals.²
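Several of the recommendations above can be checked mechanically before a data set is shared. The following is a minimal sketch in plain Python, assuming each individual is one dict (one row) and all rows have the same columns; the column names in `DIRECT_IDENTIFIERS` and `DATE_COLUMNS` and the helper functions are hypothetical illustrations, not part of this guidance. It drops direct identifiers, reduces exact ISO dates to the year (this sketch reads "rounded to the whole year" as keeping only the calendar year), and then tests the rule of at least 10 individuals per combination of values across all remaining variables, which is a k-anonymity check with k = 10. Whether the data are coded, or whether a code key still exists, cannot be detected this way.

```python
from collections import Counter
from datetime import date

# Hypothetical column names; adapt them to the actual data set.
DIRECT_IDENTIFIERS = {"personal_identity_number", "serial_number", "name", "address"}
DATE_COLUMNS = {"date_of_birth", "date_of_diagnosis", "date_of_prescription"}
MIN_GROUP_SIZE = 10  # at least 10 individuals per combination of values


def strip_and_coarsen(row: dict) -> dict:
    """Drop direct identifiers and reduce exact dates to the calendar year."""
    out = {}
    for column, value in row.items():
        if column in DIRECT_IDENTIFIERS:
            continue  # direct identifiers are removed entirely
        if column in DATE_COLUMNS:
            # Assumption: rounding to the whole year is read as keeping the year only.
            out[column] = date.fromisoformat(value).year
        else:
            out[column] = value
    return out


def smallest_group(rows: list[dict]) -> int:
    """Size of the smallest group of rows sharing identical values in every column."""
    if not rows:
        return 0
    columns = sorted(rows[0])  # assumes all rows have the same columns
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(counts.values())


def meets_recommendation(rows: list[dict]) -> bool:
    """True if every combination of values occurs for at least MIN_GROUP_SIZE rows."""
    return len(rows) > 0 and smallest_group(rows) >= MIN_GROUP_SIZE


if __name__ == "__main__":
    raw = [
        {"name": f"Person {i}", "date_of_birth": "1980-05-17", "sex": "F", "region": "Uppsala"}
        for i in range(12)
    ]
    released = [strip_and_coarsen(r) for r in raw]
    print(released[0])  # {'date_of_birth': 1980, 'sex': 'F', 'region': 'Uppsala'}
    print(meets_recommendation(released))  # True: all 12 rows share one combination
    released.append({"date_of_birth": 1980, "sex": "F", "region": "Gotland"})
    print(meets_recommendation(released))  # False: the Gotland row is unique
```

Passing such a check is a necessary rather than a sufficient condition: it says nothing about linkage with external data sources, which still has to be assessed case by case.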
This guidance on anonymisation does not guarantee complete anonymity, and there may still be a risk of a personal data breach if the information is combined with other available data. It is therefore important to document, in each case, how such risks have been assessed and how they are managed. If there is any uncertainty about whether anonymity has been achieved, additional safeguards are required. These can be implemented through generalisation, for example by replacing exact age with a five-year age class, or through data modification, which may distort the data.
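Generalisation by age class can be sketched as follows. This is an illustrative Python example, assuming classes aligned at multiples of five (0-4, 5-9, ...); the guidance does not specify where classes start, and the function name `five_year_band` is hypothetical. The example also shows that generalisation enlarges groups but does not by itself guarantee that every group reaches 10 individuals.

```python
from collections import Counter


def five_year_band(age: int) -> str:
    """Replace an exact age with a five-year class, e.g. 37 -> '35-39'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    lower = (age // 5) * 5  # assumption: classes start at multiples of five
    return f"{lower}-{lower + 4}"


if __name__ == "__main__":
    ages = [33, 35, 36, 37, 38, 39, 41, 44]
    # Each exact age is unique here, but after generalisation the groups merge.
    counts = Counter(five_year_band(a) for a in ages)
    print(sorted(counts.items()))  # [('30-34', 1), ('35-39', 5), ('40-44', 2)]
```

In this example the '30-34' class still contains a single individual, so further generalisation (wider classes) or other safeguards would be needed before the data could be considered anonymised.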
¹ Anonymisation means that personal data are processed in such a way that the person can no longer be identified from them. For example, the data may be generalised (aggregated) or converted into statistical form so that data about an individual are no longer in identifiable form. Identification must be prevented irreversibly, in such a way that neither the controller nor a third party can convert the data it holds back into identifiable form. Anonymised data no longer constitute personal data, and data protection provisions do not apply to them.
² The coding of personal data is pseudonymisation. Pseudonymised data are still personal data and data protection provisions must be complied with when processing them.