
Synthetic versus anonymised health data

· synthetic,anonymised,health,data,machine learning

Anonymisation techniques, which either encrypt or remove personally identifiable information, enable the wider use of personal data. However, anonymised data still carries a risk that patients can be re-identified, so using it involves a strict ethical approval process.

Using synthetic health data helps to overcome these privacy and confidentiality issues.

Synthetic health data is generated to represent real patient data, using publicly available open data sources such as NHS England and Public Health England statistics. Using synthetic health data also means:

  • You are able to practise writing queries and defining policies prior to accessing anonymised data, which has a strict ethical approval process
  • It holds no personal information and cannot be traced back to any individual; therefore, the use of synthetic data reduces confidentiality and privacy issues
  • You gain another perspective that helps to define a project, because the synthetic data acts as a testbed
  • Tracing through the synthetic data and the assumptions used to generate it may assist with literature discovery: identifying the cohort and impact of research. For instance, researchers running clinical trials may generate synthetic data to help create a baseline for future studies and testing.

Synthetic data can also be used to test and create different scenarios within a system, for example, illustrating the impact of a policy change both now and in the future.
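To make the idea of generating records from aggregate statistics more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the age bands, the proportions, the condition rates, and the function name `generate_patients` are all invented for this example and are not taken from NHS England or Public Health England figures. A real generator would read such values from published open data.

```python
import random

# Hypothetical aggregate proportions, invented for illustration only.
# A real generator would take these from published open statistics.
AGE_BANDS = {"0-17": 0.21, "18-64": 0.61, "65+": 0.18}
CONDITION_RATE_BY_AGE = {"0-17": 0.01, "18-64": 0.05, "65+": 0.20}


def generate_patients(n, seed=0):
    """Sample n synthetic patients from the aggregate tables above."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    bands = list(AGE_BANDS)
    weights = list(AGE_BANDS.values())
    patients = []
    for i in range(n):
        # Draw an age band according to the population proportions.
        band = rng.choices(bands, weights=weights, k=1)[0]
        # Assign the condition using the rate for that age band.
        has_condition = rng.random() < CONDITION_RATE_BY_AGE[band]
        patients.append(
            {"id": f"SYN-{i:05d}", "age_band": band, "has_condition": has_condition}
        )
    return patients


if __name__ == "__main__":
    cohort = generate_patients(1000, seed=42)
    # Practise a simple query: how many synthetic patients have the condition?
    print(sum(p["has_condition"] for p in cohort))
```

Because every record is sampled from summary tables rather than copied from a person, none of the rows corresponds to a real individual. Changing a rate in `CONDITION_RATE_BY_AGE` and regenerating the cohort is one simple way to sketch a "what if" scenario, such as the effect of a policy change.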

Synthetic electronic health care records

A group of researchers in Massachusetts have developed Synthea: an approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. Based on publicly available information, the records can be simulated with pathways of disease progression, care plans, personal workflows (of the citizen and the healthcare professional), and the lifecycle of research projects.
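As a rough illustration of what simulating a pathway of disease progression can look like, the Python sketch below steps one synthetic patient through a small set of health states, one simulated year at a time. This is not Synthea's own code or module format. The state names, the transition probabilities, and the function `simulate_pathway` are hypothetical and chosen only to show the general idea.

```python
import random

# Hypothetical yearly transition probabilities, invented for illustration.
# Each state maps to a list of (next_state, probability) pairs summing to 1.
TRANSITIONS = {
    "healthy": [("healthy", 0.90), ("early_stage", 0.10)],
    "early_stage": [("early_stage", 0.70), ("advanced", 0.20), ("recovered", 0.10)],
    "advanced": [("advanced", 0.80), ("recovered", 0.20)],
    "recovered": [("recovered", 1.0)],
}


def simulate_pathway(years, seed=0):
    """Return the list of states one synthetic patient passes through."""
    rng = random.Random(seed)  # a fixed seed makes the pathway reproducible
    state = "healthy"
    history = [state]
    for _ in range(years):
        next_states, probs = zip(*TRANSITIONS[state])
        # Pick the next state according to the current state's probabilities.
        state = rng.choices(next_states, weights=probs, k=1)[0]
        history.append(state)
    return history


if __name__ == "__main__":
    print(simulate_pathway(10, seed=7))
```

Each simulated history could then be written out as a sequence of dated encounters or diagnoses to form a synthetic record, with care plans attached to particular states.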


Image: 'PADARSER as the conceptual framework for Synthea'

As the above image shows, the result is a source of synthetic electronic health records that are readily available; suited to industrial, innovation, research, and educational uses; and free of legal, privacy, security, and intellectual property restrictions.

By Allie Short