Data linkage

Who’s involved?

Four groups are involved in data linkage and access processes:

  • data custodians are the people who look after the data collections. Data custodians work within an organisation or agency (such as a government department) and are responsible for the secure collection, use and disclosure of data. Data custodians collect and store personal information (e.g. name, address, date of birth) and content information (e.g. health information such as diagnosis and treatment details).
  • data linkers are the people who create Linkage IDs which allow data to be linked within and between data collections. Data linkers work in a Data Linkage Unit that is either within, or associated with, a government agency.
  • researchers are the people who use the data for the purpose of analysis and research. This is only possible after an extensive application process and approval by all relevant data custodians and Human Research Ethics Committee/s (HREC/s).
  • secure research environment managers are the people who design, implement and maintain secure research environments, which give researchers and data custodians a secure platform for sharing and analysing sensitive health data.

How is data linked?

Access to and use of linked datasets is a complex and strictly controlled process. Researchers undergo a stringent application process requiring approval from each data custodian plus at least one Human Research Ethics Committee to confirm their study is both valid and in the public interest.

Once a project is approved, the data custodians and staff at the Data Linkage Unit work together to determine which records are required for the study, so that the researcher receives only the minimum information needed. Data linkers then prepare linkage keys (random strings of numbers and letters) for the project, which allow a merged dataset to be provided to the researcher for analysis. See infographic for a simple overview.

A similar approach is used if a researcher wants to link research data (e.g. from a clinical trial, registry, or longitudinal study) to the linked data.
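
The linkage key step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used by any particular Data Linkage Unit: the record IDs, field names and function names (assign_project_keys, custodian_extract, merge_extracts) are hypothetical, and the matching of records to people is assumed to have been done already. It shows the general idea that the researcher receives content data joined on a random key, without names or other identifiers.

```python
import secrets

# Assumed output of the linkage step: each entry lists the custodian
# record IDs (hypothetical values) that belong to one person.
linked_people = [
    {"hospital": "H-001", "deaths": "D-917"},
    {"hospital": "H-002"},
]

def assign_project_keys(linked_people):
    """Data linker step: give each person one random project-specific key."""
    keys_by_collection = {}
    for person in linked_people:
        key = secrets.token_hex(8)  # random string of numbers and letters
        for collection, record_id in person.items():
            keys_by_collection.setdefault(collection, {})[record_id] = key
    return keys_by_collection

def custodian_extract(records, key_map, content_fields):
    """Custodian step: keep only approved content fields and attach the key."""
    extract = []
    for rec in records:
        if rec["record_id"] in key_map:
            row = {field: rec[field] for field in content_fields}
            row["linkage_key"] = key_map[rec["record_id"]]
            extract.append(row)
    return extract

def merge_extracts(*extracts):
    """Researcher step: combine extracts using the linkage key alone."""
    merged = {}
    for extract in extracts:
        for row in extract:
            merged.setdefault(row["linkage_key"], {}).update(row)
    return list(merged.values())

# Made-up custodian data: identifiers (name) sit alongside content fields.
hospital = [
    {"record_id": "H-001", "name": "A. Citizen", "diagnosis": "asthma"},
    {"record_id": "H-002", "name": "B. Citizen", "diagnosis": "fracture"},
]
deaths = [{"record_id": "D-917", "name": "A. Citizen", "year_of_death": 2020}]

keys = assign_project_keys(linked_people)
dataset = merge_extracts(
    custodian_extract(hospital, keys["hospital"], ["diagnosis"]),
    custodian_extract(deaths, keys["deaths"], ["year_of_death"]),
)
```

In this sketch the merged dataset holds one row per person, containing only the linkage key and the approved content fields. Because the keys are generated afresh for each project, the same person would receive a different key in a different project.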

What are the benefits of data linkage?

  • Linking existing data is a relatively cheap alternative to conducting large-scale longitudinal research studies or clinical trials
  • It allows longitudinal research to be conducted using whole-population data that would otherwise be too costly to collect using alternative methods
  • It provides information on whole populations, generating a more complete picture of the community than is possible using other research methods
  • Use of whole-of-population data rather than small samples increases the validity of research
  • Data on large numbers of people over a long period of time allows for the study of rare diseases and rare outcomes
  • Linkage within and between data collections enables the study of familial links to health and wellbeing across generations
  • Linkage of pre-existing data is more efficient than collecting data prospectively which, for some studies, depending on the research question, could take decades
  • Allows research questions to be tested or developed on existing data, allowing improved targeting of funding for new research
  • The re-use of existing data increases the value for money for the community
  • Adds value to standalone data collections that may be more meaningful when linked together
  • Data linkage has decreased the need for researchers to view identifiable data. Data Linkage Units use identifiers for linkage but ensure the separation principle and appropriate data security measures are in place
  • Whole-of-population data includes all eligible participants, not only those recruited into a study
  • Linkage of multiple data collections can identify data entry errors and other technical issues with the data, improving data quality
  • Can be used to measure the efficacy of treatments in a real-world setting, beyond the research environment
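
The data quality benefit listed above can be pictured with a small Python sketch. It assumes, purely for illustration, that two collections have already been linked and that each row carries a hypothetical linkage_key field. The function find_conflicts is not part of any real linkage toolkit. It simply flags people whose recorded value for a field differs between collections, which may point to a data entry error.

```python
def find_conflicts(collections, field):
    """Return linkage keys whose value for `field` differs between collections."""
    seen = {}
    for name, rows in collections.items():
        for row in rows:
            if field in row:
                seen.setdefault(row["linkage_key"], {})[name] = row[field]
    # Keep only the keys where the collections disagree.
    return {
        key: values
        for key, values in seen.items()
        if len(set(values.values())) > 1
    }

# Made-up linked extracts: the registry has a transposed year for person k1.
collections = {
    "hospital": [
        {"linkage_key": "k1", "year_of_birth": 1980},
        {"linkage_key": "k2", "year_of_birth": 1975},
    ],
    "registry": [
        {"linkage_key": "k1", "year_of_birth": 1908},
        {"linkage_key": "k2", "year_of_birth": 1975},
    ],
}

conflicts = find_conflicts(collections, "year_of_birth")
```

Here conflicts contains a single entry for k1, showing the value held by each collection so that the discrepancy can be reviewed.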

What can linked data be used for?

Many of our life experiences, from the moment we are born until our death, generate data that is collected and used for a range of purposes. For example, information is routinely collected when you go to school, visit a hospital, get married or divorced, or have a baby. This data is collected by different organisations all over Australia.

Researchers, health professionals, government policy-makers and planners link this data to:

  • investigate the distribution, origin, associated conditions and outcomes of disease
  • evaluate policies and services in real-world settings
  • assess the health and wellbeing of Australians across the life course
  • better identify issues of population health importance, and plan services and interventions to address them
  • monitor and evaluate the effectiveness of drugs, devices, services, treatments and interventions
  • study rare diseases and rare outcomes.

For examples of research conducted using linked data click here.

For more assistance on your data linkage journey, please contact Client Services of your jurisdictional Data Linkage Unit or the PHRN National Office.