"How long will it take to get data?" is a common question that is asked of all of the data linkage units within the PHRN. Whilst we there is no single answer to this question, there are some common factors that influence how long it takes to get data. The following section looks at some these factors under the broad categories of ‘Type of data requested’ and ‘Project approvals’.
Type of data requested
One of the significant factors that influence the amount of time it takes to get data is the type of data requested by the researcher. There are a number of different combinations of data that can be requested, all of which have different levels of complexity and implications for data delivery timeframes.
- Core data sets are those data collections that are routinely linked by data linkage units and are part of the master linkage map. Projects that request the linkage of core data sets from a single jurisdiction generally take the shortest amount of time.
- Projects that request data from core data sets in addition to external (non-core) data sets for which agreements already exist.
- Projects that request core data sets in addition to external data sets, but for which not all external data custodian approvals are in place.
- Projects that request the linkage of data that are all external to the data linkage unit's master linkage map and for which external approvals are not yet in place.
- Projects that request data from multiple jurisdictions.
- Projects that request Commonwealth data.
Complexity of the project
Project complexity can significantly impact on data delivery timeframes and cost. The following aspects of a project can increase the complexity and therefore impact delivery times.
Depending on the request and the state data linkage unit (DLU) involved, new linkages may be required, and these can take a significant amount of time to complete. The speed of a linkage depends very much on the size, timeframe, quality and completeness of the dataset being linked, as well as on which DLU is performing the linkage and the resources and experience it has available.
Cohort selection specifications
Projects become more complex as the number of cohort sources increases. Cohort selection is the first step in preparing a data extract for a project, so sourcing cohorts (or subsets of cases of interest) from multiple datasets takes more time. In addition, the type and complexity of the selection criteria also affect the time required, e.g. using a list of hundreds of ICD codes to select hospital records.
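As a rough illustration of code-based cohort selection, the sketch below picks out people with at least one hospital record whose diagnosis code starts with a listed ICD prefix. The record layout, field names (person_id, diagnosis) and sample codes are assumptions made for this example only; real extracts and selection rules vary by data collection and DLU.

```python
# Hypothetical hospital records; field names and values are illustrative only.
records = [
    {"person_id": 1, "diagnosis": "I21.0"},
    {"person_id": 2, "diagnosis": "J45.9"},
    {"person_id": 3, "diagnosis": "I21.4"},
    {"person_id": 1, "diagnosis": "E11.9"},
]


def select_cohort(records, code_prefixes):
    """Return sorted person IDs with at least one diagnosis matching a prefix."""
    prefixes = tuple(code_prefixes)  # str.startswith accepts a tuple of prefixes
    return sorted(
        {r["person_id"] for r in records if r["diagnosis"].startswith(prefixes)}
    )


# In practice the prefix list may run to hundreds of codes.
cohort = select_cohort(records, ["I21", "I22"])
```

Each additional cohort source would need its own version of this step, with its own field names and coding conventions, which is one reason multi-source cohorts take longer.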
Control selection specifications
The selection of control groups can significantly increase a project’s overall complexity rating and extend delivery timelines. Control selection often requires a Linkage Officer and/or Data Analyst to write or amend scripts to select a suitable set of comparison records that may need to be matched or randomly selected and may also span multiple data sources.
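To give a sense of what such a script involves, here is a minimal sketch of matched random control selection: for each case, it draws up to a fixed number of controls with the same sex and birth year, without reusing a control. The matching variables, field names and the select_controls function are hypothetical choices for illustration, not the method used by any particular data linkage unit.

```python
import random


def select_controls(cases, pool, per_case=2, seed=0):
    """Map each case ID to randomly chosen control IDs matched on sex and birth year."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    case_ids = {c["person_id"] for c in cases}
    # Cases themselves are never eligible as controls.
    available = [p for p in pool if p["person_id"] not in case_ids]
    matches = {}
    for case in cases:
        key = (case["sex"], case["birth_year"])
        candidates = [p for p in available if (p["sex"], p["birth_year"]) == key]
        chosen = rng.sample(candidates, min(per_case, len(candidates)))
        matches[case["person_id"]] = [p["person_id"] for p in chosen]
        # Remove chosen controls so each is used for at most one case.
        chosen_ids = {p["person_id"] for p in chosen}
        available = [p for p in available if p["person_id"] not in chosen_ids]
    return matches


# Illustrative data only.
cases = [
    {"person_id": 1, "sex": "F", "birth_year": 1970},
    {"person_id": 2, "sex": "M", "birth_year": 1980},
]
pool = [
    {"person_id": 1, "sex": "F", "birth_year": 1970},
    {"person_id": 10, "sex": "F", "birth_year": 1970},
    {"person_id": 11, "sex": "F", "birth_year": 1970},
    {"person_id": 12, "sex": "F", "birth_year": 1970},
    {"person_id": 13, "sex": "M", "birth_year": 1980},
    {"person_id": 14, "sex": "M", "birth_year": 1975},
]
matches = select_controls(cases, pool)
```

In this example the second case receives only one control because only one eligible match exists; deciding how to handle such shortfalls is part of the specification work that adds to delivery time.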
Number and type of datasets
As with the criteria above, projects become more complex and time consuming with each additional data source requested. Service data must be extracted from individual data collections for each project, so the more datasets involved, the more work required. Some datasets (e.g. Commonwealth datasets) may have particular governance and security requirements in place, which will also affect complexity and timelines. The complexity increases for data held in agencies outside health, or in other jurisdictions.
Source: Alex Godfrey and Tom Eitelhuber. Project complexity? 2014 http://www.datalinkage-wa.org.au/projects/project-complexity WA Data Linkage Branch, Perth WA.
Project approvals
Data linkage requires the use of personal information to make the initial link between data sets and, since this is usually done without consent, it raises significant legal and ethical issues. In Australia, research projects using linked data must be approved by three groups: the data linkage unit; the data custodian responsible for each data set; and one or more Human Research Ethics Committees (HRECs).
The process of obtaining approvals and the time involved will vary between data linkage units, data custodians and HRECs. Below is a list of some of the things that influence the approval times for each application.
Data linkage unit approval
- quality of the application submitted;
- number of projects the DLU is currently processing;
- number of staff available to process the approval;
- complexity of project specification – for example, case/control studies; using a data set not linked previously; might require discussion/meeting with a custodian and/or with the linkage team; and
- complexity of linkage – for example (similar to above), the dataset might have limited identifiers or be a new dataset that requires further investigation of feasibility, meetings, etc.
HREC approval
- quality of the application submitted;
- complexity of the project;
- frequency of the HREC meetings;
- number of projects the HREC currently has under review;
- HREC’s policies on timelines; and
- the need to refer to external experts for scientific review or review of security arrangements.
Data custodian approval
- existing contract in place.