This section is designed to assist data custodians who are providing data for a data linkage research project. The information provides an outline of the five stages in a linkage project where data custodian involvement is required. These are:
- the application process;
- provision of identifying data to a data linkage unit;
- delivery of mapping file to data custodians;
- delivery of data extracts to the researcher; and
- monitoring of data linkage projects.
Stage 1: The application process
All data linkage studies require three separate approvals, from:
- the data linkage unit;
- the data custodians of each data collection proposed for use in a linkage study; and
- one or more human research ethics committee(s).
This means that before a project can begin, the research team will approach the data custodian to seek approval for inclusion of his or her data in the project.
Specifically, the researcher will be requesting approval for:
- release of identifying data (e.g. name, address, date of birth) to the data linkage unit to enable the data to be linked to other datasets; and
- release of clinical or other content data (e.g. diagnosis, treatment) to the researchers for analysis.
Strict privacy-preserving protocols are used to ensure that the data linkage unit receives only the identifying data but no content information, and the researchers receive only the content information but no identifying variables.
Most data linkage units within the PHRN have staff who review researchers' 'Application for Data' forms before they are submitted to data custodians for approval. This process ensures that only well-conceived and complete applications are submitted to data custodians.
Stage 2: Provision of identifying data to a data linkage unit
Once a project has received the approvals specified above, the data linkage unit will contact the relevant data custodians to request the linkage variables from their data collections. Linkage variables include fields such as: name; date of birth; sex; address; record date and other unique identifiers (such as hospital medical record number). Note that the linkage quality improves with the number of linkage variables available.
Linkage variables are the fields that the linkers use to link the datasets to each other. The linkage variables for a particular project will have been specified during the application process and listed in the researcher's 'Application for Data' form.
Preparation of a data extraction plan including cohort/control selection
The data linkage unit and data custodians will work together to prepare a data extraction plan prior to the extraction process commencing. The nature of this process depends on the complexity of the project and the variables available to the linkers. Some projects can require multiple iterations to determine cohort selection.
Record identifiers/linkage variables
Along with the identifying data, the data linkage unit also requires one or both of the following additional fields from data custodians, depending on what is available:
- record ID: this identifies each individual record to be sent to the data linkage unit (contact the data linkage unit to see if any restrictions exist on the character length of this field); and
- patient or person ID: this identifies each individual or person within the custodian's database (contact the data linkage unit to see if any restrictions exist on the character length of this field). If there is no unique person number in the database then the Patient ID field should be set to the value of the Record ID.
For privacy purposes, it is recommended that custodians do not send the original Record ID or Patient ID from their datasets to the data linkage unit. Rather, project-specific numbers should be generated. This may be done using encryption software, or by generating autonumbers. For example, if there are multiple records per person and a unique person number is available, the Patient ID field can be populated as follows:
- create a list of records with unique person numbers;
- apply an auto number to each person number;
- set the Patient ID field to the autonumber (note that this autonumber will be different to the autonumber in the Record ID field); and
- RETAIN the mapping of autonumbers to Record ID and/or Patient ID until completion of the project. If this mapping is not retained, it will not be possible to join the information generated by the data linkage unit's linkage process to the original data.
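The autonumber steps above can be sketched as follows. This is a minimal illustration only, not a prescribed method: the field names `internal_record_id` and `internal_person_id`, the function name and the sample values are all assumptions made for the example, and a custodian may equally use encryption software instead.

```python
def build_id_mappings(records, has_person_number=True):
    """Assign project-specific autonumbers to records and persons.

    `records` is a list of dicts with the hypothetical keys
    'internal_record_id' and 'internal_person_id'.
    Returns the rows to send plus the two lookup tables to RETAIN.
    """
    record_map = {}  # internal Record ID -> project Record ID
    person_map = {}  # internal person number -> project Patient ID
    outgoing = []
    for rec in records:
        # setdefault returns the existing autonumber, or stores the next one.
        record_id = record_map.setdefault(
            rec["internal_record_id"], len(record_map) + 1
        )
        if has_person_number:
            # Separate sequence from the Record ID autonumber.
            patient_id = person_map.setdefault(
                rec["internal_person_id"], len(person_map) + 1
            )
        else:
            # No unique person number: Patient ID takes the Record ID value.
            patient_id = record_id
        outgoing.append({"record_id": record_id, "patient_id": patient_id})
    return outgoing, record_map, person_map


# Made-up records: two admissions for one person, one for another.
records = [
    {"internal_record_id": "ADM-0091", "internal_person_id": "P-77"},
    {"internal_record_id": "ADM-0092", "internal_person_id": "P-12"},
    {"internal_record_id": "ADM-0093", "internal_person_id": "P-77"},
]
outgoing, record_map, person_map = build_id_mappings(records)
```

Here `record_map` and `person_map` are the lookup tables that would need to be stored securely until the project is complete; only the autonumbers in `outgoing`, together with the linkage variables, would leave the custodian.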
Most data linkage units will accept data in comma delimited (CSV), Excel or plain text 'flat file' formats. Please discuss details with the data linkage unit responsible for linking the study prior to preparing data for transfer.
Data will be encrypted or password protected using a minimum of 128-bit AES encryption. Policies are in place that dictate the minimum level of encryption required. As a result, the minimum level of encryption required and the encryption programs used vary between data linkage units. All of the state data linkage units use a minimum of 128-bit AES encryption; however, a minimum of 256-bit AES encryption is preferred for Commonwealth data.
Since data sent to the data linkage unit by a data custodian contains identifying information, all files should be transferred securely.
All data is expected to be encrypted before being released. Data may be transferred to a data linkage unit in one of three ways:
- Stored on a data storage device such as CD or USB and hand-delivered to the national linkage unit.
- Stored on a data storage device such as CD or USB and sent by Registered Post to the national linkage unit.
- Sent to the national linkage unit via the PHRN secure file transfer system https://www.sufex.org.au. For more information on the secure file transfer go to Secure Data Transfer (SUFEX).
It is not acceptable to email the data. The encryption key for the data can either be conveyed by phone or SMS to data linkage unit staff or emailed. It should never be sent on the same data storage device that contains the identifying data.
When liaising with the data linkage unit regarding the data extraction, it is permissible to send 'test' or 'sample' data by email or via the secure file transfer system (https://www.sufex.org.au) in order to confirm variables, text length, etc. This data must not be real data.
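One way to produce such a sample file is to generate obviously artificial rows that share the column names and approximate field lengths of the planned extract. The sketch below is an illustration under assumptions: the column names and placeholder values are hypothetical and should be replaced with those agreed in the data extraction plan.

```python
import csv
import io

# Hypothetical column layout; use the layout agreed with the data linkage unit.
FIELDS = ["record_id", "patient_id", "surname", "given_name",
          "date_of_birth", "sex", "address"]


def make_sample_rows(n):
    """Return n rows of clearly artificial data (no real people)."""
    rows = []
    for i in range(1, n + 1):
        rows.append({
            "record_id": i,
            "patient_id": i,
            "surname": f"TESTSURNAME{i:03d}",
            "given_name": f"TESTGIVEN{i:03d}",
            "date_of_birth": "1900-01-01",
            "sex": "F" if i % 2 else "M",
            "address": f"{i} Example Street, Testville",
        })
    return rows


def to_csv(rows):
    """Serialise rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample_csv = to_csv(make_sample_rows(3))
```

Because every value is a placeholder, a file like this lets the data linkage unit check the variable names and text lengths without any identifying information being exposed.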
Stage 3: Delivery of mapping file to data custodians
Once the linkage is complete, the data linkage unit creates a 'project key' for each data collection, which is returned to the respective data custodian to have the content data variables attached to the project keys.
A project key is an encryption of a master linkage key that is provided to a researcher for a specific approved data linkage project. The format and number of fields contained in the project key will depend on the data linkage unit. Project keys are provided in either comma-separated values or fixed width format and consist of either three or four fields. These include:
- encrypted PPN (a unique Project Person Number for each individual in the data collection);
- record ID (or Patient ID, depending on what has been agreed with the custodian);
- encrypted PEN (a unique Project Event Number); and
- a source code field (which identifies the data collection).
[Figure: example of a typical project key file sent from the WA DLU to data custodians]
The data custodian then:
- translates the Record ID in the project key file into the custodian's internal Record ID using either the previously generated autonumber lookup table, or decryption, depending on which method was used to generate the Record ID;
- joins the PPN and PEN to the database records, using the Record ID;
- extracts the PPN, PEN and content data variables that were previously approved for the project from the database; and
- forwards the content data to the researcher. The file sent to the research team should contain only the record type, the encrypted PPNs and PENs, and the approved content data.
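The custodian's steps above amount to a lookup followed by a join. The following sketch assumes the Record IDs were generated as autonumbers and that the lookup table was retained; the field names (`ppn`, `pen`, `record_id`, `source`), the approved variable list and all values are made up for illustration.

```python
# Hypothetical list of content variables approved for the project.
APPROVED_CONTENT_FIELDS = ["diagnosis", "treatment"]


def attach_project_keys(project_key_rows, record_map, database):
    """Build the researcher extract: PPN, PEN and approved content only."""
    # Invert the retained lookup: project Record ID -> internal Record ID.
    to_internal = {project_id: internal
                   for internal, project_id in record_map.items()}
    extract = []
    for row in project_key_rows:
        internal_id = to_internal[row["record_id"]]
        content = database[internal_id]
        out = {"ppn": row["ppn"], "pen": row["pen"]}
        # Copy only approved variables; identifiers and Record IDs stay behind.
        out.update({f: content[f] for f in APPROVED_CONTENT_FIELDS})
        extract.append(out)
    return extract


# Lookup retained from Stage 2: internal Record ID -> project Record ID.
record_map = {"ADM-0091": 1, "ADM-0092": 2}

# Custodian's database keyed by internal Record ID (made-up values).
database = {
    "ADM-0091": {"name": "TESTSURNAME001", "diagnosis": "J45", "treatment": "A"},
    "ADM-0092": {"name": "TESTSURNAME002", "diagnosis": "E11", "treatment": "B"},
}

# Project key rows as returned by the data linkage unit (made-up values).
project_key_rows = [
    {"ppn": "PPN0001", "record_id": 1, "pen": "PEN0001", "source": "HOSP"},
    {"ppn": "PPN0002", "record_id": 2, "pen": "PEN0002", "source": "HOSP"},
]

extract = attach_project_keys(project_key_rows, record_map, database)
```

If the Record IDs were produced by encryption rather than autonumbers, the `to_internal` lookup would be replaced by the corresponding decryption step.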
Stage 4: Delivery of data extracts to the researcher
The responsibility for delivery of data extracts to researchers varies, depending on the operational model of each data linkage unit as well as the preferences of individual data custodians. Linked data files are usually provided to researchers on a CD as tab-delimited or fixed-width text files.
Research data is expected to be encrypted prior to release. Data may be transferred to the researcher in one of three ways:
- Stored on a data storage device such as CD or USB and either collected by, or hand-delivered to, the researcher;
- Stored on a data storage device such as CD or USB and sent by Registered Post to the researcher; and
- Sent to the researcher via the PHRN secure file transfer system https://www.sufex.org.au. For more information on the secure file transfer go to Secure Data Handling (SUFEX).
It is not acceptable to email the data. The encryption key for the data can either be conveyed by phone to the researcher or emailed. It cannot be sent on the same data storage device that contains the de-identified data.
If the data to be sent is only record numbers such as Jurisdictional Project Specific Linkage Keys or National Project Specific Linkage Keys, and no other information, it is permissible to send this data by email to the researcher. However, the data must be encrypted and the encryption key must either be conveyed by phone or SMS, or sent in a separate email.
Data that is transferred must be encrypted. WinZip and 7-Zip are two software programs that enable the encryption of data. Both use drop-down menus to apply encryption to the data file. Passwords are created by the user and should contain numbers, letters and symbols to ensure password strength. Always confirm that the data has been received by the national linkage unit or the researcher before conveying the password.
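A password that mixes letters, numbers and symbols can also be generated rather than invented by hand. The sketch below uses Python's standard `secrets` module; the length of 20 characters and the particular symbol set are assumptions for illustration, not requirements stated by any data linkage unit.

```python
import secrets
import string

# Assumed symbol set; adjust to whatever the encryption program accepts.
SYMBOLS = "!@#$%^&*-_=+"


def generate_password(length=20):
    """Return a random password containing letters, digits and symbols."""
    if length < 3:
        raise ValueError("length must allow a letter, a digit and a symbol")
    alphabet = string.ascii_letters + string.digits + SYMBOLS
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry until all three character classes are present.
        if (any(c in string.ascii_letters for c in candidate)
                and any(c in string.digits for c in candidate)
                and any(c in SYMBOLS for c in candidate)):
            return candidate


password = generate_password()
```

The generated password would then be entered into the encryption program and conveyed separately from the data, as described above.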
Preparation of data prior to release to researchers
Some data linkage units assist with the preparation of data prior to release to researchers. The tasks associated with this service include pre-merge checking of data extracts, addition of derived variables to data extracts, merging of data extracts, and post-merge checking prior to making data available to researchers.
Data linkage exposes duplication errors in administrative data and other technical issues with the data. Where data linkage units provide feedback to data custodians, these errors and issues can be resolved, resulting in better data quality and recording at the administrative level.
Data extraction tip
Many data custodians find that it is more efficient to extract the identifiers and content data required for a project at the same time. This data file can then be used for both Stages 2 and 4 above: the identifiers can be sent to the data linkage unit at the start of the project for linkage, and once the linkage is complete and the PPNs have been attached, the content data variables can be sent to the researchers.
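A single extraction can be split into the two parts described in this tip. The sketch below is an illustration only: the field lists are hypothetical stand-ins for the linkage and content variables approved for a project, and `record_id` is assumed to be the project-specific Record ID rather than the custodian's internal one.

```python
# Hypothetical field lists; use the variables approved for the project.
LINKAGE_FIELDS = ["name", "date_of_birth", "sex", "address"]
CONTENT_FIELDS = ["diagnosis", "treatment"]


def split_extract(rows):
    """Split one combined extract into a linkage file and held-back content.

    Returns (linkage_file, content_store):
    - linkage_file: Record ID plus identifiers, for the data linkage unit;
    - content_store: content variables keyed by Record ID, kept by the
      custodian until the project keys come back.
    """
    linkage_file = []
    content_store = {}
    for row in rows:
        record_id = row["record_id"]
        linkage_row = {"record_id": record_id}
        linkage_row.update({f: row[f] for f in LINKAGE_FIELDS})
        linkage_file.append(linkage_row)
        content_store[record_id] = {f: row[f] for f in CONTENT_FIELDS}
    return linkage_file, content_store


# One made-up combined row containing both identifiers and content.
rows = [{
    "record_id": 1,
    "name": "TESTSURNAME001",
    "date_of_birth": "1900-01-01",
    "sex": "F",
    "address": "1 Example Street, Testville",
    "diagnosis": "J45",
    "treatment": "A",
}]
linkage_file, content_store = split_extract(rows)
```

Keeping the two parts separate from the moment of extraction means the linkage file never carries content data and the held-back content never needs the identifiers again.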
Stage 5: Monitoring of data linkage projects
Data custodians are responsible for monitoring compliance with the terms and conditions of their agreements with research institutions and researchers. However, they have the ability to include the requirement for independent monitoring of compliance in their agreements should they require it.
Many of the data linkage units within the PHRN offer monitoring of data linkage projects as a client service. The tasks associated with this service most commonly include monitoring researcher outputs (e.g. publications) and monitoring data destruction and archiving at the end of the project.