Receipt of data

Sourcing data

What data do I need?

A list of the data collections currently available is provided on each of the data linkage unit websites. A list of the core datasets (those that are routinely linked) can be accessed here. Researchers are often encouraged to talk to either the data custodians of the data collections from which they are requesting data or the client services officer. These discussions cover the type of information held in the data collection, its quality, and whether the proposed research question is likely to be answerable with the data collections requested.

Who has the data?

Within the PHRN, the data linkage units are not data repositories and do not receive content data. The content data required for a research project is held by data custodians. A data custodian is the organisation or agency responsible for the collection, use and disclosure of information in a data collection. The data custodian is responsible for contributing to the guidelines and approval processes on the use of the data, including involvement with ethics committees and input to the protocols surrounding data use.

In some cases the data linkage unit will act on the data custodians’ behalf and request that researchers contact the data linkage unit rather than the data custodians. The contact person for each of the core data collections is listed here. For datasets not listed, please contact the client services officer from the data linkage unit to determine the most appropriate person to contact.

Data flow

Who will I get the data from?

The researcher (contact investigator) will receive the de-identified data from each of the data custodians from which they requested data, from the data linkage unit, or from a combination of both. Some data linkage units assist with the preparation of data prior to release to researchers. The tasks associated with this service include pre-merge checking of data extracts, addition of derived variables to data extracts, merging of data extracts, post-merge checking prior to making data available to researchers, and provision of data to researchers.

How will I get my data?

Several data transfer methods are currently identified in the ‘PHRN Data Transfer Agreement’ that researchers can use to send and receive files from data custodians and data linkage units. Data extracts can be transferred to the researcher via SUFEX, the data linkage unit’s secure file upload facility, or an encrypted disc (note that 256-bit AES encryption is preferred for Commonwealth data), or they can be made accessible through SURE. If the data is coming from the data linkage unit rather than from data custodians, the data linkage unit may elect to use its own encryption program. Data should not be sent by e-mail.

SUFEX

SUFEX uses a secure online system that allows users to send and receive files from anywhere at any time. It provides users with a secure file exchange service and is not a file storage solution.  

SUFEX has been designed to complement current data linkage processes and is initially intended to be used by individuals, such as researchers, who are responsible for sending and receiving data for data linkage research. Registered users will be given personal login credentials. Registered users can then send files to, and request files from, other registered users as well as non-registered users. For more information about SUFEX click here.

Secure Unified Research Exchange (SURE)

To access SURE, users will be required to complete an individual user and study-specific registration form, complete user training and sign an agreement of use. Access requires a username, a password and a one-time access code provided by an authentication token.

The only way for a file to enter or leave SURE is via a portal called the Curated Gateway. All inbound data files uploaded to the Curated Gateway for use in SURE will be reviewed by a member of the SURE operations team for compliance with ethics committee approval and data custodian requirements. Files other than data files are reviewed by the study’s principal investigator or an alternate senior investigator prior to being accepted for use in SURE. Outbound files uploaded to the Curated Gateway for use outside of SURE are reviewed by the study’s principal investigator or an alternate senior investigator.

The Curated Gateway can support alternative approval workflows, and other parties may be involved in the review of inbound and outbound files passing through it if required for particular studies.

To minimise the risk of unauthorised access and attacks, files are scanned at multiple points with anti-virus software as they pass through the Curated Gateway and prior to storage within the SURE facility. The facility is protected by three layers of perimeter firewalls, as well as firewalls between each project workspace. All files that pass through the Curated Gateway are logged and may be subject to audit by the SURE team.

Encrypted disc

Data files are encrypted and burnt to disc, with the password provided to the researcher separately via an alternative medium. Where possible the disc should be collected in person from the data custodian or data linkage unit. The recipient of the disc will be required to sign a declaration acknowledging their responsibilities. Researchers who are not able to collect the disc in person may be able to have the encrypted data sent to them via a trackable Express Post satchel and will also be required to sign on collection.

Data format

What will my data look like when I get it?

As a researcher you will receive only the Project Person Numbers (PPN), Project Event Numbers (PPE) and their associated content variables, as listed in your approved application.

The amount of data researchers receive and how it's structured depends on the number of data files and fields requested, the temporal scope, and the size of the requested cohort.

Depending on the data linkage unit involved, the data may be provided to the researcher already merged. In most cases, however, the researcher will receive the data as multiple files and be required to merge the data themselves. A separate file is usually provided for each data collection in each year. For example, a researcher applying for data from the birth registry, perinatal data collection and admitted patient data collection, for the date range 2000 to 2009, would typically receive 30 files in total (3 data collections × 10 years).
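
The file count in the example above follows from one file per data collection per year. A minimal Python sketch of that arithmetic is given below; the short collection names and the file-naming pattern are hypothetical and are not specified by any data linkage unit.

```python
from itertools import product

# Hypothetical short names for the three collections in the example above.
collections = ["birth_registry", "perinatal", "admitted_patient"]
years = range(2000, 2010)  # 2000 to 2009 inclusive, i.e. 10 years

# One file per collection per year; the naming pattern is an assumption.
expected_files = [f"{c}_{y}.txt" for c, y in product(collections, years)]

print(len(expected_files))  # 3 collections x 10 years = 30
```

A list like this can be useful as a checklist for confirming that every expected file has arrived before merging begins.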

The data will be delivered in a variety of formats, depending on the data linkage unit and data collection involved. Some data linkage units may deliver the data in a standardised format that can be easily read into any statistical analysis software, e.g. tab-delimited text files. In addition to the data files, researchers will also be given metadata for each corresponding data collection, including a data dictionary. The data dictionary provides coding information to assist researchers in interpreting the data.
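
As an illustration of working with such a delivery, the sketch below reads two tab-delimited extracts, decodes one coded field using a data dictionary, and links records across the extracts by PPN. The sample contents, the field names (ppn, sex, birth_weight, admission_date) and the code values are all invented for illustration; real extracts will follow the metadata supplied with each data collection.

```python
import csv
import io
from collections import defaultdict

# Invented sample extracts; real files would be opened from disk instead.
perinatal_txt = "ppn\tsex\tbirth_weight\n1001\t1\t3400\n1002\t2\t2950\n"
admitted_txt = "ppn\tadmission_date\n1001\t2003-05-14\n1001\t2006-01-02\n"

# Hypothetical data dictionary entry: code -> label for the 'sex' field.
sex_labels = {"1": "Male", "2": "Female"}


def read_tab_delimited(text):
    """Return the rows of a tab-delimited extract as a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


perinatal = read_tab_delimited(perinatal_txt)
admitted = read_tab_delimited(admitted_txt)

# Decode coded values using the data dictionary.
for row in perinatal:
    row["sex_label"] = sex_labels.get(row["sex"], "Unknown")

# Group admissions by PPN so each person can be linked to their events.
admissions_by_ppn = defaultdict(list)
for row in admitted:
    admissions_by_ppn[row["ppn"]].append(row["admission_date"])

# Merge: one record per person, carrying a list of their admission dates.
merged = [
    {**row, "admissions": admissions_by_ppn.get(row["ppn"], [])}
    for row in perinatal
]

for record in merged:
    print(record["ppn"], record["sex_label"], record["admissions"])
```

The PPN is kept as text rather than converted to a number, so that any leading zeros or non-numeric characters in the identifier are preserved when records are matched.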