The most common risk associated with data linkage is the risk to privacy. The seriousness of the risks arising from the use of personal information depends on the degree to which that information is identifiable. Projects should therefore consider how likely it is that the identity of individuals could be ascertained from the information involved.
Relevant factors include:
- the type of information;
- the quantity of information;
- the methods of statistical disclosure control used;
- other information held by the person who receives it (the information need not be identifiable in itself, so extrinsic material held by the recipient should also be considered); and
- the skills and technology available to the person who receives it.
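The point about extrinsic material can be illustrated with a small sketch. It assumes a hypothetical released dataset with names removed but date of birth, postcode and sex retained, and a hypothetical register already held by the recipient; all records and the helper names `key` and `link` are invented for illustration. A record is treated as re-identified when its combination of these fields matches exactly one person in the recipient's register.

```python
# Hypothetical released records: names removed, but date of birth,
# postcode and sex (quasi-identifiers) retained.
released = [
    {"dob": "1975-03-02", "postcode": "2000", "sex": "F", "diagnosis": "asthma"},
    {"dob": "1982-11-19", "postcode": "3051", "sex": "M", "diagnosis": "diabetes"},
]

# Hypothetical extrinsic material already held by the recipient.
extrinsic = [
    {"name": "Jane Citizen", "dob": "1975-03-02", "postcode": "2000", "sex": "F"},
    {"name": "John Smith", "dob": "1990-01-01", "postcode": "3051", "sex": "M"},
]

QUASI_IDENTIFIERS = ("dob", "postcode", "sex")


def key(record):
    """Combine the quasi-identifiers into a single matching key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(released, extrinsic):
    """Return (name, record) pairs where a released record matches exactly one known person."""
    index = {}
    for person in extrinsic:
        index.setdefault(key(person), []).append(person["name"])
    matches = []
    for record in released:
        names = index.get(key(record), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record))
    return matches


for name, record in link(released, extrinsic):
    print(name, "->", record["diagnosis"])
```

Neither dataset is identifying on its own terms, yet together they reveal Jane Citizen's diagnosis. Coarsening the quasi-identifiers before release (for example, giving only year of birth), which is one form of statistical disclosure control, reduces the chance of such unique matches.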
According to the National Statement (2007), data may be collected, stored or disclosed in three mutually exclusive forms:
- Individually identifiable data - where the identity of a specific individual can reasonably be ascertained. Examples of identifiers include the individual's name, image, date of birth or address;
- Re-identifiable data - from which identifiers have been removed and replaced by a code, but it remains possible to re-identify a specific individual by, for example, using the code or linking different datasets;
- Non-identifiable data - which have never been labelled with individual identifiers or from which identifiers have been permanently removed, and by means of which no specific individual can be identified. A subset of non-identifiable data are those that can be linked with other data so it can be known that they are about the same data subject, although the person's identity remains unknown.
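The difference between re-identifiable and non-identifiable data often comes down to whether anyone still holds the means of reversing the coding. The sketch below assumes a simple scheme, not one prescribed by the National Statement: identifiers are replaced with a random code and the mapping from code to identifiers is kept in a separate key map. The records and the function name `pseudonymise` are hypothetical.

```python
import secrets

# Hypothetical records; "name" and "dob" are treated as the identifiers.
records = [
    {"name": "Jane Citizen", "dob": "1975-03-02", "result": "positive"},
    {"name": "John Smith", "dob": "1982-11-19", "result": "negative"},
]

IDENTIFIERS = ("name", "dob")


def pseudonymise(records):
    """Replace identifiers with a random code; return coded data and the key map."""
    coded, key_map = [], {}
    for record in records:
        code = secrets.token_hex(8)  # random code, not derived from the identifiers
        key_map[code] = {f: record[f] for f in IDENTIFIERS}
        content = {f: v for f, v in record.items() if f not in IDENTIFIERS}
        coded.append({"code": code, **content})
    return coded, key_map


coded, key_map = pseudonymise(records)

# Re-identifiable: whoever holds key_map (e.g. a data custodian)
# can attach the identifiers back to the coded content.
reidentified = [{**key_map[r["code"]], **r} for r in coded]

# Discarding the key map here only stands in for permanent removal; in practice
# every copy must be destroyed and no other route back to the individual may exist.
key_map = None
```

If the same code is reused for a person across several datasets, the coded records can still be linked to one another after the key map is destroyed, which corresponds to the linkable subset of non-identifiable data described above.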
If research 'involves the use of existing collections of data or records that contain only non-identifiable data', then under the National Statement it can be categorised as negligible risk and exempted from ethical review (NS 5.1.22). This exemption is unlikely to apply to data linkage projects, because identifiers are not permanently removed from the data collections, so the data are not non-identifiable. Data linkage units retain the identifiers in order to update the links, data custodians retain both the content information and the identifiers, and the information received by researchers can be re-identified by the data custodian.