Page 29 - PATIENT REGISTRY DATA FOR RESEARCH: A Basic Practical Guide
3.1 Proper Handling of Duplicates Found in a Dataset
A common problem in patient registry databases is the presence of duplicate patient data
(see Figure 3.1). Ideally, the data-capture mechanism of a registry allows each unique
individual to be enrolled in a disease registry only once: for example, the National Diabetes
Registry (NDR) allows an individual patient's data to be entered only once (National Diabetes
Registry, 2013). However, duplicates can still occur when the same patient has been registered
at several different clinics or hospitals, because it is difficult to track one patient across
multiple systems and to recognise records that have been entered more than once. Removing
duplicate patient data is a necessary step because duplicates inflate the number of cases and
therefore lead to an overestimation of both the incidence and the prevalence of disease. If
there are too many duplicates, the final results will also be biased, because the patient
counts derived from the registry will deviate from the true population.
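The effect of duplicates on prevalence can be illustrated with a small calculation. The sketch below is only an illustration: the population size, the number of unique patients and the number of duplicate records are invented figures, not data from any registry, and the function name prevalence is hypothetical.

```python
def prevalence(case_count, population):
    """Prevalence as the proportion of the population counted as cases."""
    return case_count / population


# Invented figures, used purely for illustration.
population = 20_000        # people in the catchment area
unique_patients = 1_000    # true number of patients with the disease
duplicate_records = 50     # extra records belonging to patients already counted

true_prev = prevalence(unique_patients, population)
apparent_prev = prevalence(unique_patients + duplicate_records, population)

# Relative overestimation caused by counting records instead of patients.
relative_overestimate = (apparent_prev - true_prev) / true_prev

print(f"True prevalence:     {true_prev:.2%}")
print(f"Apparent prevalence: {apparent_prev:.2%}")
print(f"Overestimated by:    {relative_overestimate:.1%}")
```

In this example every duplicate record adds one spurious case to the numerator, so the relative overestimation equals the proportion of duplicate records to unique patients.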
To remove duplicate patient data from the registry, a researcher must first identify the
variables within the registry that distinguish each individual patient, often referred to as
the patients' unique identifiers. These identifiers commonly include a patient's name and
his/her identity card (IC) number or other identification number. In addition, the researcher
needs to pre-specify the criteria for classifying a record as a duplicate. For example: should
patients who have been enrolled more than once in the same centre be regarded as duplicates?
In the case of the Malaysian National Diabetes Registry, the answer is 'yes' (National
Diabetes Registry, 2013).
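The two steps described above (choosing the unique identifier, then applying a pre-specified duplicate rule) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure used by any particular registry: the field names (ic_number, name, centre), the helper functions normalise_id and remove_duplicates, and the sample records are all hypothetical, and the rule assumed here is that any later record with the same IC number is a duplicate, whichever centre it came from.

```python
def normalise_id(value):
    """Drop hyphens and spaces so differently formatted IDs compare as equal."""
    return "".join(ch for ch in str(value) if ch.isalnum()).upper()


def remove_duplicates(records, key_fields=("ic_number",)):
    """Split records into (unique, duplicates), keeping the first record per key.

    key_fields names the variables that together act as the unique identifier.
    """
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        key = tuple(normalise_id(rec[field]) for field in key_fields)
        if key in seen:
            duplicates.append(rec)   # identifier already seen: flag as duplicate
        else:
            seen.add(key)
            unique.append(rec)       # first occurrence is retained
    return unique, duplicates


# Invented sample records, for illustration only.
records = [
    {"ic_number": "800101-14-5678", "name": "Patient A", "centre": "Clinic 1"},
    {"ic_number": "800101145678", "name": "Patient A", "centre": "Hospital 2"},
    {"ic_number": "750505-10-1234", "name": "Patient B", "centre": "Clinic 1"},
    {"ic_number": "800101-14-5678", "name": "Patient A", "centre": "Clinic 1"},
]

unique, duplicates = remove_duplicates(records)
print(len(unique), "unique patients;", len(duplicates), "duplicate records")
```

Keeping the flagged records in a separate list, rather than discarding them silently, lets the researcher review them before removal. A registry with a different duplicate criterion could pass additional variables in key_fields so that the identifier becomes a combination of fields.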
However, different patient registries may apply different criteria for identifying
duplicates. For instance, the National Eye Database (NED) allows the same patient to be
enrolled in the registry twice if, and only if, he/she has problems in both eyes. In this
case, the patient's unique identifier will