Page 29 - PATIENT REGISTRY DATA FOR RESEARCH: A Basic Practical Guide

3.1 Proper Handling of Duplicates Found in a Dataset

A common problem in patient registry databases is the presence of duplicate patient data
(see Figure 3.1). Ideally, the data-capture mechanism of a registry allows each unique
individual to be enrolled in a disease registry only once: for example, the National Diabetes
Registry (NDR) allows an individual patient's data to be entered only once (National Diabetes
Registry, 2013). However, duplicates can still occur if the same patient has been registered in
several different clinics or hospitals, because it is difficult to track one patient across
multiple systems and to identify records that have been duplicated between them. Removing
duplicate patient data is a necessary step because duplicates lead to an overestimation of both
the incidence and the prevalence of disease. If there are too many duplicates, the final results
will also be biased, because the duplicates cause the population or census figures derived from
the registry to deviate from the truth.
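
The effect of duplicates on prevalence can be illustrated with a short calculation. The sketch below is only an illustration: the population size, the number of unique patients and the number of duplicate records are all hypothetical figures chosen for the example, and the function name `prevalence` is an assumption, not something taken from any registry.

```python
def prevalence(case_count: int, population: int) -> float:
    """Prevalence as the proportion of a population recorded as cases."""
    return case_count / population


# Hypothetical figures, for illustration only (not real registry data).
population = 10_000        # assumed catchment population
unique_patients = 500      # assumed number of distinct patients in the registry
duplicate_records = 50     # assumed extra records for patients already enrolled

# Prevalence based on distinct patients only.
true_prevalence = prevalence(unique_patients, population)

# Prevalence if every record, including duplicates, is counted as a case.
inflated_prevalence = prevalence(unique_patients + duplicate_records, population)

print(f"Without duplicates: {true_prevalence:.1%}")
print(f"With duplicates:    {inflated_prevalence:.1%}")
```

With these assumed figures, counting the 50 duplicate records as separate cases raises the estimate from 500 to 550 cases per 10,000 people, even though the number of patients has not changed.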

To remove duplicate patient data from the registry, a researcher must first identify the
variables within the registry that distinguish each individual patient, often referred to as
the patients' unique identifiers. These identifiers commonly include a patient's name and
his/her identity card (IC) number or other identification number. In addition, the researcher
needs to pre-specify the criteria for determining whether a case is a duplicate. For example:
should patients who have been enrolled more than once in the same centre be regarded as
duplicates? In the case of the Malaysian National Diabetes Registry, the answer is 'yes'
(National Diabetes Registry, 2013).
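
The two steps described above (choosing the unique identifiers, then applying a pre-specified duplicate rule) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure used by any particular registry: the field names `ic_number`, `name` and `centre`, the helper names, and the rule "keep the first record for each IC number" are all hypothetical, and the records are made up for illustration.

```python
def normalise_identifier(value: str) -> str:
    """Strip separators and case so that differently typed identifiers compare equal."""
    return "".join(ch for ch in value if ch.isalnum()).upper()


def remove_duplicates(records, key_fields=("ic_number",)):
    """Keep the first record seen for each identifier key.

    Returns a pair (unique, duplicates) so that the removed records
    can still be reviewed rather than silently discarded.
    """
    seen = set()
    unique, duplicates = [], []
    for record in records:
        # Build the key from the pre-specified identifier fields.
        key = tuple(normalise_identifier(str(record[field])) for field in key_fields)
        if key in seen:
            duplicates.append(record)
        else:
            seen.add(key)
            unique.append(record)
    return unique, duplicates


# Made-up records for illustration only (not real patient data).
records = [
    {"ic_number": "800101-14-5678", "name": "Patient A", "centre": "Clinic 1"},
    {"ic_number": "800101145678", "name": "Patient A", "centre": "Clinic 2"},
    {"ic_number": "750505-10-1234", "name": "Patient B", "centre": "Clinic 1"},
]

unique, duplicates = remove_duplicates(records)
print(f"{len(unique)} unique patients, {len(duplicates)} duplicate record(s)")
```

The identifier is normalised before comparison because the same IC number may be typed with or without hyphens in different clinics. The `key_fields` parameter is where the pre-specified duplicate criterion is expressed: adding a further field to it makes two records count as duplicates only when they match on every listed field.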

However, different patient registries may impose different criteria for the identification
of duplicates. For instance, the National Eye Database (NED) allows the same patient to be
enrolled in the same registry twice if and only if he/she has problems in both eyes. In this
case, the patient's unique identifier will