Page 33 - PATIENT REGISTRY DATA FOR RESEARCH: A Basic Practical Guide
3.2 Match and Combine Data from Different Datasets
It is not uncommon for registry data to be stored in several separate sections of the
registry database, and these sections must then be combined into a single data set. For
example, the National Cardiovascular Disease Registry (NCVD) has two main forms: the
notification form and the follow-up form. Data from the two forms are stored in two separate
sub-databases, which have to be merged during analysis because some results require input
from both (Wan-Ahmad & Liew, 2016).
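As an illustration only, the following Python sketch (standard library only) merges two
sub-datasets on a shared identifier. It assumes each record is a dictionary and that every
patient has a unique identifier; the field names (patient_id, age, diagnosis, status_30d) and
the helper merge_subdatasets are hypothetical and are not taken from the actual NCVD forms.

```python
# A minimal sketch: merge two registry sub-datasets on a shared identifier.
# All field names below are hypothetical examples, not actual NCVD variables.

def merge_subdatasets(notification, follow_up, key="patient_id"):
    """Left join: keep every notification record and attach the
    follow-up fields recorded for the same patient, if any."""
    follow_up_by_key = {}
    for row in follow_up:
        if row[key] in follow_up_by_key:
            # A duplicate identifier would make the merge ambiguous.
            raise ValueError(f"duplicate {key} in follow-up data: {row[key]}")
        follow_up_by_key[row[key]] = row

    merged = []
    for row in notification:
        combined = dict(row)  # copy, so the source record is left unchanged
        extra = follow_up_by_key.get(row[key])
        if extra is not None:
            for field, value in extra.items():
                if field != key:
                    combined[field] = value
        # Flag patients with no follow-up instead of silently dropping them.
        combined["has_follow_up"] = extra is not None
        merged.append(combined)
    return merged


notification = [
    {"patient_id": "P001", "age": 54, "diagnosis": "STEMI"},
    {"patient_id": "P002", "age": 61, "diagnosis": "NSTEMI"},
]
follow_up = [{"patient_id": "P001", "status_30d": "alive"}]

merged = merge_subdatasets(notification, follow_up)
for record in merged:
    print(record)
```

A left join is used here so that patients who have a notification record but no follow-up
record (P002 above) remain in the merged data set, flagged with has_follow_up set to False,
rather than disappearing from the analysis.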
Some study objectives may also require linking the registry data to an external data set, for
example matching the registry records against the National Death Record to determine patient
survival rates (Wong & Goh, 2016). To link two data sets, both must contain unique identifiers
that can be matched by either deterministic or probabilistic matching, the two main strategies
for record linkage (data matching). Deterministic matching treats two records as belonging to
the same patient only when their identifiers agree exactly, whereas probabilistic matching
weighs the agreement across several fields, such as name, date of birth and sex, to estimate
how likely two records are to refer to the same patient. When investigators handle sensitive
information such as patient identifiers, they should ensure that prior consent has been
obtained from the patients and that all necessary approvals have been granted by the relevant
authorities.
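The difference between the two strategies can be sketched in Python as follows. The
probabilistic part follows the general Fellegi-Sunter idea of summing log-likelihood weights
for agreement and disagreement on each field. The field names, the m- and u-probabilities and
the decision thresholds are all hypothetical values chosen for illustration; in a real linkage
they would be estimated from the data or set by the investigators.

```python
import math

# Hypothetical m- and u-probabilities for each comparison field:
#   m = P(field agrees | both records belong to the same patient)
#   u = P(field agrees | the records belong to different patients)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "sex": (0.99, 0.5),
}


def deterministic_match(a, b, key="national_id"):
    """Deterministic linkage: a match only if the identifier is present and identical."""
    return a.get(key) is not None and a.get(key) == b.get(key)


def match_weight(a, b, probs=FIELD_PROBS):
    """Probabilistic linkage: sum log2 likelihood ratios over the comparison fields.
    Fields missing in either record contribute nothing to the total."""
    total = 0.0
    for field, (m, u) in probs.items():
        if a.get(field) is None or b.get(field) is None:
            continue
        if a[field] == b[field]:
            total += math.log2(m / u)  # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight, upper=10.0, lower=0.0):
    """Hypothetical thresholds; pairs in between go to manual (clerical) review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible match"


registry = {"national_id": "ID123", "name": "SITI AMINAH", "birth_date": "1950-03-14", "sex": "F"}
death = {"national_id": None, "name": "SITI AMINAH", "birth_date": "1950-03-14", "sex": "F"}
other = {"national_id": None, "name": "NOOR AINI", "birth_date": "1962-11-02", "sex": "F"}

print(deterministic_match(registry, death))  # identifier missing, so no deterministic match
for candidate in (death, other):
    w = match_weight(registry, candidate)
    print(round(w, 2), classify(w))
```

The example shows why probabilistic matching is useful for death-record linkage: the death
record lacks the identifier, so the deterministic rule fails, yet agreement on name, date of
birth and sex still gives a high total weight.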
Before matching, check that the identifiers are unique within each data set. Although the
matching itself can be performed with statistical software, it is still necessary to validate
the result with several random checks to confirm that the records have been matched
correctly. This can be done by drawing a random sample of matched records and comparing them
against the source data in the original data sets.
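Both checks can be sketched in a few lines of Python. The helper names (duplicate_ids,
validation_sample, mismatched_fields), the record layout and the fixed seed are hypothetical
choices for illustration; the seed is arbitrary and is fixed only so that the same validation
sample can be redrawn and audited later.

```python
import random
from collections import Counter


def duplicate_ids(records, key="patient_id"):
    """Return identifiers that occur more than once; resolve these before matching."""
    counts = Counter(r[key] for r in records)
    return sorted(k for k, n in counts.items() if n > 1)


def validation_sample(matched_pairs, k=5, seed=42):
    """Draw a reproducible random sample of matched pairs for manual checking."""
    rng = random.Random(seed)
    return rng.sample(matched_pairs, min(k, len(matched_pairs)))


def mismatched_fields(pair, source_a, source_b, fields):
    """Look up both sides of a matched pair in the original data sets
    and list the fields whose values disagree."""
    rec_a, rec_b = source_a[pair[0]], source_b[pair[1]]
    return [f for f in fields if rec_a.get(f) != rec_b.get(f)]


registry = [{"patient_id": "P001"}, {"patient_id": "P002"}, {"patient_id": "P002"}]
print(duplicate_ids(registry))  # P002 appears twice

matched_pairs = [(f"R{i}", f"D{i}") for i in range(20)]
sample = validation_sample(matched_pairs, k=5)
print(sample)

source_a = {"R1": {"name": "ALI", "birth_date": "1960-01-01"}}
source_b = {"D1": {"name": "ALI", "birth_date": "1960-01-02"}}
print(mismatched_fields(("R1", "D1"), source_a, source_b, ["name", "birth_date"]))
```

Any sampled pair with mismatched fields points either to a data-entry error in one source or
to a false match, and a high proportion of such pairs suggests that the matching rules need
to be revisited.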