Page 41 - PATIENT REGISTRY DATA FOR RESEARCH: A Basic Practical Guide
The easiest way to handle missing data is simply to declare them as missing. A unique
code is introduced to represent missing values across all the variables. This code must not
clash with a valid response: if the researcher uses the code "0" to define missing while "0"
also represents the answer "No" in another variable (for example, "1" for yes and "0" for no),
then a distinctly different code such as "9999" is a better option for representing the
missing value. In some instances, several different codes can represent missing values,
each with its own definition. For example, code "9999" can be used to represent a true
missing value, whereas code "8888" can be used to indicate that the variable is not relevant
or not applicable to a particular patient (such as 'pregnant' status for a male patient).
These two types of missing values then need to be handled differently in the analysis.
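The coding scheme described above can be illustrated with a short Python sketch. It is only
an illustration under stated assumptions: the records, the field names (patient_id, sex,
pregnant) and the helper functions classify and summarize are hypothetical, and only the
codes "9999" and "8888" come from the example in the text.

```python
# Codes taken from the example in the text.
MISSING = 9999          # the value is truly missing
NOT_APPLICABLE = 8888   # the variable does not apply to this patient

# Hypothetical records; "pregnant" is coded 1 = yes, 0 = no.
records = [
    {"patient_id": "P001", "sex": "F", "pregnant": 1},
    {"patient_id": "P002", "sex": "F", "pregnant": 0},
    {"patient_id": "P003", "sex": "M", "pregnant": NOT_APPLICABLE},
    {"patient_id": "P004", "sex": "F", "pregnant": MISSING},
]


def classify(value):
    """Return 'missing', 'not_applicable' or 'observed' for a coded value."""
    if value == MISSING:
        return "missing"
    if value == NOT_APPLICABLE:
        return "not_applicable"
    return "observed"


def summarize(rows, variable):
    """Count how many rows fall into each status for one variable."""
    counts = {"observed": 0, "missing": 0, "not_applicable": 0}
    for row in rows:
        counts[classify(row[variable])] += 1
    return counts


status_counts = summarize(records, "pregnant")
print(status_counts)
```

Because "0" stays reserved for the answer "No", patient P002 is counted as observed rather
than missing, while P003 (not applicable) and P004 (truly missing) are kept apart and can
be treated differently in the analysis.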
Irrespective of how missing data will be dealt with, they can be readily detected during
the data cleaning process. A simple descriptive analysis, such as a frequency table with
percentages (%), will show how many values are missing for each variable in a registry
database. The researcher then needs to identify which individual patients the missing
values belong to by using individual identifiers such as the patient identification
number. Finally, the researchers need to reach a consensus on the most appropriate way to
handle the missing values. Once this has been agreed, the missing values can be replaced
by using an appropriate imputation technique. In addition, a full description of how the
missing data were replaced shall be provided in the study report or manuscript, so that
the choice of imputation technique is fully justified by the researchers and is clear and
transparent to the reader.
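The detection and identification steps described above can be sketched in Python as
follows. This is a minimal illustration, not a prescribed procedure: the records, the
variable names (age, hba1c) and the function missing_report are hypothetical, and the code
"9999" is assumed to be the only code used for missing values.

```python
MISSING = 9999  # assumed code for a missing value

# Hypothetical registry extract.
records = [
    {"patient_id": "P001", "age": 54, "hba1c": 7.1},
    {"patient_id": "P002", "age": MISSING, "hba1c": 6.4},
    {"patient_id": "P003", "age": 61, "hba1c": MISSING},
    {"patient_id": "P004", "age": 47, "hba1c": MISSING},
]


def missing_report(rows, variables, id_field="patient_id"):
    """For each variable, return the count and percentage of missing
    values together with the identifiers of the affected patients."""
    total = len(rows)
    report = {}
    for var in variables:
        ids = [row[id_field] for row in rows if row[var] == MISSING]
        report[var] = {
            "n_missing": len(ids),
            "percent_missing": 100.0 * len(ids) / total if total else 0.0,
            "patient_ids": ids,
        }
    return report


report = missing_report(records, ["age", "hba1c"])
for var, info in report.items():
    print(var, info)
```

The list of patient identifiers gives the researchers a concrete basis for deciding how
each missing value should be handled before any imputation is applied and reported.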