Page 34 - PATIENT REGISTRY DATA FOR RESEARCH: A Basic Practical Guide
3.3 Set up Conditions and Requirements for Analysis
Before subjecting the registry data to the proposed statistical analysis, the researchers
must pre-specify the analytical principles and statistical techniques to be employed,
as well as the inclusion and exclusion criteria for data collection and analysis. One of the
important pre-specified conditions for data analysis is the duration of the study period. For
instance, the study period for a registry report that is published on a yearly basis will be from
1st January until 31st December of the same year (National Transplant Registry, 2015; Wong
& Goh, 2016). When the research study or its data analysis involves only a subset of the
patients found in a registry, then it is necessary to clearly specify a list of strict inclusion
criteria for subject selection. For example, although the diabetes registry includes both type 1
and type 2 diabetic patients, the researcher may intend to study them separately, in which case
the two types of patients will be analysed and reported separately (Bujang et al.,
2018c; Bujang et al., 2018d).
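As an illustration of pre-specified conditions, the following minimal Python sketch applies a yearly study period and a single inclusion criterion to a handful of made-up records. The field names (`patient_id`, `diabetes_type`, `notification_date`) and the helper functions are hypothetical and are not taken from any actual registry; a real registry would define its own variables and criteria.

```python
from datetime import date

# Hypothetical registry records; the field names are illustrative assumptions.
records = [
    {"patient_id": 1, "diabetes_type": 1, "notification_date": date(2015, 3, 14)},
    {"patient_id": 2, "diabetes_type": 2, "notification_date": date(2015, 12, 31)},
    {"patient_id": 3, "diabetes_type": 2, "notification_date": date(2016, 1, 1)},
    {"patient_id": 4, "diabetes_type": 1, "notification_date": date(2014, 12, 31)},
]


def study_period(year):
    """Return the first and last day of a yearly reporting period."""
    return date(year, 1, 1), date(year, 12, 31)


def select_subjects(rows, year, diabetes_type):
    """Keep rows that fall inside the study period and meet the inclusion criterion."""
    start, end = study_period(year)
    return [
        row
        for row in rows
        if start <= row["notification_date"] <= end
        and row["diabetes_type"] == diabetes_type
    ]


# Type 1 and type 2 patients are selected, and later analysed, separately.
type1_2015 = select_subjects(records, 2015, diabetes_type=1)
type2_2015 = select_subjects(records, 2015, diabetes_type=2)

print([row["patient_id"] for row in type1_2015])  # [1]
print([row["patient_id"] for row in type2_2015])  # [2]
```

Writing the study period and the inclusion criteria as explicit functions before any analysis is run makes the selection rules easy to review and to reproduce in the following reporting year.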
3.4 Establish Data Cleaning Procedures
Once the plans for analysing the data set have been pre-specified, the next step
is to perform data cleaning. At this stage, all the relevant variables will be
evaluated to determine whether they are properly coded and labelled, all the values are
expressed within the pre-specified range, and the layout of all relevant data has been
properly organized for the subsequent analysis. Some of the most common indicators of good
quality data are (i) all duplicates have already been removed from the data set, (ii) all the
outliers which exist only due to incorrect or invalid data entry have already been removed
from the data set (i.e. the truly valid observations should be kept even though they resemble
the outliers), (iii) all data inconsistencies have been rectified (for example, sex is male and
pregnant status is 'yes'), and (iv) all missing values have been filled in, imputed or declared as
missing.
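The checks described above can be sketched in a few lines of Python. This is only a minimal example under stated assumptions: the records, the field names (`patient_id`, `sex`, `pregnant`, `age`) and the plausible age range of 0 to 120 years are all hypothetical. The sketch removes exact duplicates but only flags the other problems, so that a reviewer can decide whether a value is an invalid entry to be corrected or a truly valid observation to be kept.

```python
# Assumed plausible range for age in years; a real study would pre-specify its own.
AGE_RANGE = (0, 120)

# Hypothetical records; None stands for a value that was never entered.
records = [
    {"patient_id": 1, "sex": "male", "pregnant": "no", "age": 54},
    {"patient_id": 1, "sex": "male", "pregnant": "no", "age": 54},     # exact duplicate
    {"patient_id": 2, "sex": "male", "pregnant": "yes", "age": 61},    # inconsistent
    {"patient_id": 3, "sex": "female", "pregnant": "no", "age": 530},  # out of range
    {"patient_id": 4, "sex": "female", "pregnant": None, "age": 47},   # missing value
]


def drop_duplicates(rows):
    """Remove rows that are identical in every field, keeping the first copy."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in sorted(row))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def flag_issues(row):
    """Return a list of data-quality problems found in a single row."""
    issues = []
    low, high = AGE_RANGE
    if row["age"] is not None and not (low <= row["age"] <= high):
        issues.append("age out of range")
    if row["sex"] == "male" and row["pregnant"] == "yes":
        issues.append("inconsistent: male and pregnant")
    for field in sorted(row):
        if row[field] is None:
            issues.append("missing: " + field)
    return issues


unique = drop_duplicates(records)
report = {row["patient_id"]: flag_issues(row) for row in unique}

for patient_id, issues in report.items():
    print(patient_id, issues)
```

Flagged rows would then be checked against the source documents before any value is corrected, imputed or declared as missing.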