Page 34 - PATIENT REGISTRY DATA FOR RESEARCH: A Basic Practical Guide

3.3 Set up Conditions and Requirements for Analysis

Before subjecting the registry data to the proposed statistical analysis, the researchers

will have to pre-specify the analytical principles and statistical techniques to be employed,

such as the inclusion and exclusion criteria for data collection and analysis. One of the

important pre-specified conditions for data analysis is the duration of the study period. For

instance, the study period for a registry report that is published on a yearly basis will be from

1st January until 31st December of the same year (National Transplant Registry, 2015; Wong
& Goh, 2016). When the research study or its data analysis involves only a subset of the

patients found in a registry, then it is necessary to clearly specify a list of strict inclusion

criteria for subject selection. For example, although the diabetes registry includes both type 1

and type 2 diabetic patients, the researcher may intend to study them separately (and

therefore the two types of patients will be analysed and reported separately) (Bujang et al.,

2018c; Bujang et al., 2018d).
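
The pre-specified study period and inclusion criteria can be written down as explicit filters before any analysis is run. The following is only a minimal sketch in Python. The field names (`notification_date`, `diabetes_type`), the helper functions and the sample records are hypothetical and are not taken from any actual registry.

```python
from datetime import date

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "notification_date": date(2015, 3, 14), "diabetes_type": 1},
    {"id": 2, "notification_date": date(2015, 12, 31), "diabetes_type": 2},
    {"id": 3, "notification_date": date(2016, 1, 1), "diabetes_type": 2},
    {"id": 4, "notification_date": date(2014, 12, 31), "diabetes_type": 1},
]


def in_study_period(record, year):
    """True if the record falls between 1st January and 31st December of `year`."""
    return date(year, 1, 1) <= record["notification_date"] <= date(year, 12, 31)


def select_subjects(records, year, diabetes_type):
    """Apply the pre-specified study period and the inclusion criterion."""
    return [
        r for r in records
        if in_study_period(r, year) and r["diabetes_type"] == diabetes_type
    ]


# The two types of patients are selected, and later analysed, separately.
type1_2015 = select_subjects(records, 2015, diabetes_type=1)
type2_2015 = select_subjects(records, 2015, diabetes_type=2)
```

Keeping the conditions in small named functions like these makes it easier to report exactly which criteria were applied and to reuse them for the next reporting year.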

3.4 Establish Data Cleaning Procedures

Now that the plans for analysing the data set have been pre-specified, the next step

is to perform data cleaning. At this stage, all the relevant variables will be

evaluated to determine whether they are properly coded and labelled, all the values are

expressed within the pre-specified range, and the layout of all relevant data has been

properly organized for the subsequent analysis. Some of the most common indicators of good

quality data are (i) all duplicates have already been removed from the data set, (ii) all the

outliers that exist only because of incorrect or invalid data entry have already been removed

from the data set (i.e. truly valid observations should be kept even though they resemble

outliers), (iii) all data inconsistencies (for example, sex is male and pregnant status is

'yes') have been rectified, and (iv) all missing values have been filled in, imputed, or

declared as missing.
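
The checks described above can be sketched as a small cleaning routine. This is a hedged illustration in Python, not a prescribed procedure. The field names, the category codes and the valid age range of 0 to 120 are assumptions made for the example only.

```python
# Hypothetical records; field names, codes and ranges are illustrative only.
records = [
    {"id": 1, "sex": "male", "pregnant": "no", "age": 54},
    {"id": 1, "sex": "male", "pregnant": "no", "age": 54},       # exact duplicate
    {"id": 2, "sex": "male", "pregnant": "yes", "age": 47},      # inconsistent
    {"id": 3, "sex": "female", "pregnant": "no", "age": 540},    # out of range
    {"id": 4, "sex": "female", "pregnant": "yes", "age": None},  # missing age
]

AGE_RANGE = (0, 120)  # assumed pre-specified valid range


def remove_duplicates(records):
    """Keep the first occurrence of each fully identical record."""
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def flag_issues(record):
    """Return a list of data-quality problems found in one record."""
    issues = []
    age = record["age"]
    if age is None:
        issues.append("age missing")
    elif not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        issues.append("age out of range")
    if record["sex"] == "male" and record["pregnant"] == "yes":
        issues.append("male recorded as pregnant")
    return issues


unique_records = remove_duplicates(records)
# Map each problematic record id to the issues found in it.
report = {r["id"]: flag_issues(r) for r in unique_records if flag_issues(r)}
```

The routine only flags suspicious values and does not delete them. Each flagged record would still need to be checked against the source documents, so that valid observations that merely resemble outliers are kept.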