Page 34 - PATIENT REGISTRY DATA FOR RESEARCH: A Basic Practical Guide

3.3 Set up Conditions and Requirements for Analysis

                       Before subjecting the registry data to the proposed statistical analysis, the researchers


               will have to pre-specify the analytical principles and statistical techniques to be employed,

               as well as conditions such as the inclusion and exclusion criteria for data collection and analysis. One of the

               important pre-specified conditions for data analysis is the duration of the study period. For


               instance, the study period for a registry report that is published on a yearly basis will be from

               1st January until 31st December of the same year (National Transplant Registry, 2015; Wong
               & Goh, 2016). When the research study or its data analysis involves only a subset of the

               patients found in a registry, then it is necessary to clearly specify a list of strict inclusion


               criteria for subject selection. For example, although a diabetes registry includes both type 1

               and type 2 diabetic patients, the researcher may intend to study them separately (and

               therefore the two types of patients will be analysed and reported separately) (Bujang et al.,


               2018c; Bujang et al., 2018d).
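
The pre-specified study period and inclusion criteria described above can be sketched in code. The following is a minimal illustration only: the records, the field names (`diagnosis_date`, `diabetes_type`) and the reporting year 2015 are hypothetical and are not taken from the registries cited.

```python
from datetime import date

# Hypothetical registry records; the field names are assumptions for illustration.
records = [
    {"id": 1, "diagnosis_date": date(2015, 3, 14), "diabetes_type": 1},
    {"id": 2, "diagnosis_date": date(2015, 12, 31), "diabetes_type": 2},
    {"id": 3, "diagnosis_date": date(2016, 1, 1), "diabetes_type": 2},
    {"id": 4, "diagnosis_date": date(2014, 11, 2), "diabetes_type": 1},
]


def in_study_period(record, year):
    """Return True if the record falls between 1st January and 31st December of `year`."""
    return date(year, 1, 1) <= record["diagnosis_date"] <= date(year, 12, 31)


def select_subjects(rows, year, diabetes_type):
    """Apply the pre-specified study period and the inclusion criterion together."""
    return [
        r for r in rows
        if in_study_period(r, year) and r["diabetes_type"] == diabetes_type
    ]


# Type 1 and type 2 patients are selected, and later analysed, separately.
type1_2015 = select_subjects(records, 2015, diabetes_type=1)
type2_2015 = select_subjects(records, 2015, diabetes_type=2)

print([r["id"] for r in type1_2015])  # [1]
print([r["id"] for r in type2_2015])  # [2]
```

Both boundary dates are treated as inclusive here, so a record dated 31st December belongs to that year's report; a real registry may define the period on a different date field.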




               3.4 Establish Data Cleaning Procedures

                       Now that we have pre-specified the plans for analysing the data set, the next step is to

               perform data cleaning. At this stage, all the relevant variables will be

               evaluated to determine whether they are properly coded and labelled, all the values are


               expressed within the pre-specified range, and the layout of all relevant data has been

               properly organized for the subsequent analysis. Some of the most common indicators of good


               quality data are (i) all duplicates have already been removed from the data set, (ii) all the

               outliers which exist only due to incorrect or invalid data entry have already been removed


               from the data set (i.e. the truly valid observations should be kept even though they resemble

               the outliers), (iii) all data inconsistencies have been rectified (for example, sex is male and

               pregnant status is 'yes'), and (iv) all missing values have been filled in, imputed, or declared as

               missing.
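
The checks listed above can be sketched as a small script. This is a minimal illustration under stated assumptions: the records, the field names (`sex`, `pregnant`, `age`) and the valid age range of 0 to 120 are hypothetical, and the script only flags problems so that each one can be reviewed before anything is corrected or removed.

```python
# Hypothetical records and field names, for illustration only.
records = [
    {"id": 1, "sex": "male", "pregnant": "no", "age": 54},
    {"id": 1, "sex": "male", "pregnant": "no", "age": 54},      # exact duplicate
    {"id": 2, "sex": "male", "pregnant": "yes", "age": 47},     # inconsistent
    {"id": 3, "sex": "female", "pregnant": "no", "age": 430},   # outside valid range
    {"id": 4, "sex": "female", "pregnant": "yes", "age": None},  # missing value
    {"id": 5, "sex": "female", "pregnant": "no", "age": 98},    # extreme but valid
]

AGE_RANGE = (0, 120)  # assumed pre-specified valid range


def remove_duplicates(rows):
    """Keep the first occurrence of each fully identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row[k] for k in sorted(row))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def find_issues(row):
    """Return a list of data quality problems found in one record."""
    issues = []
    if row["age"] is None:
        issues.append("missing age")
    elif not AGE_RANGE[0] <= row["age"] <= AGE_RANGE[1]:
        issues.append("age out of range")
    if row["sex"] == "male" and row["pregnant"] == "yes":
        issues.append("male recorded as pregnant")
    return issues


unique = remove_duplicates(records)
report = {}
for row in unique:
    issues = find_issues(row)
    if issues:
        report[row["id"]] = issues

print(len(unique))  # 5
print(report)
```

Record 5 is not flagged even though its age is unusually high, because the value lies within the assumed valid range; this mirrors the point that truly valid observations should be kept even when they resemble outliers.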