Install and load necessary libraries for this example
The regstudies package provides useful tools for modifying and filtering the register data sets. Typically in register studies the data consists of events, which are stored in data as time stamps. In our example dataset the time stamps are date of hospital visits. Also, in register studies the data are commonly studied with respect to index date, which is a date of interest that can vary by individuals.
Our example data is simulated data that represents a postal questionnaire survey study which has been extended to register data. There are two data sets, the sample_cohort
is holding the study group, and sample_regdata
holds the hospital visits for the study group members.
More precisely, the sample_cohort
lists the individual id study numbers as id
for whom the register data has been collected. It also list the postingdate
holding the date of submitting the postal questionnaire.
The sample_regdata
contains also variable id
which is necessary to be able to link the data sets. The sample_regdata
also contains variables holding the diagnosis codes at CODE1
and times of hospital admission adm_date
and hospital discharge disc_date
.
summary(sample_cohort)
## personid gender postingdate
## Min. :1101 Min. :1.000 Min. :2000-01-01
## 1st Qu.:1201 1st Qu.:1.000 1st Qu.:2000-01-01
## Median :1300 Median :2.000 Median :2000-01-01
## Mean :1300 Mean :1.525 Mean :2000-01-01
## 3rd Qu.:1400 3rd Qu.:2.000 3rd Qu.:2000-01-01
## Max. :1500 Max. :2.000 Max. :2000-01-01
Dataset sample_regdata
is a dataset which has events for the selected population with ICD codes and date-variables:
summary(sample_regdata)
## personid CODE1 adm_date disc_date
## Min. :1101 Length:10000 Min. :1990-01-01 Min. :1990-01-18
## 1st Qu.:1212 Class :character 1st Qu.:1993-12-18 1st Qu.:1994-03-21
## Median :1305 Mode :character Median :1998-01-12 Median :1998-04-30
## Mean :1305 Mean :1998-01-04 Mean :1998-04-04
## 3rd Qu.:1400 3rd Qu.:2001-12-27 3rd Qu.:2002-03-25
## Max. :1500 Max. :2005-12-30 Max. :2006-06-24
## icd
## Length:10000
## Class :character
## Mode :character
##
##
##
We use two sample datasets sample_regdata
and sample_cohort
, which are included in the package. Joining these two datasets we got a full dataset ready for analyzes:
# Join register data with population cohort data d <- sample_regdata %>% left_join(sample_cohort)
## Joining, by = "personid"
head(d)
## # A tibble: 6 x 7
## personid CODE1 adm_date disc_date icd gender postingdate
## <int> <chr> <date> <date> <chr> <dbl> <date>
## 1 1101 196 1990-03-28 1990-05-20 icd9 2 2000-01-01
## 2 1101 I26 1990-04-25 1990-10-01 icd9 2 2000-01-01
## 3 1101 E222 1990-06-28 1990-08-07 icd9 2 2000-01-01
## 4 1101 39891 1990-09-19 1991-03-05 icd9 2 2000-01-01
## 5 1101 G10 1990-11-24 1991-01-04 icd9 2 2000-01-01
## 6 1101 I278 1991-11-01 1992-01-16 icd9 2 2000-01-01
Classification of codes can handle multiple code types on one occasion, but user needs to tell us which code is which type. This functionality allows user to classify data what contains both ICD-codes and pharmaceutical codes at one data. For example when using multiple versions of ICD-codes, we create a new variable called icd
:
With function classify_elixhauser()
and sum_score()
you can use Elixhauser classification and calculate scores. Check more info of classifications from classification tables-section.
regstudies:::elixhauser_classes
## # A tibble: 31 x 6
## class_elixhauser label_elixhauser icd9 icd10 score_AHRQ score_van_Walra…
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 chf Congestive Heart… ^39891… ^I099… 9 7
## 2 carit Cardiac arrhytmi… ^4260|… ^I441… 0 5
## 3 valv Valvular disease ^0932|… ^A520… 0 -1
## 4 pcd Pulmonary circul… ^4150|… ^I26|… 6 4
## 5 pvd Peripheral vascu… ^0930|… ^I70|… 3 2
## 6 hypunc Hypertension, un… ^401 ^I10 -1 0
## 7 hypc Hypertension, co… ^402|^… ^I11|… -1 0
## 8 para Paralysis ^3341|… ^G041… 5 7
## 9 ond Other neurologic… ^3319|… ^G10|… 5 6
## 10 cpd Chronic pulmonar… ^4168|… ^I278… 3 3
## # … with 21 more rows
d %>% # filter(icd == "icd10") %>% classify_elixhauser(icd_codes = CODE1)
## Joining, by = "label_elixhauser"
## # A tibble: 10,470 x 11
## personid CODE1 adm_date disc_date icd gender postingdate
## <int> <chr> <date> <date> <chr> <dbl> <date>
## 1 1101 196 1990-03-28 1990-05-20 icd9 2 2000-01-01
## 2 1101 I26 1990-04-25 1990-10-01 icd9 2 2000-01-01
## 3 1101 E222 1990-06-28 1990-08-07 icd9 2 2000-01-01
## 4 1101 39891 1990-09-19 1991-03-05 icd9 2 2000-01-01
## 5 1101 G10 1990-11-24 1991-01-04 icd9 2 2000-01-01
## 6 1101 I278 1991-11-01 1992-01-16 icd9 2 2000-01-01
## 7 1101 2409 1991-12-19 1992-05-27 icd9 2 2000-01-01
## 8 1101 F20 1992-02-14 1992-07-29 icd9 2 2000-01-01
## 9 1101 F10 1992-02-19 1992-03-15 icd9 2 2000-01-01
## 10 1101 4150 1992-03-23 1992-08-13 icd9 2 2000-01-01
## # … with 10,460 more rows, and 4 more variables: class_elixhauser <chr>,
## # label_elixhauser <chr>, score_AHRQ <dbl>, score_van_Walraven <dbl>
When using function argument verbose = TRUE
, message “Element ‘regex.rm’ NOT in use. Exceptions omitted.” is normal when ……..
To calculate elixhauser scores for each individual you just need to add sum_score
function. The premade classification table contains two alternative scores to be used score_AHRQ
and score_van_Walraven
both of which can be calculated at one function call.
# Calculate scores d %>% classify_elixhauser(icd_codes = CODE1) %>% sum_score(score_AHRQ, score_van_Walraven)
## Joining, by = "label_elixhauser"
## # A tibble: 300 x 3
## personid `sum(score_AHRQ)` `sum(score_van_Walraven)`
## <int> <dbl> <dbl>
## 1 1101 55 51
## 2 1103 25 21
## 3 1104 37 41
## 4 1105 21 22
## 5 1107 22 35
## 6 1110 43 32
## 7 1112 25 39
## 8 1114 0 12
## 9 1116 6 13
## 10 1118 19 29
## # … with 290 more rows
Function classifies all the rows which has classification code in regstudies:::elixhauser_classes
. If the is no classification on certain ICD code, row will get NA values.
Historgram of sums…..
Classifying Charlson comorbidity classes give as output long formatted data as wel Elixhauser classification:
## Classifying Charlson comorbidity classes to long format charlson <- d %>% classify_charlson(CODE1)
## Joining, by = "label_charlson"
head(charlson)
## # A tibble: 6 x 10
## personid CODE1 adm_date disc_date icd gender postingdate class_charlson
## <int> <chr> <date> <date> <chr> <dbl> <date> <chr>
## 1 1101 196 1990-03-28 1990-05-20 icd9 2 2000-01-01 metacanc
## 2 1101 I26 1990-04-25 1990-10-01 icd9 2 2000-01-01 <NA>
## 3 1101 E222 1990-06-28 1990-08-07 icd9 2 2000-01-01 <NA>
## 4 1101 39891 1990-09-19 1991-03-05 icd9 2 2000-01-01 chf
## 5 1101 G10 1990-11-24 1991-01-04 icd9 2 2000-01-01 <NA>
## 6 1101 I278 1991-11-01 1992-01-16 icd9 2 2000-01-01 <NA>
## # … with 2 more variables: label_charlson <chr>, score_charlson <dbl>
For example in modelling you might want to have dataset in wide format. To modify dataset to wide format, use dplyr::pivot_wider()
verb:
charlson %>% mutate(class_charlson=paste0("cl_",class_charlson)) %>% # TODO: laita tämä ominaisuus classify-funktioon sisälle! mutate(score_charlson=as.integer(score_charlson>0)) %>% tidyr::pivot_wider(names_from="class_charlson", values_from="score_charlson", values_fill=0) %>% select(-all_of(c("cl_NA","label_charlson"))) -> charlson_wide head(charlson_wide)
## # A tibble: 6 x 19
## personid CODE1 adm_date disc_date icd gender postingdate cl_metacanc
## <int> <chr> <date> <date> <chr> <dbl> <date> <int>
## 1 1101 196 1990-03-28 1990-05-20 icd9 2 2000-01-01 1
## 2 1101 I26 1990-04-25 1990-10-01 icd9 2 2000-01-01 0
## 3 1101 E222 1990-06-28 1990-08-07 icd9 2 2000-01-01 0
## 4 1101 39891 1990-09-19 1991-03-05 icd9 2 2000-01-01 0
## 5 1101 G10 1990-11-24 1991-01-04 icd9 2 2000-01-01 0
## 6 1101 I278 1991-11-01 1992-01-16 icd9 2 2000-01-01 0
## # … with 11 more variables: cl_chf <int>, cl_hp <int>, cl_canc <int>,
## # cl_pvd <int>, cl_pud <int>, cl_rend <int>, cl_mld <int>, cl_diab <int>,
## # cl_diabwc <int>, cl_aids <int>, cl_copd <int>
You can also add labels to variables in wide format data with code below
## calculating labels regstudies:::charlson_classes %>% select(class_charlson,label_charlson) %>% mutate(class_charlson=paste0("cl_",class_charlson)) -> labels # setting up the labels for wide data for(i in 1:dim(labels)[1]) { l <- labels$class_charlson[i] if(!is.null(charlson_wide[[l]])) { attr(charlson_wide[[l]], "label") <- labels$label_charlson[i] } }