Using own classifications • regstudies

Classifications

Code classification by own csv table

The package has been built in mind of allowing user to use their own classifications for ICD codes (or any string codes). The classifications can utilise multiple different types of codes. For example, below we use ICD-10 and ICD-9 codes on a single classification definition. The classification definition need to hold following variable names: class_X and label_X, where X is the name of class defintion and also at least one other column what defines the regular expression for the class. The regular expression describes all the codes that belong to the same class. For example in our example data set, the first group is called “aids” and regular expression for icd10 column is "^B20|^B21|^B22|^B24". This means that all the codes starting with either B20, B21, B22 or B24, which are labeled as "icd10" by another variable icd (TODO: codetype) in the data, belong to this class. The symbol “^” states that the codes must begin with these characters. The symbol “|” is OR operator in regular expression syntax. For more information about regular expression, please read Regular Expression Cheat Sheet.

library(regstudies)
# ´regstudies´ makes it easy to classify codes such as ICD-codes to groups and also to sum scores based on the groupings.
sample_data <- left_join(sample_cohort,sample_regdata)
sample_data %>%
  classify_charlson(icd_codes = CODE1) %>%
  sum_score(score_charlson)

# Users can define the code groups themselves using simple regular expression syntax.
my_classes <-
  tribble(~class_myclass,         ~label_myclass,                                 ~icd10,                 ~icd9, ~score,
                  "aids",          "AIDS or HIV",                  "^B20|^B21|^B22|^B24",      "^042|^043|^044",      7,
              "dementia",             "Dementia", "^F00|^F01|^F02|^F03|^F051|^G30|^G311",    "^290|^2941|^3312",      3,
                   "pud", "Peptic ulcer disease",                  "^K25|^K26|^K27|^K28", "^531|^532|^533|^534",      1
          )

# This classification definition has two different type of codes.
head(my_classes)

# Data has three different types of codes.
head(sample_data)

# Different groups types can are automatically handled simultaneously.
sample_data %>%
  classify_codes(codes=CODE1,diag_tbl = read_classes(my_classes)) %>%
  sum_score(score)

Elixhauser classification and scores

This calculates per row Elixhauser classification. Check more info of classification from classification tables-section.

# Classify by Elixhauser:
my_classes <- read_classes(file = "../data/elixhauser_classes.csv")

classified_d <- filtered_d %>%
  classify_codes(codes = CODE1,
                 diag_tbl = my_classes)
head(classified_d)

Another way (a shortcut) for Elixhauser classifications:

elixh_d <- filtered_d %>%
  classify_elixhauser(icd_codes = CODE1)
head(elixh_d)

To calculate the scores of Elixhauser comorbidity index, we can use sum_score function. As our classification table defines two alternative score definitions, namely score_AHRQ and score_van_Walraven, we can calculate both of them on one call as follows:

elixh_score <- filtered_d %>%
  classify_elixhauser(icd_codes = CODE1) %>%
  sum_score(score_AHRQ, score_van_Walraven)
head(elixh_score)

Charlson classification and scores

We also provide Charlson comorbidity classification ready to use in regstudies. It can be easilly used with function classify_charlson. The use of classify_charlson is very identical to use of classify_elixhauser. Only difference is that in the classification the score-variable is called score_charlson.

## Classifying Charlson comorbidity classes to long format

charlson_score <- filtered_d %>%
  classify_charlson(icd_codes = CODE1) %>%
  sum_score(score_charlson)
head(charlson_score)

Long data to wide data

The outputs of all classification functions (classify_elixhauser, classify_charlson and classify_codes) gives output as long data. If user need to have class indicators or scores of each events as wide, it can be accomplished using pivot_wider as follows.

charlson_d <- filtered_d %>%
  classify_charlson(icd_codes = CODE1)

charlson_d %>%
  mutate(class_charlson=paste0("cl_",class_charlson)) %>% # TODO: laita tämä ominaisuus classify-funktioon sisälle!
  mutate(score_charlson=as.integer(score_charlson>0)) %>%
  tidyr::pivot_wider(names_from="class_charlson",
                     values_from="score_charlson",
                     values_fill=0) %>%
  select(-all_of(c("cl_NA","label_charlson"))) -> wide # Some events do not belong to any class. That creates NA to data and pivot_wider handles NA classes to cl_NA column in this code.
head(wide)

Some extra: labeling wide data

## calculating labels
regstudies:::charlson_classes %>%
  select(class_charlson,label_charlson) %>%
  mutate(class_charlson=paste0("cl_",class_charlson)) -> labels

# setting up the labels for wide data
for(i in 1:dim(labels)[1]) {
  l<-labels$class_charlson[i]
  if(!is.null(wide[[l]])) {
    attr(wide[[l]], "label") <- labels$label_charlson[i]
  }
}
head(wide)