Preliminaries

library(skimr); library(tableone)
library(magrittr); library(janitor) 
library(broom); library(survival); library(lme4)
library(cobalt); library(Matching); library(MatchIt)
library(rms)
library(yardstick)
library(naniar)
library(rbounds)
library(survey)
library(twang); 
library(tidyverse)

theme_set(theme_bw())

1 My Data

Healthcare Cost and Utilization Project, Nationwide Inpatient Sample (HCUP-NIS) is the largest available all-payer inpatient healthcare administrative data set. It approximates a 20-percent stratified sample of all discharges from United States hospitals. It constitutes data from 48 states and 10,000 community hospitals, representing 95% of the United States population. Data from each record contains information regarding patient demographics, diagnoses, procedures, and other information associated with a hospital admission.

The data can be purchased by the public with at the following link: https://www.hcup-us.ahrq.gov/nisoverview.jsp

Strengths of how this data set relates to my research question:

Nationally representative
Inpatient hospitalizations
Collects information on sociodemographic factors which I can adjust for
Collects information on up to 40 diagnoses, thus I can capture both the exposure and outcome

Limitations of the data set:

the data quality of secondary databases is not perfect as the diagnoses codes may not necessarily be accurate, granular, or complete
The latest data available is from 2018.

1.1 Data Ingest

Below I am ingesting my data ms1718_raw.

ms1718_raw<- read.csv("ms1718.csv") %>% 
  clean_names()

ms1718_raw <- ms1718_raw %>% 
  haven::zap_label()  %>% 
  mutate(key_nis = as.character(key_nis)) %>% select(-oi)

dim(ms1718_raw)

[1] 682968    100

As originally loaded, the ms1718_raw data contain 682968 rows and 100 columns.

1.2 Tidying, Data Cleaning and Data Management

Below I am cleaning the data according to the HCUP_NIS code book found at https://www.hcup-us.ahrq.gov/db/nation/nis/nisdde.jsp

In summary I have:

1.) converted all variables to factors except for age, key_nis, and discwt

2.) coded the variable levels for factors with more descriptive names rather than numbers

3.) reordered according to frequency with fct_infreq

4.) selected only the variables that I will use

Below I am creating the outcome, OI from OI_type. If there are no OIs, then OI=0, else OI=1

ms1718_raw <- ms1718_raw %>% 
  mutate(oi = ifelse(oi_type == "none", 0, 1))

ms1718_raw %>% tabyl(oi)

 oi      n   percent
  0 562476 0.8235759
  1 120492 0.1764241

ms1718 <- ms1718_raw %>% 
mutate(female=as.numeric(female)) %>% 
  mutate(sex = fct_recode(factor(female),
"male" = "0", "female" = "1")) %>% 
    # zipinc_qrtl = fct_recode(factor(zipinc_qrtl),
    #                         "<48K"= "1",
    #                         "48-61K" = "2",
    #                         "61-82K"= "3",
    #                         "82K+" = "4")) %>% 
mutate(pay1=as.numeric(pay1)) %>% 
  mutate(insurance  = fct_recode(factor(pay1),
                  "Medicare" = "1",
                  "Medicaid" = "2",
                  "Private" = "3",
                  "Self_pay" = "4",
                  "Other" = "5",
                  "Other" = "6"),
insurance = fct_infreq(insurance), 
    patient_loc =  fct_recode(factor(pl_nchs),
                  "Central" = "1",
                  "Fringe" = "2",
                  "metro>250K" = "3",
                  "metro>50K" = "4",
                  "micro" = "5",
                  "Other" = "6" ),
patient_loc = fct_infreq(patient_loc)) %>% 
  mutate(region = fct_recode(factor(hosp_division),
                  "NewEngland" = "1",
                  "Middle_Atlantic" = "2",
                  "EastNorth_Central" = "3",
                  "WestNorth_Central" = "4",
                  "South_Atlantic" = "5",
                  "EastSouth_Central" = "6",
                  "WestSouth_Central" = "7",
                  "Mountain" = "8",
                  "Pacific" = "9"),
        region = fct_infreq(region)) %>% 
  mutate(ED_record = fct_recode(factor(hcup_ed),
          "no" = "0",
          "yes" = "1", "yes" = "2", "yes" ="3", "yes"="4")) %>% 
 mutate(oi_f = fct_recode(factor(oi),
                           "yes"= "1",
                          "no" ="0")) %>% 
  mutate(ms_f = fct_recode(factor(ms),
                           "yes"= "1",
                           "no" ="0")) %>% 
  select(-female, - hcup_ed, - hosp_division, - pay1, -pl_nchs, -hosp_nis)

1.3 checking variables

1.3.1 categorical variables

1.3.1.1 zip_inc

ms1718 <- ms1718 %>% 
  mutate(zipinc_qrtl= fct_recode(factor(zipinc_qrtl),
"<48K"= "1",
"48-61K" = "2",
"61-82K"= "3",
"82K+" = "4",
NULL= "A",
NULL = ""))

ms1718 %>% tabyl(zipinc_qrtl)

 zipinc_qrtl      n    percent valid_percent
        <48K 199757 0.29248369     0.2975692
      48-61K 179309 0.26254378     0.2671087
      61-82K 159814 0.23399925     0.2380679
        82K+ 132416 0.19388317     0.1972543
        <NA>  11672 0.01709011            NA

1.3.1.2 race

ms1718 <- ms1718 %>% 
mutate(race=as.numeric(race)) %>% 
  mutate(race = fct_recode(factor(race),
      "White" = "1", 
               "Black" = "2", 
                "Hispanic"= "3",
                "Asian" = "4",
                "NativeA"= "5",
               "Other" = "6")) %>% 
  mutate(race = fct_infreq(race))

we have a lot of missing in the MS group for race. and the numbers are quite small for asian and native american.

ms1718 %>% tabyl(ms, race)

 ms  White Black Hispanic Other Asian NativeA   NA_
  0 404266 91201    68467 18247 16618    3712 18369
  1  45319  9732     3492  1256   360     218  1711

1.3.1.3 aweekend

An indicator of whether the admission day is on the weekend (AWEEKEND) is calculated from the admission date (ADATE). If AWEEKEND cannot be calculated (ADATE is missing or invalid).

ms1718 <- ms1718 %>% 
  mutate(aweekend = fct_recode(factor(aweekend),
"no" = "0", "yes" = "1"))

ms1718 %>% tabyl(ms_f, aweekend)

 ms_f     no    yes NA_
   no 492829 128047   4
  yes  48486  13602   0

1.3.2 elective

ELECTIVE indicates whether the admission to the hospital was elective. This information was derived from the type of admission (ATYPE). If the admission type was missing or invalid, then ELECTIVE is also missing or invalid. If the admission type indicated an elective admission (ATYPE = 3), then ELECTIVE was set to 1. Otherwise, for any other valid non-missing ATYPE values, ELECTIVE was set to 0.

ms1718 <- ms1718 %>% 
mutate(elective=as.numeric(elective)) %>% 
  mutate(elective_admin = fct_recode(factor(elective),
"no" = "0", "yes" = "1")) %>% 
  select(-elective)

elective admin is fine
some missing values (ms group n=39)

ms1718 %>% tabyl(ms_f, elective_admin)

 ms_f     no    yes  NA_
   no 475066 144804 1010
  yes  52523   9461  104

1.3.3 Teaching status (h_contrl)

Teaching Status: Beginning in 1998, a hospital is considered a teaching hospital if it has one or more Accreditation Council for Graduate Medical Education (ACGME) approved residency programs, is a member of the Council of Teaching Hospitals (COTH) or has a ratio of full-time equivalent interns and residents to beds of .25 or higher. Rural hospitals were not split according to teaching status, because rural teaching hospitals were rare

ms1718 <- ms1718 %>% 
  mutate(hosp_locteach = fct_recode(factor(hosp_locteach),
"rural" = "1", "urban_nonteaching" = "2", "urban_teaching" = "3" ))

ms1718 %>% tabyl(ms_f, hosp_locteach)

 ms_f rural urban_nonteaching urban_teaching
   no 56708            135839         428333
  yes  5070             13266          43752

The hospital’s ownership/control category was obtained from the AHA Annual Survey of Hospitals and includes categories for government nonfederal (public), private not-for-profit (voluntary) and private investor-owned (proprietary). Hospitals in different ownership/control categories tend to have different missions and different responses to government regulations and policies.

ms1718 <- ms1718 %>% 
  mutate(h_contrl = fct_recode(factor(h_contrl),
"gov_nonfed" = "1", "private_notprofit" = "2", "Private_profit" = "3" ))

ms1718 %>% tabyl(ms_f, h_contrl)

 ms_f gov_nonfed private_notprofit Private_profit
   no      71258            456378          93244
  yes       6010             48394           7684

1.3.4 tran_in

The data element TRAN_IN indicates that the non-newborn patient was transferred into the hospital and is defined using either admission source (ASOURCE) or point of origin (PointOfOriginUB04), depending on data availability. The coding of admission source and point of origin varies by the admission type. When the admission type indicates a newborn (ATYPE=4) then the admission source and point of origin indicate the type of birth instead of the type of transfer. Therefore, the identification of transfers in TRAN_IN is specific to non-newborn patients with ATYPE not equal to 4.

ms1718 <- ms1718 %>% 
  mutate(tran_in = fct_recode(factor(tran_in),
"not_transferred" = "0", "acute_care" = "1", "other" = "2" ))

ms1718 %>% tabyl(ms_f, tran_in)

 ms_f not_transferred acute_care other  NA_
   no          555216      39194 23580 2890
  yes           53965       3992  3848  283

1.3.5 Bedsize

Bedsize categories are based on hospital beds, and are specific to the hospital’s location and teaching status. Bedsize assesses the number of short-term acute care beds set up and staffed in a hospital. Hospital information was obtained from the AHA Annual Survey of Hospitals.

ms1718 <- ms1718 %>% 
  mutate(hosp_bedsize = fct_recode(factor(hosp_bedsize),
"small" = "1", "medium" = "2", "large" = "3" ))

ms1718 %>% tabyl(ms_f, hosp_bedsize)

 ms_f  small medium  large
   no 128612 182753 309515
  yes  13065  18329  30694

1.3.6 hosp_region

# ms1718 <- ms1718 %>% 
#   mutate(hosp_region = fct_recode(factor(hosp_region),
# "Northeast" = "1", "Midwest" = "2", "South" = "3", "West" = "4" ))
# 
# ms1718 %>% tabyl(ms_f, hosp_region)

1.3.7 comorbidities

I am turning all of the comorbidities from character variables to factor variables

ms1718 <- ms1718 %>% 
 mutate(depression = as.factor(depression), htn = as.factor(htn), migraine = as.factor(migraine), hld = as.factor(hld), anxiety = as.factor(anxiety), copd = as.factor(copd), asthma = as.factor(asthma), ibs = as.factor(ibs), hashimoto = as.factor(hashimoto), osteoporosis = as.factor(osteoporosis), dm = as.factor(dm),  ra = as.factor(ra), fibromyalgia = as.factor(fibromyalgia), cad = as.factor(cad), ibd = as.factor(ibd), glaucoma = as.factor(glaucoma), bipolar = as.factor(bipolar), epilepsy = as.factor(epilepsy), pvd = as.factor(pvd), ckd = as.factor(ckd), lupus = as.factor(lupus), psoriasis = as.factor(psoriasis), scd = as.factor(scd), hiv = as.factor(hiv), kidney=as.factor(kidney))

1.4 check quantitative variables

All numeric variables look plausible (age, amonth, discwt)

ms1718 %>% select(age, amonth, discwt) %>% Hmisc::describe()

. 

 3  Variables      682968  Observations
--------------------------------------------------------------------------------
age 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
  682968        0       73        1    57.95    22.73       24       28 
     .25      .50      .75      .90      .95 
      42       61       74       83       88 

lowest : 18 19 20 21 22, highest: 86 87 88 89 90
--------------------------------------------------------------------------------
amonth 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
  682168      800       12    0.993    6.477    3.981        1        2 
     .25      .50      .75      .90      .95 
       3        6        9       11       12 

lowest :  1  2  3  4  5, highest:  8  9 10 11 12
                                                                            
Value          1     2     3     4     5     6     7     8     9    10    11
Frequency  60355 53961 58609 55660 57503 55842 56987 58172 54819 58122 55405
Proportion 0.088 0.079 0.086 0.082 0.084 0.082 0.084 0.085 0.080 0.085 0.081
                
Value         12
Frequency  56733
Proportion 0.083
--------------------------------------------------------------------------------
discwt 
        n   missing  distinct      Info      Mean       Gmd       .05       .10 
   682968         0       314     0.996         5 6.098e-05         5         5 
      .25       .50       .75       .90       .95 
        5         5         5         5         5 

lowest : 4.991597 4.995833 4.996289 4.996753 4.996894
highest: 5.001855 5.001938 5.001965 5.001972 5.002649
--------------------------------------------------------------------------------

1.5 Missingness

I have 41270 missing observations in the ms1718 data set.

For the control group, I filtered out complete cases on the variables of interest before I did the 10% sample. So the missingness is really only in the ms group.

Below we can see that we have the most missingness for race (2.1%), zipinc_qrtl (1.2%), tran_in (0.43%), and patient_loc (0.28%), elective_admin (0.11%), insurance (.105%)

ms1718 %>% select(age, amonth, aweekend, discwt, key_nis, race, tran_in,  zipinc_qrtl, ms, hosp_bedsize, hosp_locteach,  h_contrl, sex, insurance, patient_loc, region, ED_record, elective_admin) %>% 
gg_miss_var()

ms1718 %>% select(age, amonth, aweekend, discwt, key_nis, race, tran_in,  zipinc_qrtl, ms, hosp_bedsize, hosp_locteach,  h_contrl, sex, insurance, patient_loc, region, ED_record, elective_admin, ms_f) %>% group_by(ms_f) %>%  miss_var_summary()

# A tibble: 36 x 4
# Groups:   ms_f [2]
   ms_f  variable       n_miss pct_miss
   <fct> <chr>           <int>    <dbl>
 1 yes   race             1711  2.76   
 2 yes   zipinc_qrtl       774  1.25   
 3 yes   tran_in           283  0.456  
 4 yes   patient_loc       166  0.267  
 5 yes   elective_admin    104  0.168  
 6 yes   insurance          79  0.127  
 7 yes   amonth             24  0.0387 
 8 yes   sex                 1  0.00161
 9 yes   age                 0  0      
10 yes   aweekend            0  0      
# … with 26 more rows

Most of the cases aren’t missing any data

miss_case_table(ms1718)

# A tibble: 5 x 3
  n_miss_in_case n_cases pct_cases
           <int>   <int>     <dbl>
1              0  645918 94.6     
2              1   32939  4.82    
3              2    4003  0.586   
4              3     107  0.0157  
5              4       1  0.000146

1.5.1 missingness mechanism

MAR

1.6 selecting only variables I need

paste(colnames(ms1718), collapse = ",  ")

[1] "age,  amonth,  aweekend,  discwt,  key_nis,  race,  tran_in,  year,  zipinc_qrtl,  ms,  hosp_bedsize,  hosp_locteach,  hosp_region,  h_contrl,  rec_pneumonia,  inv_gbs,  invasive_enterobacteriaceae,  dissem_tb,  bacteremia_meningitis,  disseminated_bartonella,  legionella,  m_aviuum,  candida_severe,  invasive_aspergillus,  pcp,  bartonella,  crypto_extrapul,  coccidioidomycosis,  histoplasmosis,  mucormycosis,  cmv_pneumonia,  cmv_pancreatitis,  other_severe_cmv,  cmv_other,  ebv,  hsv_encephalitis,  varicella_systemic,  zoster,  rsv,  hpn,  hhv6_7,  pml,  enteroviral_meningitis,  parvovirus,  babesia,  toxoplasma,  visceral_leishmaniasis,  acanthamoeba,  naegleriasis,  strongyloidiasis,  taenia,  flu,  nosocomial,  vap,  catheter_inf,  surgical_inf,  bloodstream_catheter,  c_diff,  oi_type,  depression,  htn,  migraine,  hld,  anxiety,  copd,  asthma,  ibs,  hashimoto,  osteoporosis,  dm,  ra,  fibromyalgia,  cad,  ibd,  glaucoma,  bipolar,  epilepsy,  pvd,  ckd,  lupus,  psoriasis,  scd,  hiv,  ckd1,  ckd2,  ckd3,  ckd4,  ckd5,  esrd,  ck_dother,  hf,  obese,  kidney,  oi,  sex,  insurance,  patient_loc,  region,  ED_record,  oi_f,  ms_f,  elective_admin"

ms1718 <- ms1718 %>% select( key_nis, year, ms, ms_f, oi,  oi_f,  discwt,  age, sex,  race,  insurance,  patient_loc,  region, zipinc_qrtl, obese,  htn, cad, hf, hld, pvd, dm, kidney, copd,  asthma,  osteoporosis,  ra,  fibromyalgia,   glaucoma, depression,  anxiety, bipolar,  epilepsy, migraine,  ibd, ibs, hashimoto, lupus,  psoriasis,  scd,  hiv, ED_record, elective_admin, tran_in, aweekend ,amonth, hosp_bedsize,  hosp_locteach,   h_contrl, oi_type, rec_pneumonia,  inv_gbs,  invasive_enterobacteriaceae,  dissem_tb,  bacteremia_meningitis,  disseminated_bartonella,  legionella,  m_aviuum,  candida_severe,  invasive_aspergillus,  pcp,  bartonella,  crypto_extrapul,  coccidioidomycosis,  histoplasmosis,  mucormycosis,  cmv_pneumonia,  cmv_pancreatitis,  other_severe_cmv,  cmv_other,  ebv,  hsv_encephalitis,  varicella_systemic,  zoster,  rsv,  hpn,  hhv6_7,  pml,  enteroviral_meningitis,  parvovirus,  babesia,  toxoplasma,  visceral_leishmaniasis,  acanthamoeba,  naegleriasis,  strongyloidiasis,  taenia,  flu,  nosocomial,  vap,  catheter_inf,  surgical_inf,  bloodstream_catheter,  c_diff)

NOTE: i took out hosp_region because it was redundant with region.

1.7 Tidied Tibble

Our tibble ms1718 contains 682968 rows (patients) and 93 columns (variables). Each variable is contained in a column, and each row represents a single key_nis. All variables now have appropriate types.

head(ms1718) %>% kable()

key_nis	year	ms	ms_f	oi	oi_f	discwt	age	sex	race	insurance	patient_loc	region	zipinc_qrtl	obese	htn	cad	hf	hld	pvd	dm	kidney	copd	asthma	osteoporosis	ra	fibromyalgia	glaucoma	depression	anxiety	bipolar	epilepsy	migraine	ibd	ibs	hashimoto	lupus	psoriasis	scd	hiv	ED_record	elective_admin	tran_in	aweekend	amonth	hosp_bedsize	hosp_locteach	h_contrl	oi_type	invasive_enterobacteriaceae	bacteremia_meningitis	bloodstream_catheter	c_diff
10190580	2017	1	yes	1	yes	5.000222	45	male	White	Medicare	Other	NewEngland	<48K	n	n	n	n	n	n	n	no	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	no	no	not_transferred	yes	1	small	rural	private_notprofit	bacteria	0	1	0	0
10026133	2017	1	yes	1	yes	5.000000	41	male	White	Medicare	Fringe	NewEngland	82K+	n	n	n	n	n	n	n	no	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	yes	no	not_transferred	no	12	large	urban_teaching	private_notprofit	bacteria	0	0	0	1
10031795	2017	1	yes	1	yes	5.000000	79	male	White	Medicare	Fringe	NewEngland	82K+	n	n	n	y	y	n	n	stage3	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	n	yes	no	acute_care	yes	7	large	urban_teaching	private_notprofit	bacteria	1	1	0	0
10057572	2017	1	yes	1	yes	5.000000	38	female	White	Medicare	metro>50K	NewEngland	61-82K	n	n	n	n	n	n	n	no	n	n	n	n	n	n	y	n	n	n	y	n	y	n	n	n	n	n	yes	no	not_transferred	no	3	large	urban_teaching	private_notprofit	hospital	0	0	1	0
10058311	2017	1	yes	0	no	5.000000	54	female	White	Medicare	Fringe	NewEngland	82K+	n	n	n	n	n	n	n	no	n	n	n	n	n	n	n	n	n	y	n	n	n	n	n	n	n	n	yes	no	not_transferred	no	2	large	urban_teaching	private_notprofit	none	0	0	0	0
10059466	2017	1	yes	0	no	5.000000	68	male	White	Medicare	Fringe	NewEngland	82K+	y	n	n	y	y	n	n	no	n	n	n	n	n	n	y	y	n	n	y	n	n	n	n	n	n	n	no	yes	not_transferred	no	12	large	urban_teaching	private_notprofit	none	0	0	0	0

I have also saved the tidied tibble as an R data set

saveRDS(ms1718, "ms1718.Rds")

ms1718 <- readRDS("ms1718.Rds")

2 Code Book and Clean Data Summary

Sample Size The data in our complete ms1718 sample consist of 682968 subjects from HCUP-NIS between the ages of 18 and 90.
Missingness Of the 682968 subjects, 645918 have complete data on all variables listed below.
Our outcome variables is oi.

oi is if the person had a diagnosis for an opportunistic infection: recurrent pneumonia, invasive group B strep, invasive_enterobacteriaceae, disseminated tuberculosis, bacteremia_meningitis, disseminated_bartonella, legionella, M_aviuum, severe candida, invasive_aspergillus, PCP, bartonella, extrapulmonary cryptococcus, coccidioidomycosis, Histoplasmosis, Mucormycosis, CMV_pneumonia, CMV_pancreatitis, other_severe_CMV, CMV_other, EBV, HSV_encephalitis, Varicella_systemic, zoster, RSV, HPN, HHV6_7, PML, Enteroviral_meningitis, Parvovirus, Babesia, toxoplasma, Visceral_leishmaniasis, Acanthamoeba, Naegleriasis, strongyloidiasis, Taenia, flu, nosocomial infections, VAP, catehter infections, surgical site infections, bloodstream catheter infections, c.diff

All other variables listed below will serve as candidate predictors for our models.
The other variable contained in my tidy tibble is key_nis which is the key_nis identifying code.

paste(colnames(ms1718), collapse = " | ")

[1] "key_nis | year | ms | ms_f | oi | oi_f | discwt | age | sex | race | insurance | patient_loc | region | zipinc_qrtl | obese | htn | cad | hf | hld | pvd | dm | kidney | copd | asthma | osteoporosis | ra | fibromyalgia | glaucoma | depression | anxiety | bipolar | epilepsy | migraine | ibd | ibs | hashimoto | lupus | psoriasis | scd | hiv | ED_record | elective_admin | tran_in | aweekend | amonth | hosp_bedsize | hosp_locteach | h_contrl | oi_type | rec_pneumonia | inv_gbs | invasive_enterobacteriaceae | dissem_tb | bacteremia_meningitis | disseminated_bartonella | legionella | m_aviuum | candida_severe | invasive_aspergillus | pcp | bartonella | crypto_extrapul | coccidioidomycosis | histoplasmosis | mucormycosis | cmv_pneumonia | cmv_pancreatitis | other_severe_cmv | cmv_other | ebv | hsv_encephalitis | varicella_systemic | zoster | rsv | hpn | hhv6_7 | pml | enteroviral_meningitis | parvovirus | babesia | toxoplasma | visceral_leishmaniasis | acanthamoeba | naegleriasis | strongyloidiasis | taenia | flu | nosocomial | vap | catheter_inf | surgical_inf | bloodstream_catheter | c_diff"

ms1718 %>% tabyl(tran_in)

         tran_in      n     percent valid_percent
 not_transferred 609181 0.891961263    0.89612457
      acute_care  43186 0.063232831    0.06352798
           other  27428 0.040160007    0.04034746
            <NA>   3173 0.004645898            NA

Variable	Type	Description
ms	binary	Presence of ICD-10 code G35 in record
age	quant	years
amonth	quant	months 1-12
aweekend	binary	whether the patient was admitted on a weekend (yes/no)
discwt	quant	discharge weight
race	6-cat	Black, White, Hispanic, Other, Asian, Native American
tran_in	3-cat	Indicator of a transfer into the hospital(Not transferred in, Transferred in from a different acute care hospital, Transferred in from another type of health facility)
zipinc_qrtl	4-cat	Median household income for patient’s ZIP Code (based on current year). Values include <48K, 48-61K, 61-82K, 82K+
hosp_bedsize	3-cat	small, medium, larg
hosp_locteach	3-cat	Teaching Status of the hospital: rural, urban_nonteaching, Urban_teaching
hosp_region	4-cat	Northeast Midwest South West
depression	binary	diagnosis of depression (presence of ICD-10 code F32 or F33)
htn	binary	diagnosis of hypertension (presence of ICD-10 code I10)
migraine	binary	diagnosis of migraine (presence of ICD-10 code G43)
hld	binary	diagnosis of hyperlipidemia (presence of ICD-10 code E78)
anxiety	binary	diagnosis of anxiety (presence of ICD-10 code F41)
copd	diagnosis of COPD (presence of ICD-10 code J44)
asthma	binary	diagnosis of asthma (presence of ICD-10 code J45)
ibs	binary	diagnosis of asthma (presence of ICD-10 code J45)
hashimoto	binary	diagnosis of hashimoto (autoimmune thyroiditis) (presence of ICD-10 code E063)
osteoporosis	binary	diagnosis of osteoporosis (presence of ICD-10 code M81)
ra	binary	diagnosis of Rheumatoid Arthritis (presence of ICD-10 code M06)
fibromyalgia	binary	diagnosis of fibromyalgia (presence of ICD-10 code (M797)
cad	binary	diagnosis of coronary artery disease (presence of ICD-10 code (I25)
ibd	binary	diagnosis of irritable bowel disease (presence of ICD-10 code (K51)
glaucoma	binary	diagnosis of glaucoma (presence of ICD-10 code (H40)
bipolar	binary	diagnosis of bipolar disorder (presence of ICD-10 code F31)
epilepsy	binary	diagnosis of epilepsy (presence of ICD-10 code G40)
pvd	binary	diagnosis of peripheral vascular disease (presence of ICD-10 code I73)
lupus	binary	diagnosis of lupus (presence of ICD-10 code M32)
psoriasis	binary	diagnosis of psoriasis (presence of ICD-10 code L40)
scd	binary	diagnosis of sickle cell disease (presence of ICD-10 code D57)
hiv	binary	diagnosis of hiv (presence of ICD-10 code B20)
hf	binary	diagnosis of heart failure (presence of ICD-10 code I50)
sex	binary	male, female.
insurance	5-cat	expected primary payer (Medicare, Medicaid, private insurance, self pay, other)
patient_loc	6-cat	Patient Location (“Central” counties of metro areas of >=1 million population, “Fringe” counties of metro areas of >=1 million population, Counties in metro areas of 250,000-999,999 population, Counties in metro areas of 50,000-249,999 population, Micropolitan counties, Not metropolitan or micropolitan counties)
region
ED_record	binary	records that have evidence of emergency department (ED) services reported on the HCUP record (yes/no)
elective_admin	binary	indicates whether the admission to the hospital was elective

3 Table 1

3.0.1 Table one

This table has the covariates that I will be adjusting for as I explore the relationship between ms and oi.

dput(names(ms1718))

c("key_nis", "year", "ms", "ms_f", "oi", "oi_f", "discwt", "age", 
"sex", "race", "insurance", "patient_loc", "region", "zipinc_qrtl", 
"obese", "htn", "cad", "hf", "hld", "pvd", "dm", "kidney", "copd", 
"asthma", "osteoporosis", "ra", "fibromyalgia", "glaucoma", "depression", 
"anxiety", "bipolar", "epilepsy", "migraine", "ibd", "ibs", "hashimoto", 
"lupus", "psoriasis", "scd", "hiv", "ED_record", "elective_admin", 
"tran_in", "aweekend", "amonth", "hosp_bedsize", "hosp_locteach", 
"h_contrl", "oi_type", "rec_pneumonia", "inv_gbs", "invasive_enterobacteriaceae", 
"dissem_tb", "bacteremia_meningitis", "disseminated_bartonella", 
"legionella", "m_aviuum", "candida_severe", "invasive_aspergillus", 
"pcp", "bartonella", "crypto_extrapul", "coccidioidomycosis", 
"histoplasmosis", "mucormycosis", "cmv_pneumonia", "cmv_pancreatitis", 
"other_severe_cmv", "cmv_other", "ebv", "hsv_encephalitis", "varicella_systemic", 
"zoster", "rsv", "hpn", "hhv6_7", "pml", "enteroviral_meningitis", 
"parvovirus", "babesia", "toxoplasma", "visceral_leishmaniasis", 
"acanthamoeba", "naegleriasis", "strongyloidiasis", "taenia", 
"flu", "nosocomial", "vap", "catheter_inf", "surgical_inf", "bloodstream_catheter", 
"c_diff")

This table has the covariates that I will be calculating propensity scores with.

vars <- c("age", "sex", "race", "insurance", "patient_loc", "region", "zipinc_qrtl", 
"obese", "htn", "cad", "hf", "hld", "pvd", "dm", "kidney", 
"copd", "asthma", "osteoporosis", "ra", "fibromyalgia", "glaucoma", 
"depression", "anxiety", "bipolar", "epilepsy", "migraine", "ibd", 
"ibs", "hashimoto", "lupus", "psoriasis", "scd", "hiv", "ED_record", 
"elective_admin", "tran_in", "aweekend", "amonth", "hosp_bedsize", 
"hosp_locteach",  "h_contrl")

factorvars <- c( "sex", "race", "insurance", "patient_loc", "region", "zipinc_qrtl", 
"obese", "htn", "cad", "hf", "hld", "pvd",  "kidney", 
"copd", "asthma", "osteoporosis", "ra", "fibromyalgia", "glaucoma", 
"depression", "anxiety", "bipolar", "epilepsy", "migraine", "ibd", 
"ibs", "hashimoto", "lupus", "psoriasis", "scd", "hiv", "ED_record", 
"elective_admin", "tran_in", "aweekend",  "hosp_bedsize", 
"hosp_locteach", "h_contrl")

trt <- c("ms_f")

table01 <- CreateTableOne(data = ms1718,  
                       vars = vars, 
                       factorVars= factorvars,
                       strata = trt)
print(table01, verbose=TRUE)

                          Stratified by ms_f
                           no             yes           p      test
  n                        620880         62088                    
  age (mean (SD))           58.08 (20.26) 56.66 (14.61) <0.001     
  sex = female (%)         357621 (57.6)  44868 (72.3)  <0.001     
  race (%)                                              <0.001     
     White                 404266 (67.1)  45319 (75.1)             
     Black                  91201 (15.1)   9732 (16.1)             
     Hispanic               68467 (11.4)   3492 ( 5.8)             
     Other                  18247 ( 3.0)   1256 ( 2.1)             
     Asian                  16618 ( 2.8)    360 ( 0.6)             
     NativeA                 3712 ( 0.6)    218 ( 0.4)             
  insurance (%)                                         <0.001     
     Medicare              296863 (47.9)  36928 (59.6)             
     Private               164550 (26.5)  14465 (23.3)             
     Medicaid              114577 (18.5)   8248 (13.3)             
     Self_pay               24575 ( 4.0)   1059 ( 1.7)             
     Other                  19387 ( 3.1)   1309 ( 2.1)             
  patient_loc (%)                                       <0.001     
     Central               182195 (29.5)  17582 (28.4)             
     Fringe                148787 (24.1)  16824 (27.2)             
     metro>250K            127713 (20.7)  12752 (20.6)             
     metro>50K              57799 ( 9.4)   5738 ( 9.3)             
     micro                  57452 ( 9.3)   5283 ( 8.5)             
     Other                  43725 ( 7.1)   3743 ( 6.0)             
  region (%)                                            <0.001     
     South_Atlantic        129882 (20.9)  11790 (19.0)             
     EastNorth_Central      95758 (15.4)  12146 (19.6)             
     Middle_Atlantic        86651 (14.0)   9927 (16.0)             
     Pacific                82325 (13.3)   6825 (11.0)             
     WestSouth_Central      72182 (11.6)   5189 ( 8.4)             
     WestNorth_Central      43295 ( 7.0)   4613 ( 7.4)             
     EastSouth_Central      43133 ( 6.9)   3697 ( 6.0)             
     Mountain               38346 ( 6.2)   4268 ( 6.9)             
     NewEngland             29308 ( 4.7)   3633 ( 5.9)             
  zipinc_qrtl (%)                                       <0.001     
     <48K                  183929 (30.2)  15828 (25.8)             
     48-61K                163060 (26.7)  16249 (26.5)             
     61-82K                144270 (23.7)  15544 (25.4)             
     82K+                  118723 (19.5)  13693 (22.3)             
  obese = y (%)            101297 (16.3)  10452 (16.8)   0.001     
  htn = y (%)              201284 (32.4)  23122 (37.2)  <0.001     
  cad = y (%)              126759 (20.4)   7950 (12.8)  <0.001     
  hf = y (%)               107298 (17.3)   6100 ( 9.8)  <0.001     
  hld = y (%)              204382 (32.9)  17724 (28.5)  <0.001     
  pvd = y (%)               17660 ( 2.8)   1583 ( 2.5)  <0.001     
  dm = y (%)               167089 (26.9)  13152 (21.2)  <0.001     
  kidney (%)                                            <0.001     
     CKDother               21891 ( 3.5)   1559 ( 2.5)             
     no                    516105 (83.1)  56152 (90.4)             
     stage1_2                5711 ( 0.9)    413 ( 0.7)             
     stage3                 42535 ( 6.9)   2512 ( 4.0)             
     stage4                 11727 ( 1.9)    561 ( 0.9)             
     stage5_ESR             22911 ( 3.7)    891 ( 1.4)             
  copd = y (%)              94370 (15.2)   7675 (12.4)  <0.001     
  asthma = y (%)            43863 ( 7.1)   5161 ( 8.3)  <0.001     
  osteoporosis = y (%)      17100 ( 2.8)   3125 ( 5.0)  <0.001     
  ra = y (%)                11486 ( 1.8)   1233 ( 2.0)   0.018     
  fibromyalgia = y (%)       9323 ( 1.5)   2637 ( 4.2)  <0.001     
  glaucoma = y (%)           9815 ( 1.6)    998 ( 1.6)   0.625     
  depression = y (%)        85454 (13.8)  15975 (25.7)  <0.001     
  anxiety = y (%)           85689 (13.8)  12746 (20.5)  <0.001     
  bipolar = y (%)           19969 ( 3.2)   2616 ( 4.2)  <0.001     
  epilepsy = y (%)          23038 ( 3.7)   5819 ( 9.4)  <0.001     
  migraine = y (%)          11547 ( 1.9)   3283 ( 5.3)  <0.001     
  ibd = y (%)                2523 ( 0.4)    256 ( 0.4)   0.850     
  ibs = y (%)                6348 ( 1.0)   1035 ( 1.7)  <0.001     
  hashimoto = y (%)           954 ( 0.2)    176 ( 0.3)  <0.001     
  lupus = y (%)              3731 ( 0.6)    712 ( 1.1)  <0.001     
  psoriasis = y (%)          3249 ( 0.5)    450 ( 0.7)  <0.001     
  scd = y (%)                3251 ( 0.5)    179 ( 0.3)  <0.001     
  hiv = y (%)                2456 ( 0.4)     80 ( 0.1)  <0.001     
  ED_record = yes (%)      374744 (60.4)  44384 (71.5)  <0.001     
  elective_admin = yes (%) 144804 (23.4)   9461 (15.3)  <0.001     
  tran_in (%)                                           <0.001     
     not_transferred       555216 (89.8)  53965 (87.3)             
     acute_care             39194 ( 6.3)   3992 ( 6.5)             
     other                  23580 ( 3.8)   3848 ( 6.2)             
  aweekend = yes (%)       128047 (20.6)  13602 (21.9)  <0.001     
  amonth (mean (SD))         6.48 (3.46)   6.46 (3.46)   0.125     
  hosp_bedsize (%)                                       0.081     
     small                 128612 (20.7)  13065 (21.0)             
     medium                182753 (29.4)  18329 (29.5)             
     large                 309515 (49.9)  30694 (49.4)             
  hosp_locteach (%)                                     <0.001     
     rural                  56708 ( 9.1)   5070 ( 8.2)             
     urban_nonteaching     135839 (21.9)  13266 (21.4)             
     urban_teaching        428333 (69.0)  43752 (70.5)             
  h_contrl (%)                                          <0.001     
     gov_nonfed             71258 (11.5)   6010 ( 9.7)             
     private_notprofit     456378 (73.5)  48394 (77.9)             
     Private_profit         93244 (15.0)   7684 (12.4)

4 Dealing with missingness

once again here is my missing. I tried imputation with mice and simputation. It is too much for R to handle, so I just have to do complete cases.

ms1718 %>% select(age, amonth, aweekend, discwt, key_nis, race, tran_in,  zipinc_qrtl, ms, hosp_bedsize, hosp_locteach, h_contrl, sex, insurance, patient_loc, region, ED_record, elective_admin) %>% 
gg_miss_var()

# set.seed(0527)
# ms1718_ms <- ms1718 %>% filter(ms_f == "yes") %>% 
# data.frame() %>%
#   impute_cart(., tran_in ~ .) %>% 
#   impute_cart(., elective_admin ~ .) %>% 
#   impute_cart(., sex ~ .) %>% 
#   impute_cart(., race ~ .) %>% 
#   impute_cart(., amonth ~ .) %>%
#   impute_cart(., zipinc_qrtl ~ .) %>%
#   impute_cart(., insurance ~ .) %>% 
#   impute_cart(., patient_loc ~ .) %>% 
#   tbl_df()

# set.seed(432432)
# ms1718_mice <- mice(ms1718, m = 1, printFlag = FALSE)

ms1718c <- ms1718 %>% filter(complete.cases(.))

Below I am just checking to make sure that I have no more missing

n_miss(ms1718c)

[1] 0

no more missing!

dim(ms1718c)

[1] 645918     93

The ms1718c data contain 645918 rows and 93 columns.

5 Unadjusted analysis

Epi::twoby2(table(ms1718c$ms_f, ms1718c$oi_f))  %>% kable(dig=3)

2 by 2 table analysis: 
------------------------------------------------------ 
Outcome   : no 
Comparing : no vs. yes 

        no   yes    P(no) 95% conf. interval
no  490138 96613   0.8353        NA       NA
yes  41433 17734   0.7003    0.6966    0.704

                                   95% conf. interval
             Relative Risk: 1.1929    1.1865   1.1993
         Sample Odds Ratio: 2.1714    2.1308   2.2128
    Probability difference: 0.1351    0.1313   0.1389
 
        Asymptotic P-value: 0.0000 
------------------------------------------------------

	no	yes	P(no)	95% conf.	interval
no	490138	96613	0.835	NA	NA
yes	41433	17734	0.700	0.697	0.704

		95% conf.	interval
Relative Risk:	1.193	1.186	1.199
Sample Odds Ratio:	2.171	2.131	2.213
Probability difference:	0.135	0.131	0.139

x
0

unadjust_binary_outcome <- glm(oi ~ ms_f, data = ms1718c, family = binomial())
unadjust_binary_outcome_tidy <- tidy(unadjust_binary_outcome, conf.int = TRUE, conf.level = 0.95, exponentiate = TRUE) %>% 
    filter(term == "ms_f")
unadjust_binary_outcome


Call:  glm(formula = oi ~ ms_f, family = binomial(), data = ms1718c)

Coefficients:
(Intercept)      ms_fyes  
    -1.6240       0.7754  

Degrees of Freedom: 645917 Total (i.e. Null);  645916 Residual
Null Deviance:      603100 
Residual Deviance: 597200   AIC: 597200

The odds of having an in MS individuals was (95%CI: , ) times higher than the odds that a non-MS control had an oi

6 Splitting the data

I am splitting the singly imputed ms1718c sample into a training (90% of the data) and testing sample (10% of the data). I am using the function strata to ensure that both data sets have an equal proportion of my main predictor of interest, ms, and the outcome, oi

set.seed(2)
ms_split <- rsample::initial_split(ms1718c, prop = 0.9,
strata = ms, oi)
ms_train <- rsample::training(ms_split) 
ms_test <- rsample::testing(ms_split)

dim(ms1718c)

[1] 645918     93

dim(ms_train)

[1] 581327     93

dim(ms_test)

[1] 64591    93

NOTE: I actually ended up building the model with the ‘test’ sample because the training sample was just too big (I even tried building it with a training sample that was 70% of the size)

7 logistic regression

7.1 Model 1

7.1.1 Fitting Model 1

paste(colnames(ms1718c), collapse = " + ")

[1] "key_nis + year + ms + ms_f + oi + oi_f + discwt + age + sex + race + insurance + patient_loc + region + zipinc_qrtl + obese + htn + cad + hf + hld + pvd + dm + kidney + copd + asthma + osteoporosis + ra + fibromyalgia + glaucoma + depression + anxiety + bipolar + epilepsy + migraine + ibd + ibs + hashimoto + lupus + psoriasis + scd + hiv + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + oi_type + rec_pneumonia + inv_gbs + invasive_enterobacteriaceae + dissem_tb + bacteremia_meningitis + disseminated_bartonella + legionella + m_aviuum + candida_severe + invasive_aspergillus + pcp + bartonella + crypto_extrapul + coccidioidomycosis + histoplasmosis + mucormycosis + cmv_pneumonia + cmv_pancreatitis + other_severe_cmv + cmv_other + ebv + hsv_encephalitis + varicella_systemic + zoster + rsv + hpn + hhv6_7 + pml + enteroviral_meningitis + parvovirus + babesia + toxoplasma + visceral_leishmaniasis + acanthamoeba + naegleriasis + strongyloidiasis + taenia + flu + nosocomial + vap + catheter_inf + surgical_inf + bloodstream_catheter + c_diff"

I first tried:

mod1predicts the log odds of oi using the predictors year, age, sex, race, insurance, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, pvd, dm, kidney, copd, asthma, osteoporosis, ra, fibromyalgia, glaucoma, depression, anxiety, bipolar, epilepsy, migraine, ibd, ibs, hashimoto, lupus, psoriasis, scd, hiv, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl

But then this was too much for it to handle, so I simplified it to only contain:

mod1_glm <- with(ms_test, 
                glm(oi ~ ms_f + age + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl,
                    family = binomial))

z <- datadist(ms_test) 
options(datadist = "z")

mod1_lrm <- lrm(oi ~ ms_f + age + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl, data=ms_test, x = TRUE, y = TRUE)

7.1.2 Effect sizes model 1

plot(summary(mod1_lrm))

summary(mod1_lrm)

             Effects              Response : oi 

 Factor                                           Low High Diff. Effect    
 age                                              42  74   32     0.4901400
  Odds Ratio                                      42  74   32     1.6325000
 amonth                                            3  10    7    -0.1112600
  Odds Ratio                                       3  10    7     0.8947100
 ms_f - yes:no                                     1   2   NA     0.7501800
  Odds Ratio                                       1   2   NA     2.1174000
 sex - male:female                                 2   1   NA     0.0865680
  Odds Ratio                                       2   1   NA     1.0904000
 race - Black:White                                1   2   NA    -0.0751920
  Odds Ratio                                       1   2   NA     0.9275700
 race - Hispanic:White                             1   3   NA     0.0229700
  Odds Ratio                                       1   3   NA     1.0232000
 race - Other:White                                1   4   NA     0.0183500
  Odds Ratio                                       1   4   NA     1.0185000
 race - Asian:White                                1   5   NA    -0.0495640
  Odds Ratio                                       1   5   NA     0.9516400
 race - NativeA:White                              1   6   NA     0.2041200
  Odds Ratio                                       1   6   NA     1.2264000
 insurance - Private:Medicare                      1   2   NA    -0.2592100
  Odds Ratio                                       1   2   NA     0.7716600
 insurance - Medicaid:Medicare                     1   3   NA    -0.1801700
  Odds Ratio                                       1   3   NA     0.8351300
 insurance - Self_pay:Medicare                     1   4   NA    -0.2475000
  Odds Ratio                                       1   4   NA     0.7807500
 insurance - Other:Medicare                        1   5   NA    -0.2250100
  Odds Ratio                                       1   5   NA     0.7985000
 patient_loc - Fringe:Central                      1   2   NA    -0.0031994
  Odds Ratio                                       1   2   NA     0.9968100
 patient_loc - metro>250K:Central                  1   3   NA     0.0436060
  Odds Ratio                                       1   3   NA     1.0446000
 patient_loc - metro>50K:Central                   1   4   NA     0.0330390
  Odds Ratio                                       1   4   NA     1.0336000
 patient_loc - micro:Central                       1   5   NA     0.0681130
  Odds Ratio                                       1   5   NA     1.0705000
 patient_loc - Other:Central                       1   6   NA     0.0642700
  Odds Ratio                                       1   6   NA     1.0664000
 region - EastNorth_Central:South_Atlantic         1   2   NA     0.0077239
  Odds Ratio                                       1   2   NA     1.0078000
 region - Middle_Atlantic:South_Atlantic           1   3   NA    -0.1483400
  Odds Ratio                                       1   3   NA     0.8621400
 region - Pacific:South_Atlantic                   1   4   NA     0.1482900
  Odds Ratio                                       1   4   NA     1.1598000
 region - WestSouth_Central:South_Atlantic         1   5   NA     0.1579500
  Odds Ratio                                       1   5   NA     1.1711000
 region - WestNorth_Central:South_Atlantic         1   6   NA    -0.0111580
  Odds Ratio                                       1   6   NA     0.9889000
 region - EastSouth_Central:South_Atlantic         1   7   NA     0.0389670
  Odds Ratio                                       1   7   NA     1.0397000
 region - Mountain:South_Atlantic                  1   8   NA     0.0580440
  Odds Ratio                                       1   8   NA     1.0598000
 region - NewEngland:South_Atlantic                1   9   NA    -0.0515370
  Odds Ratio                                       1   9   NA     0.9497700
 zipinc_qrtl - 48-61K:<48K                         1   2   NA    -0.0062378
  Odds Ratio                                       1   2   NA     0.9937800
 zipinc_qrtl - 61-82K:<48K                         1   3   NA    -0.0069961
  Odds Ratio                                       1   3   NA     0.9930300
 zipinc_qrtl - 82K+:<48K                           1   4   NA    -0.0345080
  Odds Ratio                                       1   4   NA     0.9660800
 ED_record - no:yes                                2   1   NA    -0.6351200
  Odds Ratio                                       2   1   NA     0.5298700
 elective_admin - yes:no                           1   2   NA    -1.0106000
  Odds Ratio                                       1   2   NA     0.3639900
 tran_in - acute_care:not_transferred              1   2   NA     0.4042000
  Odds Ratio                                       1   2   NA     1.4981000
 tran_in - other:not_transferred                   1   3   NA     0.5856900
  Odds Ratio                                       1   3   NA     1.7962000
 aweekend - yes:no                                 1   2   NA     0.0925620
  Odds Ratio                                       1   2   NA     1.0970000
 hosp_bedsize - small:large                        3   1   NA     0.0684640
  Odds Ratio                                       3   1   NA     1.0709000
 hosp_bedsize - medium:large                       3   2   NA     0.0528850
  Odds Ratio                                       3   2   NA     1.0543000
 hosp_locteach - rural:urban_teaching              3   1   NA     0.0704920
  Odds Ratio                                       3   1   NA     1.0730000
 hosp_locteach - urban_nonteaching:urban_teaching  3   2   NA     0.0048300
  Odds Ratio                                       3   2   NA     1.0048000
 h_contrl - gov_nonfed:private_notprofit           2   1   NA    -0.0804590
  Odds Ratio                                       2   1   NA     0.9226900
 h_contrl - Private_profit:private_notprofit       2   3   NA    -0.1517900
  Odds Ratio                                       2   3   NA     0.8591700
 S.E.     Lower 0.95 Upper 0.95
 0.025417  0.440320   0.5399600
       NA  1.553200   1.7159000
 0.021402 -0.153210  -0.0693130
       NA  0.857950   0.9330300
 0.032694  0.686100   0.8142600
       NA  1.986000   2.2575000
 0.021660  0.044116   0.1290200
       NA  1.045100   1.1377000
 0.033278 -0.140410  -0.0099689
       NA  0.869000   0.9900800
 0.039054 -0.053574   0.0995140
       NA  0.947840   1.1046000
 0.068266 -0.115450   0.1521500
       NA  0.890970   1.1643000
 0.074132 -0.194860   0.0957320
       NA  0.822950   1.1005000
 0.141010 -0.072260   0.4804900
       NA  0.930290   1.6169000
 0.033010 -0.323910  -0.1945100
       NA  0.723320   0.8232400
 0.038839 -0.256290  -0.1040400
       NA  0.773920   0.9011900
 0.065481 -0.375840  -0.1191600
       NA  0.686710   0.8876600
 0.070722 -0.363630  -0.0864010
       NA  0.695150   0.9172300
 0.031291 -0.064529   0.0581300
       NA  0.937510   1.0599000
 0.031915 -0.018947   0.1061600
       NA  0.981230   1.1120000
 0.041804 -0.048895   0.1149700
       NA  0.952280   1.1218000
 0.052767 -0.035309   0.1715300
       NA  0.965310   1.1871000
 0.054242 -0.042042   0.1705800
       NA  0.958830   1.1860000
 0.036406 -0.063631   0.0790790
       NA  0.938350   1.0823000
 0.038383 -0.223570  -0.0731100
       NA  0.799660   0.9295000
 0.039743  0.070395   0.2261800
       NA  1.072900   1.2538000
 0.040250  0.079061   0.2368400
       NA  1.082300   1.2672000
 0.049720 -0.108610   0.0862930
       NA  0.897080   1.0901000
 0.046897 -0.052950   0.1308800
       NA  0.948430   1.1398000
 0.050926 -0.041770   0.1578600
       NA  0.959090   1.1710000
 0.054508 -0.158370   0.0552970
       NA  0.853530   1.0569000
 0.029069 -0.063212   0.0507360
       NA  0.938740   1.0520000
 0.031671 -0.069070   0.0550770
       NA  0.933260   1.0566000
 0.035980 -0.105030   0.0360110
       NA  0.900300   1.0367000
 0.034337 -0.702420  -0.5678200
       NA  0.495390   0.5667600
 0.047958 -1.104600  -0.9166500
       NA  0.331330   0.3998600
 0.048278  0.309580   0.4988200
       NA  1.362800   1.6468000
 0.048301  0.491030   0.6803600
       NA  1.634000   1.9746000
 0.024865  0.043828   0.1413000
       NA  1.044800   1.1518000
 0.028562  0.012483   0.1244500
       NA  1.012600   1.1325000
 0.025283  0.003332   0.1024400
       NA  1.003300   1.1079000
 0.051022 -0.029510   0.1704900
       NA  0.970920   1.1859000
 0.027152 -0.048387   0.0580470
       NA  0.952770   1.0598000
 0.035963 -0.150950  -0.0099720
       NA  0.859890   0.9900800
 0.032864 -0.216200  -0.0873780
       NA  0.805570   0.9163300

7.2 Evaluating predictive punch

We can see from the spearman rho plot below that the most predictive punch is in ED_record, elective_admin, age, insurance, ms_f, kidney, dm

spear_mod1 <- spearman2(oi ~ ms_f + age + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + hf + kidney + dm + copd + ibd + obese + hiv + scd, 
                  data = ms1718c)

plot(spear_mod1)

obese, hiv, scd don’t seem to help us out that much

Possibilities for nonlinear terms:

interaction between ED_record and elective_admin
interaction between ED_record and age
restricted cubic spline with 4 knots on age

7.3 model 2

I’m going to add in some of the predictors that seemed helpful

mod2_lrm <- lrm(oi ~ ms_f + age + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + hf + kidney + dm + copd + ibd, data=ms_test, x = TRUE, y = TRUE)

mod2_lrm

Logistic Regression Model
 
 lrm(formula = oi ~ ms_f + age + sex + race + insurance + patient_loc + 
     region + zipinc_qrtl + ED_record + elective_admin + tran_in + 
     aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + 
     hf + kidney + dm + copd + ibd, data = ms_test, x = TRUE, 
     y = TRUE)
 
                        Model Likelihood    Discrimination    Rank Discrim.    
                              Ratio Test           Indexes          Indexes    
 Obs         64591    LR chi2    5451.60    R2       0.132    C       0.703    
  0          52879    d.f.            49    g        1.022    Dxy     0.405    
  1          11712    Pr(> chi2) <0.0001    gr       2.780    gamma   0.405    
 max |deriv| 6e-11                          gp       0.124    tau-a   0.120    
                                            Brier    0.137                     
 
                                 Coef    S.E.   Wald Z Pr(>|Z|)
 Intercept                       -2.4439 0.1148 -21.29 <0.0001 
 ms_f=yes                         0.7908 0.0329  24.00 <0.0001 
 age                              0.0142 0.0008  17.28 <0.0001 
 sex=female                      -0.0652 0.0218  -2.99 0.0028  
 race=Black                      -0.1330 0.0338  -3.94 <0.0001 
 race=Hispanic                   -0.0157 0.0394  -0.40 0.6896  
 race=Other                      -0.0042 0.0685  -0.06 0.9509  
 race=Asian                      -0.0908 0.0746  -1.22 0.2237  
 race=NativeA                     0.1615 0.1414   1.14 0.2535  
 insurance=Private               -0.2138 0.0332  -6.44 <0.0001 
 insurance=Medicaid              -0.1370 0.0389  -3.52 0.0004  
 insurance=Self_pay              -0.1787 0.0658  -2.72 0.0066  
 insurance=Other                 -0.1770 0.0709  -2.50 0.0126  
 patient_loc=Fringe              -0.0093 0.0314  -0.30 0.7665  
 patient_loc=metro>250K           0.0374 0.0320   1.17 0.2421  
 patient_loc=metro>50K            0.0305 0.0419   0.73 0.4674  
 patient_loc=micro                0.0578 0.0529   1.09 0.2744  
 patient_loc=Other                0.0536 0.0543   0.99 0.3238  
 region=EastNorth_Central        -0.0060 0.0365  -0.17 0.8689  
 region=Middle_Atlantic          -0.1446 0.0385  -3.76 0.0002  
 region=Pacific                   0.1440 0.0398   3.61 0.0003  
 region=WestSouth_Central         0.1532 0.0404   3.80 0.0001  
 region=WestNorth_Central        -0.0130 0.0498  -0.26 0.7935  
 region=EastSouth_Central         0.0345 0.0470   0.73 0.4627  
 region=Mountain                  0.0660 0.0510   1.29 0.1959  
 region=NewEngland               -0.0459 0.0546  -0.84 0.4008  
 zipinc_qrtl=48-61K              -0.0020 0.0291  -0.07 0.9441  
 zipinc_qrtl=61-82K               0.0050 0.0318   0.16 0.8742  
 zipinc_qrtl=82K+                -0.0109 0.0362  -0.30 0.7621  
 ED_record=yes                    0.6166 0.0344  17.90 <0.0001 
 elective_admin=yes              -0.9886 0.0481 -20.56 <0.0001 
 tran_in=acute_care               0.3925 0.0484   8.11 <0.0001 
 tran_in=other                    0.5756 0.0484  11.89 <0.0001 
 aweekend=yes                     0.0962 0.0249   3.86 0.0001  
 amonth                          -0.0157 0.0031  -5.12 <0.0001 
 hosp_bedsize=medium             -0.0183 0.0305  -0.60 0.5485  
 hosp_bedsize=large              -0.0699 0.0286  -2.44 0.0146  
 hosp_locteach=urban_nonteaching -0.0772 0.0547  -1.41 0.1584  
 hosp_locteach=urban_teaching    -0.0807 0.0512  -1.58 0.1149  
 h_contrl=private_notprofit       0.0716 0.0360   1.99 0.0468  
 h_contrl=Private_profit         -0.0764 0.0446  -1.71 0.0868  
 hf=y                            -0.0163 0.0284  -0.57 0.5657  
 kidney=no                       -0.2357 0.0512  -4.60 <0.0001 
 kidney=stage1_2                 -0.0810 0.1071  -0.76 0.4495  
 kidney=stage3                   -0.0542 0.0603  -0.90 0.3690  
 kidney=stage4                   -0.1815 0.0844  -2.15 0.0315  
 kidney=stage5_ESR                0.1973 0.0686   2.88 0.0040  
 dm=y                             0.1889 0.0238   7.93 <0.0001 
 copd=y                           0.0872 0.0283   3.08 0.0021  
 ibd=y                            0.4656 0.1501   3.10 0.0019

7.3.1 Effect sizes model 2

plot(summary(mod2_lrm))

7.3.2 model 3

mod3_lrm <- lrm(oi ~ ms_f + rcs(age,4) + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin +  ED_record*elective_admin +  age%ia%ED_record + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + hf + kidney + age%ia%kidney + dm + age%ia%dm +  copd + age%ia%copd + ibd, data=ms_test, x = TRUE, y = TRUE)

plot(summary(mod3_lrm))

mod3_lrm

Logistic Regression Model
 
 lrm(formula = oi ~ ms_f + rcs(age, 4) + sex + race + insurance + 
     patient_loc + region + zipinc_qrtl + ED_record + elective_admin + 
     ED_record * elective_admin + age %ia% ED_record + tran_in + 
     aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + 
     hf + kidney + age %ia% kidney + dm + age %ia% dm + copd + 
     age %ia% copd + ibd, data = ms_test, x = TRUE, y = TRUE)
 
                        Model Likelihood    Discrimination    Rank Discrim.    
                              Ratio Test           Indexes          Indexes    
 Obs         64591    LR chi2    5692.69    R2       0.138    C       0.704    
  0          52879    d.f.            60    g        1.114    Dxy     0.409    
  1          11712    Pr(> chi2) <0.0001    gr       3.046    gamma   0.409    
 max |deriv| 6e-11                          gp       0.124    tau-a   0.121    
                                            Brier    0.137                     
 
                                    Coef    S.E.   Wald Z Pr(>|Z|)
 Intercept                          -3.2122 0.3075 -10.45 <0.0001 
 ms_f=yes                            0.7273 0.0337  21.56 <0.0001 
 age                                 0.0351 0.0049   7.15 <0.0001 
 age'                               -0.0275 0.0065  -4.24 <0.0001 
 age''                               0.0926 0.0281   3.30 0.0010  
 sex=female                         -0.0276 0.0220  -1.26 0.2092  
 race=Black                         -0.1422 0.0339  -4.20 <0.0001 
 race=Hispanic                      -0.0007 0.0396  -0.02 0.9859  
 race=Other                          0.0088 0.0686   0.13 0.8979  
 race=Asian                         -0.0596 0.0747  -0.80 0.4251  
 race=NativeA                        0.1595 0.1420   1.12 0.2612  
 insurance=Private                  -0.2345 0.0338  -6.94 <0.0001 
 insurance=Medicaid                 -0.1538 0.0394  -3.90 <0.0001 
 insurance=Self_pay                 -0.2261 0.0666  -3.40 0.0007  
 insurance=Other                    -0.1977 0.0713  -2.77 0.0055  
 patient_loc=Fringe                 -0.0136 0.0314  -0.43 0.6644  
 patient_loc=metro>250K              0.0396 0.0320   1.24 0.2163  
 patient_loc=metro>50K               0.0350 0.0419   0.83 0.4039  
 patient_loc=micro                   0.0579 0.0529   1.09 0.2737  
 patient_loc=Other                   0.0358 0.0545   0.66 0.5113  
 region=EastNorth_Central           -0.0164 0.0366  -0.45 0.6546  
 region=Middle_Atlantic             -0.1436 0.0385  -3.73 0.0002  
 region=Pacific                      0.1453 0.0399   3.65 0.0003  
 region=WestSouth_Central            0.1449 0.0404   3.59 0.0003  
 region=WestNorth_Central           -0.0191 0.0498  -0.38 0.7012  
 region=EastSouth_Central            0.0383 0.0470   0.81 0.4156  
 region=Mountain                     0.0614 0.0511   1.20 0.2289  
 region=NewEngland                  -0.0440 0.0545  -0.81 0.4202  
 zipinc_qrtl=48-61K                  0.0028 0.0292   0.09 0.9247  
 zipinc_qrtl=61-82K                  0.0177 0.0318   0.56 0.5775  
 zipinc_qrtl=82K+                    0.0121 0.0362   0.33 0.7391  
 ED_record=yes                       1.4181 0.1005  14.11 <0.0001 
 elective_admin=yes                 -1.1104 0.0519 -21.39 <0.0001 
 age * ED_record=yes                -0.0142 0.0015  -9.36 <0.0001 
 tran_in=acute_care                  0.3143 0.0490   6.41 <0.0001 
 tran_in=other                       0.5691 0.0483  11.77 <0.0001 
 aweekend=yes                        0.1016 0.0249   4.07 <0.0001 
 amonth                             -0.0157 0.0031  -5.12 <0.0001 
 hosp_bedsize=medium                -0.0130 0.0305  -0.42 0.6710  
 hosp_bedsize=large                 -0.0666 0.0286  -2.32 0.0201  
 hosp_locteach=urban_nonteaching    -0.0700 0.0548  -1.28 0.2014  
 hosp_locteach=urban_teaching       -0.0737 0.0513  -1.44 0.1505  
 h_contrl=private_notprofit          0.0833 0.0361   2.31 0.0210  
 h_contrl=Private_profit            -0.0741 0.0446  -1.66 0.0968  
 hf=y                               -0.0020 0.0282  -0.07 0.9449  
 kidney=no                          -0.7654 0.2627  -2.91 0.0036  
 kidney=stage1_2                    -0.1828 0.5679  -0.32 0.7476  
 kidney=stage3                       0.4818 0.3293   1.46 0.1435  
 kidney=stage4                      -0.6040 0.4701  -1.28 0.1989  
 kidney=stage5_ESR                   0.1099 0.3346   0.33 0.7425  
 age * kidney=no                     0.0074 0.0036   2.05 0.0408  
 age * kidney=stage1_2               0.0014 0.0079   0.17 0.8622  
 age * kidney=stage3                -0.0069 0.0045  -1.53 0.1253  
 age * kidney=stage4                 0.0061 0.0063   0.97 0.3343  
 age * kidney=stage5_ESR             0.0001 0.0048   0.01 0.9914  
 dm=y                                0.3776 0.1068   3.54 0.0004  
 age * dm=y                         -0.0034 0.0016  -2.19 0.0286  
 copd=y                              0.2415 0.1579   1.53 0.1262  
 age * copd=y                       -0.0026 0.0022  -1.15 0.2495  
 ibd=y                               0.4527 0.1502   3.01 0.0026  
 ED_record=yes * elective_admin=yes  0.6610 0.1354   4.88 <0.0001

validate(mod3_lrm)

          index.orig training    test optimism index.corrected  n
Dxy           0.4090   0.4121  0.4070   0.0051          0.4039 40
R2            0.1378   0.1390  0.1365   0.0025          0.1354 40
Intercept     0.0000   0.0000 -0.0167   0.0167         -0.0167 40
Slope         1.0000   1.0000  0.9863   0.0137          0.9863 40
Emax          0.0000   0.0000  0.0060   0.0060          0.0060 40
D             0.0881   0.0889  0.0872   0.0016          0.0865 40
U             0.0000   0.0000  0.0000   0.0000          0.0000 40
Q             0.0881   0.0889  0.0872   0.0017          0.0865 40
B             0.1369   0.1367  0.1371  -0.0004          0.1373 40
g             1.1138   1.1181  1.1029   0.0153          1.0986 40
gp            0.1239   0.1245  0.1234   0.0011          0.1228 40

7.3.3 tidied coefficients

Among hospitalized patients in 2017-2018 (N=64,591), after adjusting for age + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl,mod1 predicts that the odds of having an OI in those with MS is 2.11 (95% CI 1.98, 2.26) times those without MS

given that the 95% CI is entirely above 1, the model suggests that having a MS is associated with a higher odds of an oi.

tidy(mod1_glm, exponentiate=TRUE) %>% kable(digits=3)

term	estimate	std.error	statistic	p.value
(Intercept)	0.072	0.102	-25.856	0.000
ms_fyes	2.117	0.033	22.946	0.000
age	1.015	0.001	19.284	0.000
sexfemale	0.917	0.022	-3.997	0.000
raceBlack	0.928	0.033	-2.260	0.024
raceHispanic	1.023	0.039	0.588	0.556
raceOther	1.019	0.068	0.269	0.788
raceAsian	0.952	0.074	-0.669	0.504
raceNativeA	1.226	0.141	1.448	0.148
insurancePrivate	0.772	0.033	-7.853	0.000
insuranceMedicaid	0.835	0.039	-4.639	0.000
insuranceSelf_pay	0.781	0.065	-3.780	0.000
insuranceOther	0.799	0.071	-3.182	0.001
patient_locFringe	0.997	0.031	-0.102	0.919
patient_locmetro>250K	1.045	0.032	1.366	0.172
patient_locmetro>50K	1.034	0.042	0.790	0.429
patient_locmicro	1.070	0.053	1.291	0.197
patient_locOther	1.066	0.054	1.185	0.236
regionEastNorth_Central	1.008	0.036	0.212	0.832
regionMiddle_Atlantic	0.862	0.038	-3.865	0.000
regionPacific	1.160	0.040	3.731	0.000
regionWestSouth_Central	1.171	0.040	3.924	0.000
regionWestNorth_Central	0.989	0.050	-0.224	0.822
regionEastSouth_Central	1.040	0.047	0.831	0.406
regionMountain	1.060	0.051	1.140	0.254
regionNewEngland	0.950	0.055	-0.945	0.344
zipinc_qrtl48-61K	0.994	0.029	-0.215	0.830
zipinc_qrtl61-82K	0.993	0.032	-0.221	0.825
zipinc_qrtl82K+	0.966	0.036	-0.959	0.338
ED_recordyes	1.887	0.034	18.497	0.000
elective_adminyes	0.364	0.048	-21.074	0.000
tran_inacute_care	1.498	0.048	8.372	0.000
tran_inother	1.796	0.048	12.126	0.000
aweekendyes	1.097	0.025	3.723	0.000
amonth	0.984	0.003	-5.199	0.000
hosp_bedsizemedium	0.985	0.030	-0.512	0.609
hosp_bedsizelarge	0.934	0.029	-2.397	0.017
hosp_locteachurban_nonteaching	0.936	0.055	-1.203	0.229
hosp_locteachurban_teaching	0.932	0.051	-1.382	0.167
h_contrlprivate_notprofit	1.084	0.036	2.237	0.025
h_contrlPrivate_profit	0.931	0.045	-1.602	0.109

exp(0.7502 + (1.96*0.0327))

[1] 2.257577

exp(0.7502 - (1.96*0.0327))

[1] 1.985971

7.3.4 mod2

The OR for ms_f with mod2 (when we add in hf + kidney + dm + copd + ibd), is considerably lower

mod2_glm <- glm(oi ~ ms_f + age + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + hf + kidney + dm + copd + ibd, data=ms_test)

tidy(mod2_glm, exponentiate=TRUE) %>% kable(digits=3)

term	estimate	std.error	statistic	p.value
(Intercept)	1.099	0.016	5.909	0.000
ms_fyes	1.132	0.005	23.628	0.000
age	1.002	0.000	14.926	0.000
sexfemale	0.999	0.003	-0.484	0.628
raceBlack	0.981	0.005	-4.331	0.000
raceHispanic	0.999	0.005	-0.186	0.852
raceOther	1.000	0.009	0.037	0.970
raceAsian	0.993	0.010	-0.697	0.486
raceNativeA	1.019	0.020	0.926	0.354
insurancePrivate	0.971	0.004	-6.614	0.000
insuranceMedicaid	0.978	0.005	-4.253	0.000
insuranceSelf_pay	0.965	0.009	-4.123	0.000
insuranceOther	0.973	0.009	-3.049	0.002
patient_locFringe	0.999	0.004	-0.285	0.776
patient_locmetro>250K	1.006	0.004	1.291	0.197
patient_locmetro>50K	1.005	0.006	0.829	0.407
patient_locmicro	1.008	0.007	1.137	0.255
patient_locOther	1.005	0.007	0.649	0.517
regionEastNorth_Central	0.998	0.005	-0.305	0.760
regionMiddle_Atlantic	0.981	0.005	-3.612	0.000
regionPacific	1.021	0.006	3.730	0.000
regionWestSouth_Central	1.021	0.006	3.738	0.000
regionWestNorth_Central	0.999	0.007	-0.158	0.874
regionEastSouth_Central	1.006	0.006	0.909	0.363
regionMountain	1.009	0.007	1.241	0.215
regionNewEngland	0.995	0.008	-0.675	0.500
zipinc_qrtl48-61K	1.000	0.004	0.038	0.970
zipinc_qrtl61-82K	1.002	0.004	0.428	0.669
zipinc_qrtl82K+	1.001	0.005	0.206	0.837
ED_recordyes	1.084	0.004	18.639	0.000
elective_adminyes	0.927	0.005	-15.854	0.000
tran_inacute_care	1.044	0.007	6.497	0.000
tran_inother	1.097	0.008	12.123	0.000
aweekendyes	1.016	0.004	4.445	0.000
amonth	0.998	0.000	-5.192	0.000
hosp_bedsizemedium	0.999	0.004	-0.291	0.771
hosp_bedsizelarge	0.991	0.004	-2.237	0.025
hosp_locteachurban_nonteaching	0.989	0.007	-1.518	0.129
hosp_locteachurban_teaching	0.988	0.007	-1.767	0.077
h_contrlprivate_notprofit	1.011	0.005	2.250	0.024
h_contrlPrivate_profit	0.989	0.006	-1.899	0.058
hfy	1.001	0.004	0.242	0.809
kidneyno	0.959	0.008	-5.156	0.000
kidneystage1_2	0.985	0.017	-0.913	0.361
kidneystage3	0.993	0.010	-0.694	0.487
kidneystage4	0.971	0.013	-2.200	0.028
kidneystage5_ESR	1.031	0.011	2.797	0.005
dmy	1.026	0.004	7.200	0.000
copdy	1.013	0.004	2.961	0.003
ibdy	1.073	0.023	3.014	0.003

7.3.5 key fit summary statistics

Below are the key fit summary statistics like the Nagelkerke R-square and the area under the ROC curve as they are presented in the lrm output for mod1

The r square is very low (0.127) as well as the C statistic (0.698).

mod1_lrm

Logistic Regression Model
 
 lrm(formula = oi ~ ms_f + age + sex + race + insurance + patient_loc + 
     region + zipinc_qrtl + ED_record + elective_admin + tran_in + 
     aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl, 
     data = ms_test, x = TRUE, y = TRUE)
 
                        Model Likelihood    Discrimination    Rank Discrim.    
                              Ratio Test           Indexes          Indexes    
 Obs         64591    LR chi2    5236.53    R2       0.127    C       0.698    
  0          52879    d.f.            40    g        1.003    Dxy     0.395    
  1          11712    Pr(> chi2) <0.0001    gr       2.725    gamma   0.396    
 max |deriv| 3e-12                          gp       0.121    tau-a   0.117    
                                            Brier    0.138                     
 
                                 Coef    S.E.   Wald Z Pr(>|Z|)
 Intercept                       -2.6341 0.1019 -25.86 <0.0001 
 ms_f=yes                         0.7502 0.0327  22.95 <0.0001 
 age                              0.0153 0.0008  19.28 <0.0001 
 sex=female                      -0.0866 0.0217  -4.00 <0.0001 
 race=Black                      -0.0752 0.0333  -2.26 0.0239  
 race=Hispanic                    0.0230 0.0391   0.59 0.5564  
 race=Other                       0.0184 0.0683   0.27 0.7881  
 race=Asian                      -0.0496 0.0741  -0.67 0.5038  
 race=NativeA                     0.2041 0.1410   1.45 0.1478  
 insurance=Private               -0.2592 0.0330  -7.85 <0.0001 
 insurance=Medicaid              -0.1802 0.0388  -4.64 <0.0001 
 insurance=Self_pay              -0.2475 0.0655  -3.78 0.0002  
 insurance=Other                 -0.2250 0.0707  -3.18 0.0015  
 patient_loc=Fringe              -0.0032 0.0313  -0.10 0.9186  
 patient_loc=metro>250K           0.0436 0.0319   1.37 0.1718  
 patient_loc=metro>50K            0.0330 0.0418   0.79 0.4293  
 patient_loc=micro                0.0681 0.0528   1.29 0.1968  
 patient_loc=Other                0.0643 0.0542   1.18 0.2361  
 region=EastNorth_Central         0.0077 0.0364   0.21 0.8320  
 region=Middle_Atlantic          -0.1483 0.0384  -3.86 0.0001  
 region=Pacific                   0.1483 0.0397   3.73 0.0002  
 region=WestSouth_Central         0.1579 0.0402   3.92 <0.0001 
 region=WestNorth_Central        -0.0112 0.0497  -0.22 0.8224  
 region=EastSouth_Central         0.0390 0.0469   0.83 0.4060  
 region=Mountain                  0.0580 0.0509   1.14 0.2544  
 region=NewEngland               -0.0515 0.0545  -0.95 0.3444  
 zipinc_qrtl=48-61K              -0.0062 0.0291  -0.21 0.8301  
 zipinc_qrtl=61-82K              -0.0070 0.0317  -0.22 0.8252  
 zipinc_qrtl=82K+                -0.0345 0.0360  -0.96 0.3375  
 ED_record=yes                    0.6351 0.0343  18.50 <0.0001 
 elective_admin=yes              -1.0106 0.0480 -21.07 <0.0001 
 tran_in=acute_care               0.4042 0.0483   8.37 <0.0001 
 tran_in=other                    0.5857 0.0483  12.13 <0.0001 
 aweekend=yes                     0.0926 0.0249   3.72 0.0002  
 amonth                          -0.0159 0.0031  -5.20 <0.0001 
 hosp_bedsize=medium             -0.0156 0.0305  -0.51 0.6090  
 hosp_bedsize=large              -0.0685 0.0286  -2.40 0.0165  
 hosp_locteach=urban_nonteaching -0.0657 0.0546  -1.20 0.2289  
 hosp_locteach=urban_teaching    -0.0705 0.0510  -1.38 0.1671  
 h_contrl=private_notprofit       0.0805 0.0360   2.24 0.0253  
 h_contrl=Private_profit         -0.0713 0.0445  -1.60 0.1093

Below are the key fit summary statistics like the Nagelkerke R-square and the area under the ROC curve as they are presented in the lrm output for mod1

The r square is very low (0.132) as well as the C statistic (0.703).

mod2_lrm

Logistic Regression Model
 
 lrm(formula = oi ~ ms_f + age + sex + race + insurance + patient_loc + 
     region + zipinc_qrtl + ED_record + elective_admin + tran_in + 
     aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + 
     hf + kidney + dm + copd + ibd, data = ms_test, x = TRUE, 
     y = TRUE)
 
                        Model Likelihood    Discrimination    Rank Discrim.    
                              Ratio Test           Indexes          Indexes    
 Obs         64591    LR chi2    5451.60    R2       0.132    C       0.703    
  0          52879    d.f.            49    g        1.022    Dxy     0.405    
  1          11712    Pr(> chi2) <0.0001    gr       2.780    gamma   0.405    
 max |deriv| 6e-11                          gp       0.124    tau-a   0.120    
                                            Brier    0.137                     
 
                                 Coef    S.E.   Wald Z Pr(>|Z|)
 Intercept                       -2.4439 0.1148 -21.29 <0.0001 
 ms_f=yes                         0.7908 0.0329  24.00 <0.0001 
 age                              0.0142 0.0008  17.28 <0.0001 
 sex=female                      -0.0652 0.0218  -2.99 0.0028  
 race=Black                      -0.1330 0.0338  -3.94 <0.0001 
 race=Hispanic                   -0.0157 0.0394  -0.40 0.6896  
 race=Other                      -0.0042 0.0685  -0.06 0.9509  
 race=Asian                      -0.0908 0.0746  -1.22 0.2237  
 race=NativeA                     0.1615 0.1414   1.14 0.2535  
 insurance=Private               -0.2138 0.0332  -6.44 <0.0001 
 insurance=Medicaid              -0.1370 0.0389  -3.52 0.0004  
 insurance=Self_pay              -0.1787 0.0658  -2.72 0.0066  
 insurance=Other                 -0.1770 0.0709  -2.50 0.0126  
 patient_loc=Fringe              -0.0093 0.0314  -0.30 0.7665  
 patient_loc=metro>250K           0.0374 0.0320   1.17 0.2421  
 patient_loc=metro>50K            0.0305 0.0419   0.73 0.4674  
 patient_loc=micro                0.0578 0.0529   1.09 0.2744  
 patient_loc=Other                0.0536 0.0543   0.99 0.3238  
 region=EastNorth_Central        -0.0060 0.0365  -0.17 0.8689  
 region=Middle_Atlantic          -0.1446 0.0385  -3.76 0.0002  
 region=Pacific                   0.1440 0.0398   3.61 0.0003  
 region=WestSouth_Central         0.1532 0.0404   3.80 0.0001  
 region=WestNorth_Central        -0.0130 0.0498  -0.26 0.7935  
 region=EastSouth_Central         0.0345 0.0470   0.73 0.4627  
 region=Mountain                  0.0660 0.0510   1.29 0.1959  
 region=NewEngland               -0.0459 0.0546  -0.84 0.4008  
 zipinc_qrtl=48-61K              -0.0020 0.0291  -0.07 0.9441  
 zipinc_qrtl=61-82K               0.0050 0.0318   0.16 0.8742  
 zipinc_qrtl=82K+                -0.0109 0.0362  -0.30 0.7621  
 ED_record=yes                    0.6166 0.0344  17.90 <0.0001 
 elective_admin=yes              -0.9886 0.0481 -20.56 <0.0001 
 tran_in=acute_care               0.3925 0.0484   8.11 <0.0001 
 tran_in=other                    0.5756 0.0484  11.89 <0.0001 
 aweekend=yes                     0.0962 0.0249   3.86 0.0001  
 amonth                          -0.0157 0.0031  -5.12 <0.0001 
 hosp_bedsize=medium             -0.0183 0.0305  -0.60 0.5485  
 hosp_bedsize=large              -0.0699 0.0286  -2.44 0.0146  
 hosp_locteach=urban_nonteaching -0.0772 0.0547  -1.41 0.1584  
 hosp_locteach=urban_teaching    -0.0807 0.0512  -1.58 0.1149  
 h_contrl=private_notprofit       0.0716 0.0360   1.99 0.0468  
 h_contrl=Private_profit         -0.0764 0.0446  -1.71 0.0868  
 hf=y                            -0.0163 0.0284  -0.57 0.5657  
 kidney=no                       -0.2357 0.0512  -4.60 <0.0001 
 kidney=stage1_2                 -0.0810 0.1071  -0.76 0.4495  
 kidney=stage3                   -0.0542 0.0603  -0.90 0.3690  
 kidney=stage4                   -0.1815 0.0844  -2.15 0.0315  
 kidney=stage5_ESR                0.1973 0.0686   2.88 0.0040  
 dm=y                             0.1889 0.0238   7.93 <0.0001 
 copd=y                           0.0872 0.0283   3.08 0.0021  
 ibd=y                            0.4656 0.1501   3.10 0.0019

8 Fitting the propensity score model

I will now fit the propensity score, which predicts MS status based on these 30 available covariates:

discwt, age, sex, race, insurance, patient_loc, region, zipinc_qrtl, obese,htn, cad, hf, hld, pvd, dm2, dm1, kidney, copd, asthma, osteoporosis, ra, fibromyalgia, glaucoma, depression, anxiety, bipolar, epilepsy, migraine, ibd, ibs, hashimoto, lupus, psoriasis, scd, hiv, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl

paste(colnames(ms1718c), collapse = ", ")

[1] "key_nis, year, ms, ms_f, oi, oi_f, discwt, age, sex, race, insurance, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, pvd, dm, kidney, copd, asthma, osteoporosis, ra, fibromyalgia, glaucoma, depression, anxiety, bipolar, epilepsy, migraine, ibd, ibs, hashimoto, lupus, psoriasis, scd, hiv, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl, oi_type, rec_pneumonia, inv_gbs, invasive_enterobacteriaceae, dissem_tb, bacteremia_meningitis, disseminated_bartonella, legionella, m_aviuum, candida_severe, invasive_aspergillus, pcp, bartonella, crypto_extrapul, coccidioidomycosis, histoplasmosis, mucormycosis, cmv_pneumonia, cmv_pancreatitis, other_severe_cmv, cmv_other, ebv, hsv_encephalitis, varicella_systemic, zoster, rsv, hpn, hhv6_7, pml, enteroviral_meningitis, parvovirus, babesia, toxoplasma, visceral_leishmaniasis, acanthamoeba, naegleriasis, strongyloidiasis, taenia, flu, nosocomial, vap, catheter_inf, surgical_inf, bloodstream_catheter, c_diff"

We’ll use the f.build tool from the cobalt package here.

ms_test <- ms_test %>%
    mutate(treat = as.logical(ms_f == "yes"))

covs_1 <- ms_test %>%
    select(discwt, age, sex, race, insurance, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, pvd, dm, kidney, copd, asthma, osteoporosis, ra, fibromyalgia, glaucoma, depression, anxiety, bipolar, epilepsy, migraine, ibd, ibs, hashimoto, lupus, psoriasis, scd, hiv, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl)

prop_model <- glm(f.build("treat", covs_1), data = ms_test,
                  family = binomial)

tidy(prop_model, conf.int = TRUE) %>%
    select(term, estimate, std.error, conf.low, conf.high, p.value) %>%
    knitr::kable(digits = 3)

term	estimate	std.error	conf.low	conf.high	p.value
(Intercept)	-194.057	755.968	-1698.768	1254.074	0.797
discwt	38.451	151.194	-251.177	339.394	0.799
age	-0.018	0.001	-0.020	-0.016	0.000
sexfemale	0.573	0.033	0.509	0.637	0.000
raceBlack	0.126	0.044	0.039	0.212	0.004
raceHispanic	-0.609	0.062	-0.732	-0.489	0.000
raceOther	-0.391	0.099	-0.590	-0.202	0.000
raceAsian	-1.311	0.158	-1.636	-1.013	0.000
raceNativeA	-0.839	0.270	-1.412	-0.346	0.002
insurancePrivate	-0.868	0.044	-0.955	-0.782	0.000
insuranceMedicaid	-1.170	0.053	-1.274	-1.067	0.000
insuranceSelf_pay	-1.550	0.110	-1.771	-1.340	0.000
insuranceOther	-0.978	0.101	-1.181	-0.783	0.000
patient_locFringe	0.010	0.041	-0.070	0.090	0.802
patient_locmetro>250K	-0.092	0.044	-0.178	-0.006	0.036
patient_locmetro>50K	-0.081	0.057	-0.193	0.030	0.156
patient_locmicro	-0.061	0.072	-0.203	0.079	0.397
patient_locOther	-0.051	0.076	-0.201	0.096	0.500
regionEastNorth_Central	0.321	0.048	0.228	0.415	0.000
regionMiddle_Atlantic	0.249	0.050	0.152	0.346	0.000
regionPacific	0.087	0.057	-0.024	0.198	0.123
regionWestSouth_Central	-0.058	0.060	-0.176	0.058	0.329
regionWestNorth_Central	0.024	0.067	-0.107	0.153	0.717
regionEastSouth_Central	-0.096	0.068	-0.229	0.035	0.156
regionMountain	0.263	0.067	0.130	0.394	0.000
regionNewEngland	0.117	0.072	-0.024	0.257	0.101
zipinc_qrtl48-61K	0.186	0.041	0.106	0.265	0.000
zipinc_qrtl61-82K	0.216	0.044	0.131	0.302	0.000
zipinc_qrtl82K+	0.256	0.049	0.160	0.351	0.000
obesey	0.131	0.039	0.053	0.207	0.001
htny	0.011	0.036	-0.060	0.082	0.765
cady	-0.253	0.046	-0.344	-0.163	0.000
hfy	-0.538	0.054	-0.646	-0.432	0.000
hldy	-0.154	0.036	-0.226	-0.083	0.000
pvdy	-0.031	0.095	-0.221	0.151	0.744
dmy	-0.138	0.038	-0.212	-0.064	0.000
kidneyno	0.130	0.087	-0.038	0.304	0.136
kidneystage1_2	-0.187	0.189	-0.569	0.173	0.323
kidneystage3	-0.307	0.107	-0.516	-0.095	0.004
kidneystage4	-0.532	0.170	-0.875	-0.206	0.002
kidneystage5_ESR	-0.961	0.144	-1.248	-0.681	0.000
copdy	-0.336	0.046	-0.427	-0.246	0.000
asthmay	-0.085	0.053	-0.190	0.019	0.112
osteoporosisy	0.236	0.073	0.091	0.377	0.001
ray	-0.304	0.108	-0.522	-0.097	0.005
fibromyalgiay	0.682	0.079	0.525	0.836	0.000
glaucomay	-0.019	0.113	-0.246	0.198	0.869
depressiony	0.657	0.037	0.584	0.728	0.000
anxietyy	-0.059	0.040	-0.138	0.020	0.144
bipolary	0.039	0.074	-0.108	0.183	0.595
epilepsyy	0.667	0.055	0.559	0.773	0.000
migrainey	0.560	0.073	0.415	0.703	0.000
ibdy	-0.143	0.221	-0.601	0.269	0.517
ibsy	0.128	0.113	-0.097	0.344	0.257
hashimotoy	-0.147	0.327	-0.835	0.455	0.653
lupusy	0.170	0.149	-0.130	0.454	0.254
psoriasisy	-0.019	0.180	-0.388	0.321	0.915
scdy	-0.576	0.205	-0.997	-0.192	0.005
hivy	-0.643	0.335	-1.365	-0.040	0.055
ED_recordyes	0.602	0.044	0.516	0.689	0.000
elective_adminyes	-0.266	0.052	-0.368	-0.164	0.000
tran_inacute_care	0.280	0.066	0.149	0.408	0.000
tran_inother	0.539	0.064	0.412	0.663	0.000
aweekendyes	-0.017	0.035	-0.085	0.051	0.630
amonth	-0.008	0.004	-0.016	0.001	0.068
hosp_bedsizemedium	0.041	0.042	-0.040	0.123	0.323
hosp_bedsizelarge	0.084	0.039	0.009	0.160	0.030
hosp_locteachurban_nonteaching	-0.154	0.075	-0.301	-0.006	0.042
hosp_locteachurban_teaching	-0.058	0.070	-0.195	0.080	0.411
h_contrlprivate_notprofit	0.078	0.050	-0.020	0.178	0.120
h_contrlPrivate_profit	-0.036	0.063	-0.160	0.088	0.567

 glance(prop_model)

# A tibble: 1 x 8
  null.deviance df.null  logLik    AIC    BIC deviance df.residual  nobs
          <dbl>   <int>   <dbl>  <dbl>  <dbl>    <dbl>       <int> <int>
1        38892.   64590 -17531. 35204. 35848.   35062.       64520 64591

From table 1, I will remove variables that have <5% in a category

# msc <- ms1718c %>% 
# mutate(race_c =fct_collapse(factor(race), 
#                             Other = c("Other", "NativeA","Asian"))) %>% 
# mutate(insurance_c =fct_collapse(factor(insurance), 
#                             Other = c("Other", "Self_pay"))) %>% 
# mutate(CKD =fct_collapse(factor(kidney), 
#                             mod_sev = c("stage3", "stage4", "stage5_ESR"),
#                          other = c("stage1_2", "CKDother"))) %>% 
#   select( -pvd, -race, -insurance, -dm1, -kidney, -ra,  -glaucoma, -bipolar, -ibd, -ibs, - hashimoto,  -psoriasis, -scd, -hiv,- lupus) %>% 
#   
#   select(key_nis, year, ms, ms_f, oi, oi_f, discwt, age, race_c, sex, insurance_c, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, dm2, CKD, copd, asthma, osteoporosis, fibromyalgia, depression, anxiety, epilepsy, migraine, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl, oi_type, rec_pneumonia, inv_gbs, invasive_enterobacteriaceae, dissem_tb, bacteremia_meningitis, disseminated_bartonella, legionella, m_aviuum, candida_severe, invasive_aspergillus, pcp, bartonella, crypto_extrapul, coccidioidomycosis, histoplasmosis, mucormycosis, cmv_pneumonia, cmv_pancreatitis, other_severe_cmv, cmv_other, ebv, hsv_encephalitis, varicella_systemic, zoster, rsv, hpn, hhv6_7, pml, enteroviral_meningitis, parvovirus, babesia, toxoplasma, visceral_leishmaniasis, acanthamoeba, naegleriasis, strongyloidiasis, taenia, flu, nosocomial, vap, catheter_inf, surgical_inf, bloodstream_catheter, c_diff)

# set.seed(1)
# ms_split <- rsample::initial_split(msc, prop = 0.7,
# strata = oi_f, ms_f)
# ms_train <- rsample::training(ms_split) 
# ms_test <- rsample::testing(ms_split)

# msc <- msc %>%
#     mutate(treat = as.logical(ms == "yes"))
# 
# covs_1 <- msc %>%
#     select(discwt, age, race_c, sex, insurance_c, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, dm2, CKD, copd, asthma, osteoporosis, fibromyalgia, depression, anxiety, epilepsy, migraine, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl)
# 
# prop_model <- glm(f.build("treat", covs_1), data = msc,
#                   family = binomial)
# 
# tidy(prop_model, conf.int = TRUE) %>%
#     select(term, estimate, std.error, conf.low, conf.high, p.value) %>%
#     knitr::kable(digits = 3)

8.0.1 Storing the Propensity Scores

ms_test <- ms_test %>%
    mutate(ps = prop_model$fitted,
           linps = prop_model$linear.predictors)

ggplot(ms_test, aes(x = ms_f, y = linps)) +
    geom_violin() +
    geom_boxplot(width = 0.3)

The density plot below shows us that we have a substantial number of MS patients who do not have overlapping PS scores with non-MS patients.

 ggplot(ms_test, aes(x = linps, fill = ms_f)) +
    geom_density(alpha = 0.3)

9 `match_4` Caliper Matching (1:1 without replacement) with the `Matching` package

I will use the Match function to specify a caliper of 0.2 (from Matching package)
- Here, subjects will only be matched if they fall within the maximum distance of 0.2 standard deviations of the logit of the propensity score.
- If they do not fall within this range, those subjects will be dropped.
The matching design will be 1:1 without replacement

match_4 <- Match(Tr = ms_test$treat, X = ms_test$linps,
                 M = 1, replace = FALSE, ties = FALSE,
                 caliper = 0.2, estimand = "ATT")

summary(match_4)


Estimate...  0 
SE.........  0 
T-stat.....  NaN 
p.val......  NA 

Original number of observations..............  64591 
Original number of treated obs...............  5772 
Matched number of observations...............  5763 
Matched number of observations  (unweighted).  5763 

Caliper (SDs)........................................   0.2 
Number of obs dropped by 'exact' or 'caliper'  9

Note that we have now dropped 9 of the MS=yes subjects, and reduced our sample to the 5763 remaining MS=yes subjects, who are paired with 5763 unique matched MS = no subjects in our matched sample.

9.1 Obtaining the Matched Sample

Here I am storing the matched sample

match4_matches <- factor(rep(match_4$index.treated, 2))
ms_matched4 <- cbind(match4_matches,
                         ms_test[c(match_4$index.control,
                                  match_4$index.treated),])

How many unique subjects are in our matched sample?

ms_matched4 %$% n_distinct(key_nis)

[1] 11521

This match includes 5763 pairs so 11521 subjects, since we’ve done matching without replacement.

 ms_matched4 %>% count(ms_f)

  ms_f    n
1   no 5763
2  yes 5763

9.2 Checking Covariate Balance for our 1:1 Caliper Match

9.2.1 Using `bal.tab` to obtain a balance table

covs_4plus <- ms_test %>%
    select(discwt, age, sex, race, insurance, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, pvd, dm, kidney, copd, asthma, osteoporosis, ra, fibromyalgia, glaucoma, depression, anxiety, bipolar, epilepsy, migraine, ibd, ibs, hashimoto, lupus, psoriasis, scd, hiv, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl, ps, linps)

bal4 <- bal.tab(match_4,
                treat = ms_test$ms_f,
                covs = covs_4plus, quick = FALSE,
                data = ms_test, stats = c("m", "v"),
                un = TRUE, disp.v.ratio = TRUE)
bal4

Balance Measures
                                   Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
discwt                          Contin.  0.0074     0.8282   0.0079      0.8901
age                             Contin. -0.1163     0.5196   0.0314      0.5044
sex_female                       Binary  0.1448          .   0.0066           .
race_White                       Binary  0.0683          .  -0.0017           .
race_Black                       Binary  0.0141          .  -0.0014           .
race_Hispanic                    Binary -0.0518          .   0.0019           .
race_Other                       Binary -0.0081          .   0.0010           .
race_Asian                       Binary -0.0194          .  -0.0005           .
race_NativeA                     Binary -0.0031          .   0.0007           .
insurance_Medicare               Binary  0.1099          .   0.0163           .
insurance_Private                Binary -0.0310          .  -0.0111           .
insurance_Medicaid               Binary -0.0477          .  -0.0024           .
insurance_Self_pay               Binary -0.0211          .  -0.0036           .
insurance_Other                  Binary -0.0101          .   0.0009           .
patient_loc_Central              Binary -0.0021          .  -0.0028           .
patient_loc_Fringe               Binary  0.0322          .   0.0000           .
patient_loc_metro>250K           Binary -0.0149          .  -0.0005           .
patient_loc_metro>50K            Binary -0.0026          .  -0.0045           .
patient_loc_micro                Binary -0.0033          .   0.0016           .
patient_loc_Other                Binary -0.0093          .   0.0062           .
region_South_Atlantic            Binary -0.0201          .   0.0064           .
region_EastNorth_Central         Binary  0.0401          .  -0.0059           .
region_Middle_Atlantic           Binary  0.0329          .  -0.0026           .
region_Pacific                   Binary -0.0197          .   0.0000           .
region_WestSouth_Central         Binary -0.0338          .   0.0009           .
region_WestNorth_Central         Binary -0.0002          .   0.0052           .
region_EastSouth_Central         Binary -0.0126          .  -0.0003           .
region_Mountain                  Binary  0.0053          .  -0.0009           .
region_NewEngland                Binary  0.0082          .  -0.0028           .
zipinc_qrtl_<48K                 Binary -0.0522          .   0.0029           .
zipinc_qrtl_48-61K               Binary  0.0057          .  -0.0035           .
zipinc_qrtl_61-82K               Binary  0.0198          .   0.0045           .
zipinc_qrtl_82K+                 Binary  0.0266          .  -0.0040           .
obese_y                          Binary  0.0129          .   0.0026           .
htn_y                            Binary  0.0463          .   0.0010           .
cad_y                            Binary -0.0776          .   0.0012           .
hf_y                             Binary -0.0827          .  -0.0016           .
hld_y                            Binary -0.0519          .  -0.0028           .
pvd_y                            Binary -0.0055          .  -0.0024           .
dm_y                             Binary -0.0537          .  -0.0040           .
kidney_CKDother                  Binary -0.0069          .  -0.0002           .
kidney_no                        Binary  0.0736          .  -0.0036           .
kidney_stage1_2                  Binary -0.0033          .   0.0000           .
kidney_stage3                    Binary -0.0282          .   0.0035           .
kidney_stage4                    Binary -0.0107          .   0.0002           .
kidney_stage5_ESR                Binary -0.0244          .   0.0002           .
copd_y                           Binary -0.0362          .  -0.0023           .
asthma_y                         Binary  0.0146          .  -0.0026           .
osteoporosis_y                   Binary  0.0171          .   0.0017           .
ra_y                             Binary -0.0007          .  -0.0017           .
fibromyalgia_y                   Binary  0.0302          .  -0.0012           .
glaucoma_y                       Binary  0.0015          .   0.0014           .
depression_y                     Binary  0.1337          .  -0.0064           .
anxiety_y                        Binary  0.0623          .  -0.0078           .
bipolar_y                        Binary  0.0102          .  -0.0049           .
epilepsy_y                       Binary  0.0531          .  -0.0038           .
migraine_y                       Binary  0.0317          .  -0.0002           .
ibd_y                            Binary  0.0002          .   0.0007           .
ibs_y                            Binary  0.0090          .  -0.0014           .
hashimoto_y                      Binary  0.0006          .  -0.0003           .
lupus_y                          Binary  0.0047          .  -0.0007           .
psoriasis_y                      Binary  0.0012          .  -0.0012           .
scd_y                            Binary -0.0004          .  -0.0019           .
hiv_y                            Binary -0.0016          .  -0.0003           .
ED_record_yes                    Binary  0.1115          .  -0.0052           .
elective_admin_yes               Binary -0.0825          .   0.0040           .
tran_in_not_transferred          Binary -0.0208          .   0.0003           .
tran_in_acute_care               Binary -0.0018          .  -0.0007           .
tran_in_other                    Binary  0.0226          .   0.0003           .
aweekend_yes                     Binary  0.0174          .  -0.0095           .
amonth                          Contin. -0.0182     1.0038   0.0065      1.0094
hosp_bedsize_small               Binary -0.0057          .  -0.0104           .
hosp_bedsize_medium              Binary -0.0029          .   0.0082           .
hosp_bedsize_large               Binary  0.0087          .   0.0023           .
hosp_locteach_rural              Binary -0.0008          .   0.0040           .
hosp_locteach_urban_nonteaching  Binary -0.0180          .  -0.0026           .
hosp_locteach_urban_teaching     Binary  0.0189          .  -0.0014           .
h_contrl_gov_nonfed              Binary -0.0182          .  -0.0049           .
h_contrl_private_notprofit       Binary  0.0482          .   0.0024           .
h_contrl_Private_profit          Binary -0.0301          .   0.0024           .
ps                              Contin.  0.6431     2.7834   0.0001      0.9999
linps                           Contin.  0.8364     1.0691   0.0001      0.9999

Sample sizes
             no  yes
All       58819 5772
Matched    5763 5763
Unmatched 53056    0
Discarded     0    9

9.2.2 Checking Rubin’s Rules 1 and 2

Below is a table of Rubin’s Rules 1 & 2 before and after our 1:1 caliper match is applied

covs_for_rubin <- ms_test %>%
    select(linps)

rubin_m4 <- bal.tab(match_4,
                treat = ms_test$treat,
                covs = covs_for_rubin,
                data = ms_test, stats = c("m", "v"),
                un = TRUE, disp.v.ratio = TRUE)[1]

rubin_report_m4 <- tibble(
    status = c("Rule1", "Rule2"),
    Unmatched = c(rubin_m4$Balance$Diff.Un,
                  rubin_m4$Balance$V.Ratio.Un),
    Matched = c(rubin_m4$Balance$Diff.Adj,
               rubin_m4$Balance$V.Ratio.Adj))

rubin_report_m4 %>% knitr::kable(digits = 2)

status	Unmatched	Matched
Rule1	0.84	0
Rule2	1.07	1

Rubin Rule 1: We want the standardized differences of the propensity scores of the two groups to be as close to zero as possible.
- Before matching we have a bias of 83.637686%, and this is reduced to 0.008155% after 1:1 caliper matching.
Rubin Rule 2: we want the variances of the ratio of the lin PS to be within 4/5 to 5/4.
- Before matching we have a variance ratio of 106.9120203%, and this becomes 99.9921849% after 1:1 caliper matching.

In summary, we can see that caliper matching produced an exceptionally strong match. Rubin’s Rule 1 is 0 and Rule 2 is essentially 1. However, this cost us 9 patients in the MS group who were the “hardest to match”, which is quite sad.

9.2.3 Using `bal.plot` from `cobalt`

We can graphically compare the distribution of PS in both groups with mirrored histograms, before and after matching

bal.plot(match_4,
         treat = ms_test$ms_f,
         covs = covs_4plus,
         var.name = "ps", 
         which = "both",
         sample.names = 
             c("Unmatched Sample", "match_4 Sample"),
         type = "histogram", mirror = TRUE)

After matching, the mirrored histogram looks the same! So cool!!

9.2.4 Using `love.plot` to look at Standardized Differences

The love plot below helps us to look at the balance of each of the covariates.

We can see that we got all of the covariates in the desired range of -10% to +10%, except for age. However its a whole lot better than it was before.

love.plot(bal4, 
          threshold = .1, size = 3,
          var.order = "unadjusted",
          stats = "mean.diffs",
          stars = "raw",
          sample.names = c("Unmatched", "Matched"),
          title = "Love Plot for our 1:1 Caliper Match") +
    labs(caption = "* indicates raw mean differences (for binary variables)")

9.2.5 Using `love.plot` to look at Variance Ratios

We only had three continuous variables.

Most importantly, we can visually see that we have met Rubin’s Rule #2 because the variance of the linear PS=1

love.plot(bal4, 
          threshold = .5, size = 3,
          stats = "variance.ratios",
          sample.names = c("Unmatched", "Matched"),
          title = "Variance Ratios for our 1:1 Caliper Match")

10 Adjusted Analysis 1

adj.m.out <- clogit(oi ~ ms_f + strata(match4_matches), data=ms_matched4)
adj.m_tidy <- tidy(adj.m.out, exponentiate = TRUE,
                        conf.int = TRUE)

adj.m_tidy %>% kable(digits=2)

term	estimate	std.error	statistic	p.value	conf.low	conf.high
ms_fyes	2.12	0.05	16.11	0	1.93	2.32

2017 & 2018 Burden of Opportunistic Infections in Hospitalized MS Patients