2017 & 2018 Burden of Opportunistic Infections in Hospitalized MS Patients
Preliminaries
library(skimr); library(tableone)
library(magrittr); library(janitor)
library(broom); library(survival); library(lme4)
library(cobalt); library(Matching); library(MatchIt)
library(rms)
library(yardstick)
library(naniar)
library(rbounds)
library(survey)
library(twang);
library(tidyverse)
theme_set(theme_bw())
1 My Data
Healthcare Cost and Utilization Project, Nationwide Inpatient Sample (HCUP-NIS) is the largest available all-payer inpatient healthcare administrative data set. It approximates a 20-percent stratified sample of all discharges from United States hospitals. It constitutes data from 48 states and 10,000 community hospitals, representing 95% of the United States population. Data from each record contains information regarding patient demographics, diagnoses, procedures, and other information associated with a hospital admission.
The data can be purchased by the public with at the following link: https://www.hcup-us.ahrq.gov/nisoverview.jsp
Strengths of how this data set relates to my research question:
- Nationally representative
- Inpatient hospitalizations
- Collects information on sociodemographic factors which I can adjust for
- Collects information on up to 40 diagnoses, thus I can capture both the exposure and outcome
Limitations of the data set:
the data quality of secondary databases is not perfect as the diagnoses codes may not necessarily be accurate, granular, or complete
The latest data available is from 2018.
1.1 Data Ingest
Below I am ingesting my data ms1718_raw
.
ms1718_raw<- read.csv("ms1718.csv") %>%
clean_names()
ms1718_raw <- ms1718_raw %>%
haven::zap_label() %>%
mutate(key_nis = as.character(key_nis)) %>% select(-oi)
dim(ms1718_raw)
[1] 682968 100
As originally loaded, the ms1718_raw
data contain 682968 rows and 100 columns.
1.2 Tidying, Data Cleaning and Data Management
Below I am cleaning the data according to the HCUP_NIS code book found at https://www.hcup-us.ahrq.gov/db/nation/nis/nisdde.jsp
In summary I have:
1.) converted all variables to factors except for age
, key_nis
, and discwt
2.) coded the variable levels for factors with more descriptive names rather than numbers
3.) reordered according to frequency with fct_infreq
4.) selected only the variables that I will use
Below I am creating the outcome, OI from OI_type. If there are no OIs, then OI=0, else OI=1
oi n percent
0 562476 0.8235759
1 120492 0.1764241
ms1718 <- ms1718_raw %>%
mutate(female=as.numeric(female)) %>%
mutate(sex = fct_recode(factor(female),
"male" = "0", "female" = "1")) %>%
# zipinc_qrtl = fct_recode(factor(zipinc_qrtl),
# "<48K"= "1",
# "48-61K" = "2",
# "61-82K"= "3",
# "82K+" = "4")) %>%
mutate(pay1=as.numeric(pay1)) %>%
mutate(insurance = fct_recode(factor(pay1),
"Medicare" = "1",
"Medicaid" = "2",
"Private" = "3",
"Self_pay" = "4",
"Other" = "5",
"Other" = "6"),
insurance = fct_infreq(insurance),
patient_loc = fct_recode(factor(pl_nchs),
"Central" = "1",
"Fringe" = "2",
"metro>250K" = "3",
"metro>50K" = "4",
"micro" = "5",
"Other" = "6" ),
patient_loc = fct_infreq(patient_loc)) %>%
mutate(region = fct_recode(factor(hosp_division),
"NewEngland" = "1",
"Middle_Atlantic" = "2",
"EastNorth_Central" = "3",
"WestNorth_Central" = "4",
"South_Atlantic" = "5",
"EastSouth_Central" = "6",
"WestSouth_Central" = "7",
"Mountain" = "8",
"Pacific" = "9"),
region = fct_infreq(region)) %>%
mutate(ED_record = fct_recode(factor(hcup_ed),
"no" = "0",
"yes" = "1", "yes" = "2", "yes" ="3", "yes"="4")) %>%
mutate(oi_f = fct_recode(factor(oi),
"yes"= "1",
"no" ="0")) %>%
mutate(ms_f = fct_recode(factor(ms),
"yes"= "1",
"no" ="0")) %>%
select(-female, - hcup_ed, - hosp_division, - pay1, -pl_nchs, -hosp_nis)
1.3 checking variables
1.3.1 categorical variables
1.3.1.1 zip_inc
ms1718 <- ms1718 %>%
mutate(zipinc_qrtl= fct_recode(factor(zipinc_qrtl),
"<48K"= "1",
"48-61K" = "2",
"61-82K"= "3",
"82K+" = "4",
NULL= "A",
NULL = ""))
ms1718 %>% tabyl(zipinc_qrtl)
zipinc_qrtl n percent valid_percent
<48K 199757 0.29248369 0.2975692
48-61K 179309 0.26254378 0.2671087
61-82K 159814 0.23399925 0.2380679
82K+ 132416 0.19388317 0.1972543
<NA> 11672 0.01709011 NA
1.3.1.2 race
ms1718 <- ms1718 %>%
mutate(race=as.numeric(race)) %>%
mutate(race = fct_recode(factor(race),
"White" = "1",
"Black" = "2",
"Hispanic"= "3",
"Asian" = "4",
"NativeA"= "5",
"Other" = "6")) %>%
mutate(race = fct_infreq(race))
we have a lot of missing in the MS group for race. and the numbers are quite small for asian and native american.
ms White Black Hispanic Other Asian NativeA NA_
0 404266 91201 68467 18247 16618 3712 18369
1 45319 9732 3492 1256 360 218 1711
1.3.1.3 aweekend
An indicator of whether the admission day is on the weekend (AWEEKEND) is calculated from the admission date (ADATE). If AWEEKEND cannot be calculated (ADATE is missing or invalid).
ms1718 <- ms1718 %>%
mutate(aweekend = fct_recode(factor(aweekend),
"no" = "0", "yes" = "1"))
ms1718 %>% tabyl(ms_f, aweekend)
ms_f no yes NA_
no 492829 128047 4
yes 48486 13602 0
1.3.2 elective
ELECTIVE indicates whether the admission to the hospital was elective. This information was derived from the type of admission (ATYPE). If the admission type was missing or invalid, then ELECTIVE is also missing or invalid. If the admission type indicated an elective admission (ATYPE = 3), then ELECTIVE was set to 1. Otherwise, for any other valid non-missing ATYPE values, ELECTIVE was set to 0.
ms1718 <- ms1718 %>%
mutate(elective=as.numeric(elective)) %>%
mutate(elective_admin = fct_recode(factor(elective),
"no" = "0", "yes" = "1")) %>%
select(-elective)
elective admin is fine
some missing values (ms group n=39)
ms_f no yes NA_
no 475066 144804 1010
yes 52523 9461 104
1.3.3 Teaching status (h_contrl)
Teaching Status: Beginning in 1998, a hospital is considered a teaching hospital if it has one or more Accreditation Council for Graduate Medical Education (ACGME) approved residency programs, is a member of the Council of Teaching Hospitals (COTH) or has a ratio of full-time equivalent interns and residents to beds of .25 or higher. Rural hospitals were not split according to teaching status, because rural teaching hospitals were rare
ms1718 <- ms1718 %>%
mutate(hosp_locteach = fct_recode(factor(hosp_locteach),
"rural" = "1", "urban_nonteaching" = "2", "urban_teaching" = "3" ))
ms1718 %>% tabyl(ms_f, hosp_locteach)
ms_f rural urban_nonteaching urban_teaching
no 56708 135839 428333
yes 5070 13266 43752
The hospital’s ownership/control category was obtained from the AHA Annual Survey of Hospitals and includes categories for government nonfederal (public), private not-for-profit (voluntary) and private investor-owned (proprietary). Hospitals in different ownership/control categories tend to have different missions and different responses to government regulations and policies.
ms1718 <- ms1718 %>%
mutate(h_contrl = fct_recode(factor(h_contrl),
"gov_nonfed" = "1", "private_notprofit" = "2", "Private_profit" = "3" ))
ms1718 %>% tabyl(ms_f, h_contrl)
ms_f gov_nonfed private_notprofit Private_profit
no 71258 456378 93244
yes 6010 48394 7684
1.3.4 tran_in
The data element TRAN_IN indicates that the non-newborn patient was transferred into the hospital and is defined using either admission source (ASOURCE) or point of origin (PointOfOriginUB04), depending on data availability. The coding of admission source and point of origin varies by the admission type. When the admission type indicates a newborn (ATYPE=4) then the admission source and point of origin indicate the type of birth instead of the type of transfer. Therefore, the identification of transfers in TRAN_IN is specific to non-newborn patients with ATYPE not equal to 4.
ms1718 <- ms1718 %>%
mutate(tran_in = fct_recode(factor(tran_in),
"not_transferred" = "0", "acute_care" = "1", "other" = "2" ))
ms1718 %>% tabyl(ms_f, tran_in)
ms_f not_transferred acute_care other NA_
no 555216 39194 23580 2890
yes 53965 3992 3848 283
1.3.5 Bedsize
Bedsize categories are based on hospital beds, and are specific to the hospital’s location and teaching status. Bedsize assesses the number of short-term acute care beds set up and staffed in a hospital. Hospital information was obtained from the AHA Annual Survey of Hospitals.
ms1718 <- ms1718 %>%
mutate(hosp_bedsize = fct_recode(factor(hosp_bedsize),
"small" = "1", "medium" = "2", "large" = "3" ))
ms1718 %>% tabyl(ms_f, hosp_bedsize)
ms_f small medium large
no 128612 182753 309515
yes 13065 18329 30694
1.3.6 hosp_region
1.3.7 comorbidities
I am turning all of the comorbidities from character variables to factor variables
ms1718 <- ms1718 %>%
mutate(depression = as.factor(depression), htn = as.factor(htn), migraine = as.factor(migraine), hld = as.factor(hld), anxiety = as.factor(anxiety), copd = as.factor(copd), asthma = as.factor(asthma), ibs = as.factor(ibs), hashimoto = as.factor(hashimoto), osteoporosis = as.factor(osteoporosis), dm = as.factor(dm), ra = as.factor(ra), fibromyalgia = as.factor(fibromyalgia), cad = as.factor(cad), ibd = as.factor(ibd), glaucoma = as.factor(glaucoma), bipolar = as.factor(bipolar), epilepsy = as.factor(epilepsy), pvd = as.factor(pvd), ckd = as.factor(ckd), lupus = as.factor(lupus), psoriasis = as.factor(psoriasis), scd = as.factor(scd), hiv = as.factor(hiv), kidney=as.factor(kidney))
1.4 check quantitative variables
All numeric variables look plausible (age, amonth, discwt)
.
3 Variables 682968 Observations
--------------------------------------------------------------------------------
age
n missing distinct Info Mean Gmd .05 .10
682968 0 73 1 57.95 22.73 24 28
.25 .50 .75 .90 .95
42 61 74 83 88
lowest : 18 19 20 21 22, highest: 86 87 88 89 90
--------------------------------------------------------------------------------
amonth
n missing distinct Info Mean Gmd .05 .10
682168 800 12 0.993 6.477 3.981 1 2
.25 .50 .75 .90 .95
3 6 9 11 12
lowest : 1 2 3 4 5, highest: 8 9 10 11 12
Value 1 2 3 4 5 6 7 8 9 10 11
Frequency 60355 53961 58609 55660 57503 55842 56987 58172 54819 58122 55405
Proportion 0.088 0.079 0.086 0.082 0.084 0.082 0.084 0.085 0.080 0.085 0.081
Value 12
Frequency 56733
Proportion 0.083
--------------------------------------------------------------------------------
discwt
n missing distinct Info Mean Gmd .05 .10
682968 0 314 0.996 5 6.098e-05 5 5
.25 .50 .75 .90 .95
5 5 5 5 5
lowest : 4.991597 4.995833 4.996289 4.996753 4.996894
highest: 5.001855 5.001938 5.001965 5.001972 5.002649
--------------------------------------------------------------------------------
1.5 Missingness
I have 41270 missing observations in the ms1718
data set.
For the control group, I filtered out complete cases on the variables of interest before I did the 10% sample. So the missingness is really only in the ms group.
Below we can see that we have the most missingness for race
(2.1%), zipinc_qrtl
(1.2%), tran_in
(0.43%), and patient_loc
(0.28%), elective_admin
(0.11%), insurance
(.105%)
ms1718 %>% select(age, amonth, aweekend, discwt, key_nis, race, tran_in, zipinc_qrtl, ms, hosp_bedsize, hosp_locteach, h_contrl, sex, insurance, patient_loc, region, ED_record, elective_admin) %>%
gg_miss_var()
ms1718 %>% select(age, amonth, aweekend, discwt, key_nis, race, tran_in, zipinc_qrtl, ms, hosp_bedsize, hosp_locteach, h_contrl, sex, insurance, patient_loc, region, ED_record, elective_admin, ms_f) %>% group_by(ms_f) %>% miss_var_summary()
# A tibble: 36 x 4
# Groups: ms_f [2]
ms_f variable n_miss pct_miss
<fct> <chr> <int> <dbl>
1 yes race 1711 2.76
2 yes zipinc_qrtl 774 1.25
3 yes tran_in 283 0.456
4 yes patient_loc 166 0.267
5 yes elective_admin 104 0.168
6 yes insurance 79 0.127
7 yes amonth 24 0.0387
8 yes sex 1 0.00161
9 yes age 0 0
10 yes aweekend 0 0
# … with 26 more rows
Most of the cases aren’t missing any data
# A tibble: 5 x 3
n_miss_in_case n_cases pct_cases
<int> <int> <dbl>
1 0 645918 94.6
2 1 32939 4.82
3 2 4003 0.586
4 3 107 0.0157
5 4 1 0.000146
1.5.1 missingness mechanism
MAR
1.6 selecting only variables I need
[1] "age, amonth, aweekend, discwt, key_nis, race, tran_in, year, zipinc_qrtl, ms, hosp_bedsize, hosp_locteach, hosp_region, h_contrl, rec_pneumonia, inv_gbs, invasive_enterobacteriaceae, dissem_tb, bacteremia_meningitis, disseminated_bartonella, legionella, m_aviuum, candida_severe, invasive_aspergillus, pcp, bartonella, crypto_extrapul, coccidioidomycosis, histoplasmosis, mucormycosis, cmv_pneumonia, cmv_pancreatitis, other_severe_cmv, cmv_other, ebv, hsv_encephalitis, varicella_systemic, zoster, rsv, hpn, hhv6_7, pml, enteroviral_meningitis, parvovirus, babesia, toxoplasma, visceral_leishmaniasis, acanthamoeba, naegleriasis, strongyloidiasis, taenia, flu, nosocomial, vap, catheter_inf, surgical_inf, bloodstream_catheter, c_diff, oi_type, depression, htn, migraine, hld, anxiety, copd, asthma, ibs, hashimoto, osteoporosis, dm, ra, fibromyalgia, cad, ibd, glaucoma, bipolar, epilepsy, pvd, ckd, lupus, psoriasis, scd, hiv, ckd1, ckd2, ckd3, ckd4, ckd5, esrd, ck_dother, hf, obese, kidney, oi, sex, insurance, patient_loc, region, ED_record, oi_f, ms_f, elective_admin"
ms1718 <- ms1718 %>% select( key_nis, year, ms, ms_f, oi, oi_f, discwt, age, sex, race, insurance, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, pvd, dm, kidney, copd, asthma, osteoporosis, ra, fibromyalgia, glaucoma, depression, anxiety, bipolar, epilepsy, migraine, ibd, ibs, hashimoto, lupus, psoriasis, scd, hiv, ED_record, elective_admin, tran_in, aweekend ,amonth, hosp_bedsize, hosp_locteach, h_contrl, oi_type, rec_pneumonia, inv_gbs, invasive_enterobacteriaceae, dissem_tb, bacteremia_meningitis, disseminated_bartonella, legionella, m_aviuum, candida_severe, invasive_aspergillus, pcp, bartonella, crypto_extrapul, coccidioidomycosis, histoplasmosis, mucormycosis, cmv_pneumonia, cmv_pancreatitis, other_severe_cmv, cmv_other, ebv, hsv_encephalitis, varicella_systemic, zoster, rsv, hpn, hhv6_7, pml, enteroviral_meningitis, parvovirus, babesia, toxoplasma, visceral_leishmaniasis, acanthamoeba, naegleriasis, strongyloidiasis, taenia, flu, nosocomial, vap, catheter_inf, surgical_inf, bloodstream_catheter, c_diff)
NOTE: i took out hosp_region
because it was redundant with region
.
1.7 Tidied Tibble
Our tibble ms1718
contains 682968 rows (patients) and 93 columns (variables). Each variable is contained in a column, and each row represents a single key_nis
. All variables now have appropriate types.
key_nis | year | ms | ms_f | oi | oi_f | discwt | age | sex | race | insurance | patient_loc | region | zipinc_qrtl | obese | htn | cad | hf | hld | pvd | dm | kidney | copd | asthma | osteoporosis | ra | fibromyalgia | glaucoma | depression | anxiety | bipolar | epilepsy | migraine | ibd | ibs | hashimoto | lupus | psoriasis | scd | hiv | ED_record | elective_admin | tran_in | aweekend | amonth | hosp_bedsize | hosp_locteach | h_contrl | oi_type | rec_pneumonia | inv_gbs | invasive_enterobacteriaceae | dissem_tb | bacteremia_meningitis | disseminated_bartonella | legionella | m_aviuum | candida_severe | invasive_aspergillus | pcp | bartonella | crypto_extrapul | coccidioidomycosis | histoplasmosis | mucormycosis | cmv_pneumonia | cmv_pancreatitis | other_severe_cmv | cmv_other | ebv | hsv_encephalitis | varicella_systemic | zoster | rsv | hpn | hhv6_7 | pml | enteroviral_meningitis | parvovirus | babesia | toxoplasma | visceral_leishmaniasis | acanthamoeba | naegleriasis | strongyloidiasis | taenia | flu | nosocomial | vap | catheter_inf | surgical_inf | bloodstream_catheter | c_diff |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10190580 | 2017 | 1 | yes | 1 | yes | 5.000222 | 45 | male | White | Medicare | Other | NewEngland | <48K | n | n | n | n | n | n | n | no | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | no | no | not_transferred | yes | 1 | small | rural | private_notprofit | bacteria | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10026133 | 2017 | 1 | yes | 1 | yes | 5.000000 | 41 | male | White | Medicare | Fringe | NewEngland | 82K+ | n | n | n | n | n | n | n | no | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | yes | no | not_transferred | no | 12 | large | urban_teaching | private_notprofit | bacteria | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
10031795 | 2017 | 1 | yes | 1 | yes | 5.000000 | 79 | male | White | Medicare | Fringe | NewEngland | 82K+ | n | n | n | y | y | n | n | stage3 | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | n | yes | no | acute_care | yes | 7 | large | urban_teaching | private_notprofit | bacteria | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10057572 | 2017 | 1 | yes | 1 | yes | 5.000000 | 38 | female | White | Medicare | metro>50K | NewEngland | 61-82K | n | n | n | n | n | n | n | no | n | n | n | n | n | n | y | n | n | n | y | n | y | n | n | n | n | n | yes | no | not_transferred | no | 3 | large | urban_teaching | private_notprofit | hospital | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
10058311 | 2017 | 1 | yes | 0 | no | 5.000000 | 54 | female | White | Medicare | Fringe | NewEngland | 82K+ | n | n | n | n | n | n | n | no | n | n | n | n | n | n | n | n | n | y | n | n | n | n | n | n | n | n | yes | no | not_transferred | no | 2 | large | urban_teaching | private_notprofit | none | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
10059466 | 2017 | 1 | yes | 0 | no | 5.000000 | 68 | male | White | Medicare | Fringe | NewEngland | 82K+ | y | n | n | y | y | n | n | no | n | n | n | n | n | n | y | y | n | n | y | n | n | n | n | n | n | n | no | yes | not_transferred | no | 12 | large | urban_teaching | private_notprofit | none | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
I have also saved the tidied tibble as an R data set
2 Code Book and Clean Data Summary
Sample Size The data in our complete
ms1718
sample consist of 682968 subjects from HCUP-NIS between the ages of 18 and 90.Missingness Of the 682968 subjects, 645918 have complete data on all variables listed below.
Our outcome variables is
oi
.
oi
is if the person had a diagnosis for an opportunistic infection: recurrent pneumonia, invasive group B strep, invasive_enterobacteriaceae, disseminated tuberculosis, bacteremia_meningitis, disseminated_bartonella, legionella, M_aviuum, severe candida, invasive_aspergillus, PCP, bartonella, extrapulmonary cryptococcus, coccidioidomycosis, Histoplasmosis, Mucormycosis, CMV_pneumonia, CMV_pancreatitis, other_severe_CMV, CMV_other, EBV, HSV_encephalitis, Varicella_systemic, zoster, RSV, HPN, HHV6_7, PML, Enteroviral_meningitis, Parvovirus, Babesia, toxoplasma, Visceral_leishmaniasis, Acanthamoeba, Naegleriasis, strongyloidiasis, Taenia, flu, nosocomial infections, VAP, catehter infections, surgical site infections, bloodstream catheter infections, c.diff
All other variables listed below will serve as candidate predictors for our models.
The other variable contained in my tidy tibble is
key_nis
which is the key_nis identifying code.
[1] "key_nis | year | ms | ms_f | oi | oi_f | discwt | age | sex | race | insurance | patient_loc | region | zipinc_qrtl | obese | htn | cad | hf | hld | pvd | dm | kidney | copd | asthma | osteoporosis | ra | fibromyalgia | glaucoma | depression | anxiety | bipolar | epilepsy | migraine | ibd | ibs | hashimoto | lupus | psoriasis | scd | hiv | ED_record | elective_admin | tran_in | aweekend | amonth | hosp_bedsize | hosp_locteach | h_contrl | oi_type | rec_pneumonia | inv_gbs | invasive_enterobacteriaceae | dissem_tb | bacteremia_meningitis | disseminated_bartonella | legionella | m_aviuum | candida_severe | invasive_aspergillus | pcp | bartonella | crypto_extrapul | coccidioidomycosis | histoplasmosis | mucormycosis | cmv_pneumonia | cmv_pancreatitis | other_severe_cmv | cmv_other | ebv | hsv_encephalitis | varicella_systemic | zoster | rsv | hpn | hhv6_7 | pml | enteroviral_meningitis | parvovirus | babesia | toxoplasma | visceral_leishmaniasis | acanthamoeba | naegleriasis | strongyloidiasis | taenia | flu | nosocomial | vap | catheter_inf | surgical_inf | bloodstream_catheter | c_diff"
tran_in n percent valid_percent
not_transferred 609181 0.891961263 0.89612457
acute_care 43186 0.063232831 0.06352798
other 27428 0.040160007 0.04034746
<NA> 3173 0.004645898 NA
Variable | Type | Description |
---|---|---|
ms | binary | Presence of ICD-10 code G35 in record |
age | quant | years |
amonth | quant | months 1-12 |
aweekend | binary | whether the patient was admitted on a weekend (yes/no) |
discwt | quant | discharge weight |
race | 6-cat | Black, White, Hispanic, Other, Asian, Native American |
tran_in | 3-cat | Indicator of a transfer into the hospital(Not transferred in, Transferred in from a different acute care hospital, Transferred in from another type of health facility) |
zipinc_qrtl | 4-cat | Median household income for patient’s ZIP Code (based on current year). Values include <48K, 48-61K, 61-82K, 82K+ |
hosp_bedsize | 3-cat | small, medium, larg |
hosp_locteach | 3-cat | Teaching Status of the hospital: rural, urban_nonteaching, Urban_teaching |
hosp_region | 4-cat | Northeast Midwest South West |
depression | binary | diagnosis of depression (presence of ICD-10 code F32 or F33) |
htn | binary | diagnosis of hypertension (presence of ICD-10 code I10) |
migraine | binary | diagnosis of migraine (presence of ICD-10 code G43) |
hld | binary | diagnosis of hyperlipidemia (presence of ICD-10 code E78) |
anxiety | binary | diagnosis of anxiety (presence of ICD-10 code F41) |
copd | diagnosis of COPD (presence of ICD-10 code J44) | |
asthma | binary | diagnosis of asthma (presence of ICD-10 code J45) |
ibs | binary | diagnosis of asthma (presence of ICD-10 code J45) |
hashimoto | binary | diagnosis of hashimoto (autoimmune thyroiditis) (presence of ICD-10 code E063) |
osteoporosis | binary | diagnosis of osteoporosis (presence of ICD-10 code M81) |
ra | binary | diagnosis of Rheumatoid Arthritis (presence of ICD-10 code M06) |
fibromyalgia | binary | diagnosis of fibromyalgia (presence of ICD-10 code (M797) |
cad | binary | diagnosis of coronary artery disease (presence of ICD-10 code (I25) |
ibd | binary | diagnosis of irritable bowel disease (presence of ICD-10 code (K51) |
glaucoma | binary | diagnosis of glaucoma (presence of ICD-10 code (H40) |
bipolar | binary | diagnosis of bipolar disorder (presence of ICD-10 code F31) |
epilepsy | binary | diagnosis of epilepsy (presence of ICD-10 code G40) |
pvd | binary | diagnosis of peripheral vascular disease (presence of ICD-10 code I73) |
lupus | binary | diagnosis of lupus (presence of ICD-10 code M32) |
psoriasis | binary | diagnosis of psoriasis (presence of ICD-10 code L40) |
scd | binary | diagnosis of sickle cell disease (presence of ICD-10 code D57) |
hiv | binary | diagnosis of hiv (presence of ICD-10 code B20) |
hf | binary | diagnosis of heart failure (presence of ICD-10 code I50) |
sex | binary | male, female. |
insurance | 5-cat | expected primary payer (Medicare, Medicaid, private insurance, self pay, other) |
patient_loc | 6-cat | Patient Location (“Central” counties of metro areas of >=1 million population, “Fringe” counties of metro areas of >=1 million population, Counties in metro areas of 250,000-999,999 population, Counties in metro areas of 50,000-249,999 population, Micropolitan counties, Not metropolitan or micropolitan counties) |
region | ||
ED_record | binary | records that have evidence of emergency department (ED) services reported on the HCUP record (yes/no) |
elective_admin | binary | indicates whether the admission to the hospital was elective |
3 Table 1
3.0.1 Table one
This table has the covariates that I will be adjusting for as I explore the relationship between ms
and oi
.
c("key_nis", "year", "ms", "ms_f", "oi", "oi_f", "discwt", "age",
"sex", "race", "insurance", "patient_loc", "region", "zipinc_qrtl",
"obese", "htn", "cad", "hf", "hld", "pvd", "dm", "kidney", "copd",
"asthma", "osteoporosis", "ra", "fibromyalgia", "glaucoma", "depression",
"anxiety", "bipolar", "epilepsy", "migraine", "ibd", "ibs", "hashimoto",
"lupus", "psoriasis", "scd", "hiv", "ED_record", "elective_admin",
"tran_in", "aweekend", "amonth", "hosp_bedsize", "hosp_locteach",
"h_contrl", "oi_type", "rec_pneumonia", "inv_gbs", "invasive_enterobacteriaceae",
"dissem_tb", "bacteremia_meningitis", "disseminated_bartonella",
"legionella", "m_aviuum", "candida_severe", "invasive_aspergillus",
"pcp", "bartonella", "crypto_extrapul", "coccidioidomycosis",
"histoplasmosis", "mucormycosis", "cmv_pneumonia", "cmv_pancreatitis",
"other_severe_cmv", "cmv_other", "ebv", "hsv_encephalitis", "varicella_systemic",
"zoster", "rsv", "hpn", "hhv6_7", "pml", "enteroviral_meningitis",
"parvovirus", "babesia", "toxoplasma", "visceral_leishmaniasis",
"acanthamoeba", "naegleriasis", "strongyloidiasis", "taenia",
"flu", "nosocomial", "vap", "catheter_inf", "surgical_inf", "bloodstream_catheter",
"c_diff")
This table has the covariates that I will be calculating propensity scores with.
vars <- c("age", "sex", "race", "insurance", "patient_loc", "region", "zipinc_qrtl",
"obese", "htn", "cad", "hf", "hld", "pvd", "dm", "kidney",
"copd", "asthma", "osteoporosis", "ra", "fibromyalgia", "glaucoma",
"depression", "anxiety", "bipolar", "epilepsy", "migraine", "ibd",
"ibs", "hashimoto", "lupus", "psoriasis", "scd", "hiv", "ED_record",
"elective_admin", "tran_in", "aweekend", "amonth", "hosp_bedsize",
"hosp_locteach", "h_contrl")
factorvars <- c( "sex", "race", "insurance", "patient_loc", "region", "zipinc_qrtl",
"obese", "htn", "cad", "hf", "hld", "pvd", "kidney",
"copd", "asthma", "osteoporosis", "ra", "fibromyalgia", "glaucoma",
"depression", "anxiety", "bipolar", "epilepsy", "migraine", "ibd",
"ibs", "hashimoto", "lupus", "psoriasis", "scd", "hiv", "ED_record",
"elective_admin", "tran_in", "aweekend", "hosp_bedsize",
"hosp_locteach", "h_contrl")
trt <- c("ms_f")
table01 <- CreateTableOne(data = ms1718,
vars = vars,
factorVars= factorvars,
strata = trt)
print(table01, verbose=TRUE)
Stratified by ms_f
no yes p test
n 620880 62088
age (mean (SD)) 58.08 (20.26) 56.66 (14.61) <0.001
sex = female (%) 357621 (57.6) 44868 (72.3) <0.001
race (%) <0.001
White 404266 (67.1) 45319 (75.1)
Black 91201 (15.1) 9732 (16.1)
Hispanic 68467 (11.4) 3492 ( 5.8)
Other 18247 ( 3.0) 1256 ( 2.1)
Asian 16618 ( 2.8) 360 ( 0.6)
NativeA 3712 ( 0.6) 218 ( 0.4)
insurance (%) <0.001
Medicare 296863 (47.9) 36928 (59.6)
Private 164550 (26.5) 14465 (23.3)
Medicaid 114577 (18.5) 8248 (13.3)
Self_pay 24575 ( 4.0) 1059 ( 1.7)
Other 19387 ( 3.1) 1309 ( 2.1)
patient_loc (%) <0.001
Central 182195 (29.5) 17582 (28.4)
Fringe 148787 (24.1) 16824 (27.2)
metro>250K 127713 (20.7) 12752 (20.6)
metro>50K 57799 ( 9.4) 5738 ( 9.3)
micro 57452 ( 9.3) 5283 ( 8.5)
Other 43725 ( 7.1) 3743 ( 6.0)
region (%) <0.001
South_Atlantic 129882 (20.9) 11790 (19.0)
EastNorth_Central 95758 (15.4) 12146 (19.6)
Middle_Atlantic 86651 (14.0) 9927 (16.0)
Pacific 82325 (13.3) 6825 (11.0)
WestSouth_Central 72182 (11.6) 5189 ( 8.4)
WestNorth_Central 43295 ( 7.0) 4613 ( 7.4)
EastSouth_Central 43133 ( 6.9) 3697 ( 6.0)
Mountain 38346 ( 6.2) 4268 ( 6.9)
NewEngland 29308 ( 4.7) 3633 ( 5.9)
zipinc_qrtl (%) <0.001
<48K 183929 (30.2) 15828 (25.8)
48-61K 163060 (26.7) 16249 (26.5)
61-82K 144270 (23.7) 15544 (25.4)
82K+ 118723 (19.5) 13693 (22.3)
obese = y (%) 101297 (16.3) 10452 (16.8) 0.001
htn = y (%) 201284 (32.4) 23122 (37.2) <0.001
cad = y (%) 126759 (20.4) 7950 (12.8) <0.001
hf = y (%) 107298 (17.3) 6100 ( 9.8) <0.001
hld = y (%) 204382 (32.9) 17724 (28.5) <0.001
pvd = y (%) 17660 ( 2.8) 1583 ( 2.5) <0.001
dm = y (%) 167089 (26.9) 13152 (21.2) <0.001
kidney (%) <0.001
CKDother 21891 ( 3.5) 1559 ( 2.5)
no 516105 (83.1) 56152 (90.4)
stage1_2 5711 ( 0.9) 413 ( 0.7)
stage3 42535 ( 6.9) 2512 ( 4.0)
stage4 11727 ( 1.9) 561 ( 0.9)
stage5_ESR 22911 ( 3.7) 891 ( 1.4)
copd = y (%) 94370 (15.2) 7675 (12.4) <0.001
asthma = y (%) 43863 ( 7.1) 5161 ( 8.3) <0.001
osteoporosis = y (%) 17100 ( 2.8) 3125 ( 5.0) <0.001
ra = y (%) 11486 ( 1.8) 1233 ( 2.0) 0.018
fibromyalgia = y (%) 9323 ( 1.5) 2637 ( 4.2) <0.001
glaucoma = y (%) 9815 ( 1.6) 998 ( 1.6) 0.625
depression = y (%) 85454 (13.8) 15975 (25.7) <0.001
anxiety = y (%) 85689 (13.8) 12746 (20.5) <0.001
bipolar = y (%) 19969 ( 3.2) 2616 ( 4.2) <0.001
epilepsy = y (%) 23038 ( 3.7) 5819 ( 9.4) <0.001
migraine = y (%) 11547 ( 1.9) 3283 ( 5.3) <0.001
ibd = y (%) 2523 ( 0.4) 256 ( 0.4) 0.850
ibs = y (%) 6348 ( 1.0) 1035 ( 1.7) <0.001
hashimoto = y (%) 954 ( 0.2) 176 ( 0.3) <0.001
lupus = y (%) 3731 ( 0.6) 712 ( 1.1) <0.001
psoriasis = y (%) 3249 ( 0.5) 450 ( 0.7) <0.001
scd = y (%) 3251 ( 0.5) 179 ( 0.3) <0.001
hiv = y (%) 2456 ( 0.4) 80 ( 0.1) <0.001
ED_record = yes (%) 374744 (60.4) 44384 (71.5) <0.001
elective_admin = yes (%) 144804 (23.4) 9461 (15.3) <0.001
tran_in (%) <0.001
not_transferred 555216 (89.8) 53965 (87.3)
acute_care 39194 ( 6.3) 3992 ( 6.5)
other 23580 ( 3.8) 3848 ( 6.2)
aweekend = yes (%) 128047 (20.6) 13602 (21.9) <0.001
amonth (mean (SD)) 6.48 (3.46) 6.46 (3.46) 0.125
hosp_bedsize (%) 0.081
small 128612 (20.7) 13065 (21.0)
medium 182753 (29.4) 18329 (29.5)
large 309515 (49.9) 30694 (49.4)
hosp_locteach (%) <0.001
rural 56708 ( 9.1) 5070 ( 8.2)
urban_nonteaching 135839 (21.9) 13266 (21.4)
urban_teaching 428333 (69.0) 43752 (70.5)
h_contrl (%) <0.001
gov_nonfed 71258 (11.5) 6010 ( 9.7)
private_notprofit 456378 (73.5) 48394 (77.9)
Private_profit 93244 (15.0) 7684 (12.4)
4 Dealing with missingness
once again here is my missing. I tried imputation with mice and simputation. It is too much for R to handle, so I just have to do complete cases.
ms1718 %>% select(age, amonth, aweekend, discwt, key_nis, race, tran_in, zipinc_qrtl, ms, hosp_bedsize, hosp_locteach, h_contrl, sex, insurance, patient_loc, region, ED_record, elective_admin) %>%
gg_miss_var()
# set.seed(0527)
# ms1718_ms <- ms1718 %>% filter(ms_f == "yes") %>%
# data.frame() %>%
# impute_cart(., tran_in ~ .) %>%
# impute_cart(., elective_admin ~ .) %>%
# impute_cart(., sex ~ .) %>%
# impute_cart(., race ~ .) %>%
# impute_cart(., amonth ~ .) %>%
# impute_cart(., zipinc_qrtl ~ .) %>%
# impute_cart(., insurance ~ .) %>%
# impute_cart(., patient_loc ~ .) %>%
# tbl_df()
Below I am just checking to make sure that I have no more missing
[1] 0
no more missing!
[1] 645918 93
The ms1718c
data contain 645918 rows and 93 columns.
5 Unadjusted analysis
2 by 2 table analysis:
------------------------------------------------------
Outcome : no
Comparing : no vs. yes
no yes P(no) 95% conf. interval
no 490138 96613 0.8353 NA NA
yes 41433 17734 0.7003 0.6966 0.704
95% conf. interval
Relative Risk: 1.1929 1.1865 1.1993
Sample Odds Ratio: 2.1714 2.1308 2.2128
Probability difference: 0.1351 0.1313 0.1389
Asymptotic P-value: 0.0000
------------------------------------------------------
|
|
|
unadjust_binary_outcome <- glm(oi ~ ms_f, data = ms1718c, family = binomial())
unadjust_binary_outcome_tidy <- tidy(unadjust_binary_outcome, conf.int = TRUE, conf.level = 0.95, exponentiate = TRUE) %>%
filter(term == "ms_f")
unadjust_binary_outcome
Call: glm(formula = oi ~ ms_f, family = binomial(), data = ms1718c)
Coefficients:
(Intercept) ms_fyes
-1.6240 0.7754
Degrees of Freedom: 645917 Total (i.e. Null); 645916 Residual
Null Deviance: 603100
Residual Deviance: 597200 AIC: 597200
The odds of having an in MS individuals was (95%CI: , ) times higher than the odds that a non-MS control had an oi
6 Splitting the data
I am splitting the singly imputed ms1718c
sample into a training (90% of the data) and testing sample (10% of the data). I am using the function strata to ensure that both data sets have an equal proportion of my main predictor of interest, ms
, and the outcome, oi
set.seed(2)
ms_split <- rsample::initial_split(ms1718c, prop = 0.9,
strata = ms, oi)
ms_train <- rsample::training(ms_split)
ms_test <- rsample::testing(ms_split)
[1] 645918 93
[1] 581327 93
[1] 64591 93
NOTE: I actually ended up building the model with the ‘test’ sample because the training sample was just too big (I even tried building it with a training sample that was 70% of the size)
7 logistic regression
7.1 Model 1
7.1.1 Fitting Model 1
[1] "key_nis + year + ms + ms_f + oi + oi_f + discwt + age + sex + race + insurance + patient_loc + region + zipinc_qrtl + obese + htn + cad + hf + hld + pvd + dm + kidney + copd + asthma + osteoporosis + ra + fibromyalgia + glaucoma + depression + anxiety + bipolar + epilepsy + migraine + ibd + ibs + hashimoto + lupus + psoriasis + scd + hiv + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + oi_type + rec_pneumonia + inv_gbs + invasive_enterobacteriaceae + dissem_tb + bacteremia_meningitis + disseminated_bartonella + legionella + m_aviuum + candida_severe + invasive_aspergillus + pcp + bartonella + crypto_extrapul + coccidioidomycosis + histoplasmosis + mucormycosis + cmv_pneumonia + cmv_pancreatitis + other_severe_cmv + cmv_other + ebv + hsv_encephalitis + varicella_systemic + zoster + rsv + hpn + hhv6_7 + pml + enteroviral_meningitis + parvovirus + babesia + toxoplasma + visceral_leishmaniasis + acanthamoeba + naegleriasis + strongyloidiasis + taenia + flu + nosocomial + vap + catheter_inf + surgical_inf + bloodstream_catheter + c_diff"
I first tried:
mod1
predicts the log odds ofoi
using the predictorsyear
,age
,sex
,race
,insurance
,patient_loc
,region
,zipinc_qrtl
,obese
,htn
,cad
,hf
,hld
,pvd
,dm
,kidney
,copd
,asthma
,osteoporosis
,ra
,fibromyalgia
,glaucoma
,depression
,anxiety
,bipolar
,epilepsy
,migraine
,ibd
,ibs
,hashimoto
,lupus
,psoriasis
,scd
,hiv
,ED_record
,elective_admin
,tran_in,
aweekend
,amonth
,hosp_bedsize
,hosp_locteach
,h_contrl
But then this was too much for it to handle, so I simplified it to only contain:
7.1.2 Effect sizes model 1
Effects Response : oi
Factor Low High Diff. Effect
age 42 74 32 0.4901400
Odds Ratio 42 74 32 1.6325000
amonth 3 10 7 -0.1112600
Odds Ratio 3 10 7 0.8947100
ms_f - yes:no 1 2 NA 0.7501800
Odds Ratio 1 2 NA 2.1174000
sex - male:female 2 1 NA 0.0865680
Odds Ratio 2 1 NA 1.0904000
race - Black:White 1 2 NA -0.0751920
Odds Ratio 1 2 NA 0.9275700
race - Hispanic:White 1 3 NA 0.0229700
Odds Ratio 1 3 NA 1.0232000
race - Other:White 1 4 NA 0.0183500
Odds Ratio 1 4 NA 1.0185000
race - Asian:White 1 5 NA -0.0495640
Odds Ratio 1 5 NA 0.9516400
race - NativeA:White 1 6 NA 0.2041200
Odds Ratio 1 6 NA 1.2264000
insurance - Private:Medicare 1 2 NA -0.2592100
Odds Ratio 1 2 NA 0.7716600
insurance - Medicaid:Medicare 1 3 NA -0.1801700
Odds Ratio 1 3 NA 0.8351300
insurance - Self_pay:Medicare 1 4 NA -0.2475000
Odds Ratio 1 4 NA 0.7807500
insurance - Other:Medicare 1 5 NA -0.2250100
Odds Ratio 1 5 NA 0.7985000
patient_loc - Fringe:Central 1 2 NA -0.0031994
Odds Ratio 1 2 NA 0.9968100
patient_loc - metro>250K:Central 1 3 NA 0.0436060
Odds Ratio 1 3 NA 1.0446000
patient_loc - metro>50K:Central 1 4 NA 0.0330390
Odds Ratio 1 4 NA 1.0336000
patient_loc - micro:Central 1 5 NA 0.0681130
Odds Ratio 1 5 NA 1.0705000
patient_loc - Other:Central 1 6 NA 0.0642700
Odds Ratio 1 6 NA 1.0664000
region - EastNorth_Central:South_Atlantic 1 2 NA 0.0077239
Odds Ratio 1 2 NA 1.0078000
region - Middle_Atlantic:South_Atlantic 1 3 NA -0.1483400
Odds Ratio 1 3 NA 0.8621400
region - Pacific:South_Atlantic 1 4 NA 0.1482900
Odds Ratio 1 4 NA 1.1598000
region - WestSouth_Central:South_Atlantic 1 5 NA 0.1579500
Odds Ratio 1 5 NA 1.1711000
region - WestNorth_Central:South_Atlantic 1 6 NA -0.0111580
Odds Ratio 1 6 NA 0.9889000
region - EastSouth_Central:South_Atlantic 1 7 NA 0.0389670
Odds Ratio 1 7 NA 1.0397000
region - Mountain:South_Atlantic 1 8 NA 0.0580440
Odds Ratio 1 8 NA 1.0598000
region - NewEngland:South_Atlantic 1 9 NA -0.0515370
Odds Ratio 1 9 NA 0.9497700
zipinc_qrtl - 48-61K:<48K 1 2 NA -0.0062378
Odds Ratio 1 2 NA 0.9937800
zipinc_qrtl - 61-82K:<48K 1 3 NA -0.0069961
Odds Ratio 1 3 NA 0.9930300
zipinc_qrtl - 82K+:<48K 1 4 NA -0.0345080
Odds Ratio 1 4 NA 0.9660800
ED_record - no:yes 2 1 NA -0.6351200
Odds Ratio 2 1 NA 0.5298700
elective_admin - yes:no 1 2 NA -1.0106000
Odds Ratio 1 2 NA 0.3639900
tran_in - acute_care:not_transferred 1 2 NA 0.4042000
Odds Ratio 1 2 NA 1.4981000
tran_in - other:not_transferred 1 3 NA 0.5856900
Odds Ratio 1 3 NA 1.7962000
aweekend - yes:no 1 2 NA 0.0925620
Odds Ratio 1 2 NA 1.0970000
hosp_bedsize - small:large 3 1 NA 0.0684640
Odds Ratio 3 1 NA 1.0709000
hosp_bedsize - medium:large 3 2 NA 0.0528850
Odds Ratio 3 2 NA 1.0543000
hosp_locteach - rural:urban_teaching 3 1 NA 0.0704920
Odds Ratio 3 1 NA 1.0730000
hosp_locteach - urban_nonteaching:urban_teaching 3 2 NA 0.0048300
Odds Ratio 3 2 NA 1.0048000
h_contrl - gov_nonfed:private_notprofit 2 1 NA -0.0804590
Odds Ratio 2 1 NA 0.9226900
h_contrl - Private_profit:private_notprofit 2 3 NA -0.1517900
Odds Ratio 2 3 NA 0.8591700
S.E. Lower 0.95 Upper 0.95
0.025417 0.440320 0.5399600
NA 1.553200 1.7159000
0.021402 -0.153210 -0.0693130
NA 0.857950 0.9330300
0.032694 0.686100 0.8142600
NA 1.986000 2.2575000
0.021660 0.044116 0.1290200
NA 1.045100 1.1377000
0.033278 -0.140410 -0.0099689
NA 0.869000 0.9900800
0.039054 -0.053574 0.0995140
NA 0.947840 1.1046000
0.068266 -0.115450 0.1521500
NA 0.890970 1.1643000
0.074132 -0.194860 0.0957320
NA 0.822950 1.1005000
0.141010 -0.072260 0.4804900
NA 0.930290 1.6169000
0.033010 -0.323910 -0.1945100
NA 0.723320 0.8232400
0.038839 -0.256290 -0.1040400
NA 0.773920 0.9011900
0.065481 -0.375840 -0.1191600
NA 0.686710 0.8876600
0.070722 -0.363630 -0.0864010
NA 0.695150 0.9172300
0.031291 -0.064529 0.0581300
NA 0.937510 1.0599000
0.031915 -0.018947 0.1061600
NA 0.981230 1.1120000
0.041804 -0.048895 0.1149700
NA 0.952280 1.1218000
0.052767 -0.035309 0.1715300
NA 0.965310 1.1871000
0.054242 -0.042042 0.1705800
NA 0.958830 1.1860000
0.036406 -0.063631 0.0790790
NA 0.938350 1.0823000
0.038383 -0.223570 -0.0731100
NA 0.799660 0.9295000
0.039743 0.070395 0.2261800
NA 1.072900 1.2538000
0.040250 0.079061 0.2368400
NA 1.082300 1.2672000
0.049720 -0.108610 0.0862930
NA 0.897080 1.0901000
0.046897 -0.052950 0.1308800
NA 0.948430 1.1398000
0.050926 -0.041770 0.1578600
NA 0.959090 1.1710000
0.054508 -0.158370 0.0552970
NA 0.853530 1.0569000
0.029069 -0.063212 0.0507360
NA 0.938740 1.0520000
0.031671 -0.069070 0.0550770
NA 0.933260 1.0566000
0.035980 -0.105030 0.0360110
NA 0.900300 1.0367000
0.034337 -0.702420 -0.5678200
NA 0.495390 0.5667600
0.047958 -1.104600 -0.9166500
NA 0.331330 0.3998600
0.048278 0.309580 0.4988200
NA 1.362800 1.6468000
0.048301 0.491030 0.6803600
NA 1.634000 1.9746000
0.024865 0.043828 0.1413000
NA 1.044800 1.1518000
0.028562 0.012483 0.1244500
NA 1.012600 1.1325000
0.025283 0.003332 0.1024400
NA 1.003300 1.1079000
0.051022 -0.029510 0.1704900
NA 0.970920 1.1859000
0.027152 -0.048387 0.0580470
NA 0.952770 1.0598000
0.035963 -0.150950 -0.0099720
NA 0.859890 0.9900800
0.032864 -0.216200 -0.0873780
NA 0.805570 0.9163300
7.2 Evaluating predictive punch
We can see from the spearman rho plot below that the most predictive punch is in ED_record
, elective_admin
, age
, insurance
, ms_f
, kidney
, dm
spear_mod1 <- spearman2(oi ~ ms_f + age + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + hf + kidney + dm + copd + ibd + obese + hiv + scd,
data = ms1718c)
plot(spear_mod1)
obese
, hiv
, scd
don’t seem to help us out that much
Possibilities for nonlinear terms:
interaction between
ED_record
andelective_admin
interaction between
ED_record
andage
restricted cubic spline with 4 knots on
age
7.3 model 2
I’m going to add in some of the predictors that seemed helpful
mod2_lrm <- lrm(oi ~ ms_f + age + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + hf + kidney + dm + copd + ibd, data=ms_test, x = TRUE, y = TRUE)
Logistic Regression Model
lrm(formula = oi ~ ms_f + age + sex + race + insurance + patient_loc +
region + zipinc_qrtl + ED_record + elective_admin + tran_in +
aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl +
hf + kidney + dm + copd + ibd, data = ms_test, x = TRUE,
y = TRUE)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 64591 LR chi2 5451.60 R2 0.132 C 0.703
0 52879 d.f. 49 g 1.022 Dxy 0.405
1 11712 Pr(> chi2) <0.0001 gr 2.780 gamma 0.405
max |deriv| 6e-11 gp 0.124 tau-a 0.120
Brier 0.137
Coef S.E. Wald Z Pr(>|Z|)
Intercept -2.4439 0.1148 -21.29 <0.0001
ms_f=yes 0.7908 0.0329 24.00 <0.0001
age 0.0142 0.0008 17.28 <0.0001
sex=female -0.0652 0.0218 -2.99 0.0028
race=Black -0.1330 0.0338 -3.94 <0.0001
race=Hispanic -0.0157 0.0394 -0.40 0.6896
race=Other -0.0042 0.0685 -0.06 0.9509
race=Asian -0.0908 0.0746 -1.22 0.2237
race=NativeA 0.1615 0.1414 1.14 0.2535
insurance=Private -0.2138 0.0332 -6.44 <0.0001
insurance=Medicaid -0.1370 0.0389 -3.52 0.0004
insurance=Self_pay -0.1787 0.0658 -2.72 0.0066
insurance=Other -0.1770 0.0709 -2.50 0.0126
patient_loc=Fringe -0.0093 0.0314 -0.30 0.7665
patient_loc=metro>250K 0.0374 0.0320 1.17 0.2421
patient_loc=metro>50K 0.0305 0.0419 0.73 0.4674
patient_loc=micro 0.0578 0.0529 1.09 0.2744
patient_loc=Other 0.0536 0.0543 0.99 0.3238
region=EastNorth_Central -0.0060 0.0365 -0.17 0.8689
region=Middle_Atlantic -0.1446 0.0385 -3.76 0.0002
region=Pacific 0.1440 0.0398 3.61 0.0003
region=WestSouth_Central 0.1532 0.0404 3.80 0.0001
region=WestNorth_Central -0.0130 0.0498 -0.26 0.7935
region=EastSouth_Central 0.0345 0.0470 0.73 0.4627
region=Mountain 0.0660 0.0510 1.29 0.1959
region=NewEngland -0.0459 0.0546 -0.84 0.4008
zipinc_qrtl=48-61K -0.0020 0.0291 -0.07 0.9441
zipinc_qrtl=61-82K 0.0050 0.0318 0.16 0.8742
zipinc_qrtl=82K+ -0.0109 0.0362 -0.30 0.7621
ED_record=yes 0.6166 0.0344 17.90 <0.0001
elective_admin=yes -0.9886 0.0481 -20.56 <0.0001
tran_in=acute_care 0.3925 0.0484 8.11 <0.0001
tran_in=other 0.5756 0.0484 11.89 <0.0001
aweekend=yes 0.0962 0.0249 3.86 0.0001
amonth -0.0157 0.0031 -5.12 <0.0001
hosp_bedsize=medium -0.0183 0.0305 -0.60 0.5485
hosp_bedsize=large -0.0699 0.0286 -2.44 0.0146
hosp_locteach=urban_nonteaching -0.0772 0.0547 -1.41 0.1584
hosp_locteach=urban_teaching -0.0807 0.0512 -1.58 0.1149
h_contrl=private_notprofit 0.0716 0.0360 1.99 0.0468
h_contrl=Private_profit -0.0764 0.0446 -1.71 0.0868
hf=y -0.0163 0.0284 -0.57 0.5657
kidney=no -0.2357 0.0512 -4.60 <0.0001
kidney=stage1_2 -0.0810 0.1071 -0.76 0.4495
kidney=stage3 -0.0542 0.0603 -0.90 0.3690
kidney=stage4 -0.1815 0.0844 -2.15 0.0315
kidney=stage5_ESR 0.1973 0.0686 2.88 0.0040
dm=y 0.1889 0.0238 7.93 <0.0001
copd=y 0.0872 0.0283 3.08 0.0021
ibd=y 0.4656 0.1501 3.10 0.0019
7.3.2 model 3
mod3_lrm <- lrm(oi ~ ms_f + rcs(age,4) + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin + ED_record*elective_admin + age%ia%ED_record + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + hf + kidney + age%ia%kidney + dm + age%ia%dm + copd + age%ia%copd + ibd, data=ms_test, x = TRUE, y = TRUE)
Logistic Regression Model
lrm(formula = oi ~ ms_f + rcs(age, 4) + sex + race + insurance +
patient_loc + region + zipinc_qrtl + ED_record + elective_admin +
ED_record * elective_admin + age %ia% ED_record + tran_in +
aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl +
hf + kidney + age %ia% kidney + dm + age %ia% dm + copd +
age %ia% copd + ibd, data = ms_test, x = TRUE, y = TRUE)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 64591 LR chi2 5692.69 R2 0.138 C 0.704
0 52879 d.f. 60 g 1.114 Dxy 0.409
1 11712 Pr(> chi2) <0.0001 gr 3.046 gamma 0.409
max |deriv| 6e-11 gp 0.124 tau-a 0.121
Brier 0.137
Coef S.E. Wald Z Pr(>|Z|)
Intercept -3.2122 0.3075 -10.45 <0.0001
ms_f=yes 0.7273 0.0337 21.56 <0.0001
age 0.0351 0.0049 7.15 <0.0001
age' -0.0275 0.0065 -4.24 <0.0001
age'' 0.0926 0.0281 3.30 0.0010
sex=female -0.0276 0.0220 -1.26 0.2092
race=Black -0.1422 0.0339 -4.20 <0.0001
race=Hispanic -0.0007 0.0396 -0.02 0.9859
race=Other 0.0088 0.0686 0.13 0.8979
race=Asian -0.0596 0.0747 -0.80 0.4251
race=NativeA 0.1595 0.1420 1.12 0.2612
insurance=Private -0.2345 0.0338 -6.94 <0.0001
insurance=Medicaid -0.1538 0.0394 -3.90 <0.0001
insurance=Self_pay -0.2261 0.0666 -3.40 0.0007
insurance=Other -0.1977 0.0713 -2.77 0.0055
patient_loc=Fringe -0.0136 0.0314 -0.43 0.6644
patient_loc=metro>250K 0.0396 0.0320 1.24 0.2163
patient_loc=metro>50K 0.0350 0.0419 0.83 0.4039
patient_loc=micro 0.0579 0.0529 1.09 0.2737
patient_loc=Other 0.0358 0.0545 0.66 0.5113
region=EastNorth_Central -0.0164 0.0366 -0.45 0.6546
region=Middle_Atlantic -0.1436 0.0385 -3.73 0.0002
region=Pacific 0.1453 0.0399 3.65 0.0003
region=WestSouth_Central 0.1449 0.0404 3.59 0.0003
region=WestNorth_Central -0.0191 0.0498 -0.38 0.7012
region=EastSouth_Central 0.0383 0.0470 0.81 0.4156
region=Mountain 0.0614 0.0511 1.20 0.2289
region=NewEngland -0.0440 0.0545 -0.81 0.4202
zipinc_qrtl=48-61K 0.0028 0.0292 0.09 0.9247
zipinc_qrtl=61-82K 0.0177 0.0318 0.56 0.5775
zipinc_qrtl=82K+ 0.0121 0.0362 0.33 0.7391
ED_record=yes 1.4181 0.1005 14.11 <0.0001
elective_admin=yes -1.1104 0.0519 -21.39 <0.0001
age * ED_record=yes -0.0142 0.0015 -9.36 <0.0001
tran_in=acute_care 0.3143 0.0490 6.41 <0.0001
tran_in=other 0.5691 0.0483 11.77 <0.0001
aweekend=yes 0.1016 0.0249 4.07 <0.0001
amonth -0.0157 0.0031 -5.12 <0.0001
hosp_bedsize=medium -0.0130 0.0305 -0.42 0.6710
hosp_bedsize=large -0.0666 0.0286 -2.32 0.0201
hosp_locteach=urban_nonteaching -0.0700 0.0548 -1.28 0.2014
hosp_locteach=urban_teaching -0.0737 0.0513 -1.44 0.1505
h_contrl=private_notprofit 0.0833 0.0361 2.31 0.0210
h_contrl=Private_profit -0.0741 0.0446 -1.66 0.0968
hf=y -0.0020 0.0282 -0.07 0.9449
kidney=no -0.7654 0.2627 -2.91 0.0036
kidney=stage1_2 -0.1828 0.5679 -0.32 0.7476
kidney=stage3 0.4818 0.3293 1.46 0.1435
kidney=stage4 -0.6040 0.4701 -1.28 0.1989
kidney=stage5_ESR 0.1099 0.3346 0.33 0.7425
age * kidney=no 0.0074 0.0036 2.05 0.0408
age * kidney=stage1_2 0.0014 0.0079 0.17 0.8622
age * kidney=stage3 -0.0069 0.0045 -1.53 0.1253
age * kidney=stage4 0.0061 0.0063 0.97 0.3343
age * kidney=stage5_ESR 0.0001 0.0048 0.01 0.9914
dm=y 0.3776 0.1068 3.54 0.0004
age * dm=y -0.0034 0.0016 -2.19 0.0286
copd=y 0.2415 0.1579 1.53 0.1262
age * copd=y -0.0026 0.0022 -1.15 0.2495
ibd=y 0.4527 0.1502 3.01 0.0026
ED_record=yes * elective_admin=yes 0.6610 0.1354 4.88 <0.0001
index.orig training test optimism index.corrected n
Dxy 0.4090 0.4121 0.4070 0.0051 0.4039 40
R2 0.1378 0.1390 0.1365 0.0025 0.1354 40
Intercept 0.0000 0.0000 -0.0167 0.0167 -0.0167 40
Slope 1.0000 1.0000 0.9863 0.0137 0.9863 40
Emax 0.0000 0.0000 0.0060 0.0060 0.0060 40
D 0.0881 0.0889 0.0872 0.0016 0.0865 40
U 0.0000 0.0000 0.0000 0.0000 0.0000 40
Q 0.0881 0.0889 0.0872 0.0017 0.0865 40
B 0.1369 0.1367 0.1371 -0.0004 0.1373 40
g 1.1138 1.1181 1.1029 0.0153 1.0986 40
gp 0.1239 0.1245 0.1234 0.0011 0.1228 40
7.3.3 tidied coefficients
Among hospitalized patients in 2017-2018 (N=64,591), after adjusting for age + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl,mod1
predicts that the odds of having an OI in those with MS is 2.11 (95% CI 1.98, 2.26) times those without MS
- given that the 95% CI is entirely above 1, the model suggests that having a MS is associated with a higher odds of an oi.
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 0.072 | 0.102 | -25.856 | 0.000 |
ms_fyes | 2.117 | 0.033 | 22.946 | 0.000 |
age | 1.015 | 0.001 | 19.284 | 0.000 |
sexfemale | 0.917 | 0.022 | -3.997 | 0.000 |
raceBlack | 0.928 | 0.033 | -2.260 | 0.024 |
raceHispanic | 1.023 | 0.039 | 0.588 | 0.556 |
raceOther | 1.019 | 0.068 | 0.269 | 0.788 |
raceAsian | 0.952 | 0.074 | -0.669 | 0.504 |
raceNativeA | 1.226 | 0.141 | 1.448 | 0.148 |
insurancePrivate | 0.772 | 0.033 | -7.853 | 0.000 |
insuranceMedicaid | 0.835 | 0.039 | -4.639 | 0.000 |
insuranceSelf_pay | 0.781 | 0.065 | -3.780 | 0.000 |
insuranceOther | 0.799 | 0.071 | -3.182 | 0.001 |
patient_locFringe | 0.997 | 0.031 | -0.102 | 0.919 |
patient_locmetro>250K | 1.045 | 0.032 | 1.366 | 0.172 |
patient_locmetro>50K | 1.034 | 0.042 | 0.790 | 0.429 |
patient_locmicro | 1.070 | 0.053 | 1.291 | 0.197 |
patient_locOther | 1.066 | 0.054 | 1.185 | 0.236 |
regionEastNorth_Central | 1.008 | 0.036 | 0.212 | 0.832 |
regionMiddle_Atlantic | 0.862 | 0.038 | -3.865 | 0.000 |
regionPacific | 1.160 | 0.040 | 3.731 | 0.000 |
regionWestSouth_Central | 1.171 | 0.040 | 3.924 | 0.000 |
regionWestNorth_Central | 0.989 | 0.050 | -0.224 | 0.822 |
regionEastSouth_Central | 1.040 | 0.047 | 0.831 | 0.406 |
regionMountain | 1.060 | 0.051 | 1.140 | 0.254 |
regionNewEngland | 0.950 | 0.055 | -0.945 | 0.344 |
zipinc_qrtl48-61K | 0.994 | 0.029 | -0.215 | 0.830 |
zipinc_qrtl61-82K | 0.993 | 0.032 | -0.221 | 0.825 |
zipinc_qrtl82K+ | 0.966 | 0.036 | -0.959 | 0.338 |
ED_recordyes | 1.887 | 0.034 | 18.497 | 0.000 |
elective_adminyes | 0.364 | 0.048 | -21.074 | 0.000 |
tran_inacute_care | 1.498 | 0.048 | 8.372 | 0.000 |
tran_inother | 1.796 | 0.048 | 12.126 | 0.000 |
aweekendyes | 1.097 | 0.025 | 3.723 | 0.000 |
amonth | 0.984 | 0.003 | -5.199 | 0.000 |
hosp_bedsizemedium | 0.985 | 0.030 | -0.512 | 0.609 |
hosp_bedsizelarge | 0.934 | 0.029 | -2.397 | 0.017 |
hosp_locteachurban_nonteaching | 0.936 | 0.055 | -1.203 | 0.229 |
hosp_locteachurban_teaching | 0.932 | 0.051 | -1.382 | 0.167 |
h_contrlprivate_notprofit | 1.084 | 0.036 | 2.237 | 0.025 |
h_contrlPrivate_profit | 0.931 | 0.045 | -1.602 | 0.109 |
[1] 2.257577
[1] 1.985971
7.3.4 mod2
The OR for ms_f with mod2
(when we add in hf + kidney + dm + copd + ibd), is considerably lower
mod2_glm <- glm(oi ~ ms_f + age + sex + race + insurance + patient_loc + region + zipinc_qrtl + ED_record + elective_admin + tran_in + aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl + hf + kidney + dm + copd + ibd, data=ms_test)
tidy(mod2_glm, exponentiate=TRUE) %>% kable(digits=3)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 1.099 | 0.016 | 5.909 | 0.000 |
ms_fyes | 1.132 | 0.005 | 23.628 | 0.000 |
age | 1.002 | 0.000 | 14.926 | 0.000 |
sexfemale | 0.999 | 0.003 | -0.484 | 0.628 |
raceBlack | 0.981 | 0.005 | -4.331 | 0.000 |
raceHispanic | 0.999 | 0.005 | -0.186 | 0.852 |
raceOther | 1.000 | 0.009 | 0.037 | 0.970 |
raceAsian | 0.993 | 0.010 | -0.697 | 0.486 |
raceNativeA | 1.019 | 0.020 | 0.926 | 0.354 |
insurancePrivate | 0.971 | 0.004 | -6.614 | 0.000 |
insuranceMedicaid | 0.978 | 0.005 | -4.253 | 0.000 |
insuranceSelf_pay | 0.965 | 0.009 | -4.123 | 0.000 |
insuranceOther | 0.973 | 0.009 | -3.049 | 0.002 |
patient_locFringe | 0.999 | 0.004 | -0.285 | 0.776 |
patient_locmetro>250K | 1.006 | 0.004 | 1.291 | 0.197 |
patient_locmetro>50K | 1.005 | 0.006 | 0.829 | 0.407 |
patient_locmicro | 1.008 | 0.007 | 1.137 | 0.255 |
patient_locOther | 1.005 | 0.007 | 0.649 | 0.517 |
regionEastNorth_Central | 0.998 | 0.005 | -0.305 | 0.760 |
regionMiddle_Atlantic | 0.981 | 0.005 | -3.612 | 0.000 |
regionPacific | 1.021 | 0.006 | 3.730 | 0.000 |
regionWestSouth_Central | 1.021 | 0.006 | 3.738 | 0.000 |
regionWestNorth_Central | 0.999 | 0.007 | -0.158 | 0.874 |
regionEastSouth_Central | 1.006 | 0.006 | 0.909 | 0.363 |
regionMountain | 1.009 | 0.007 | 1.241 | 0.215 |
regionNewEngland | 0.995 | 0.008 | -0.675 | 0.500 |
zipinc_qrtl48-61K | 1.000 | 0.004 | 0.038 | 0.970 |
zipinc_qrtl61-82K | 1.002 | 0.004 | 0.428 | 0.669 |
zipinc_qrtl82K+ | 1.001 | 0.005 | 0.206 | 0.837 |
ED_recordyes | 1.084 | 0.004 | 18.639 | 0.000 |
elective_adminyes | 0.927 | 0.005 | -15.854 | 0.000 |
tran_inacute_care | 1.044 | 0.007 | 6.497 | 0.000 |
tran_inother | 1.097 | 0.008 | 12.123 | 0.000 |
aweekendyes | 1.016 | 0.004 | 4.445 | 0.000 |
amonth | 0.998 | 0.000 | -5.192 | 0.000 |
hosp_bedsizemedium | 0.999 | 0.004 | -0.291 | 0.771 |
hosp_bedsizelarge | 0.991 | 0.004 | -2.237 | 0.025 |
hosp_locteachurban_nonteaching | 0.989 | 0.007 | -1.518 | 0.129 |
hosp_locteachurban_teaching | 0.988 | 0.007 | -1.767 | 0.077 |
h_contrlprivate_notprofit | 1.011 | 0.005 | 2.250 | 0.024 |
h_contrlPrivate_profit | 0.989 | 0.006 | -1.899 | 0.058 |
hfy | 1.001 | 0.004 | 0.242 | 0.809 |
kidneyno | 0.959 | 0.008 | -5.156 | 0.000 |
kidneystage1_2 | 0.985 | 0.017 | -0.913 | 0.361 |
kidneystage3 | 0.993 | 0.010 | -0.694 | 0.487 |
kidneystage4 | 0.971 | 0.013 | -2.200 | 0.028 |
kidneystage5_ESR | 1.031 | 0.011 | 2.797 | 0.005 |
dmy | 1.026 | 0.004 | 7.200 | 0.000 |
copdy | 1.013 | 0.004 | 2.961 | 0.003 |
ibdy | 1.073 | 0.023 | 3.014 | 0.003 |
7.3.5 key fit summary statistics
Below are the key fit summary statistics like the Nagelkerke R-square and the area under the ROC curve as they are presented in the lrm output for mod1
The r square is very low (0.127) as well as the C statistic (0.698).
Logistic Regression Model
lrm(formula = oi ~ ms_f + age + sex + race + insurance + patient_loc +
region + zipinc_qrtl + ED_record + elective_admin + tran_in +
aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl,
data = ms_test, x = TRUE, y = TRUE)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 64591 LR chi2 5236.53 R2 0.127 C 0.698
0 52879 d.f. 40 g 1.003 Dxy 0.395
1 11712 Pr(> chi2) <0.0001 gr 2.725 gamma 0.396
max |deriv| 3e-12 gp 0.121 tau-a 0.117
Brier 0.138
Coef S.E. Wald Z Pr(>|Z|)
Intercept -2.6341 0.1019 -25.86 <0.0001
ms_f=yes 0.7502 0.0327 22.95 <0.0001
age 0.0153 0.0008 19.28 <0.0001
sex=female -0.0866 0.0217 -4.00 <0.0001
race=Black -0.0752 0.0333 -2.26 0.0239
race=Hispanic 0.0230 0.0391 0.59 0.5564
race=Other 0.0184 0.0683 0.27 0.7881
race=Asian -0.0496 0.0741 -0.67 0.5038
race=NativeA 0.2041 0.1410 1.45 0.1478
insurance=Private -0.2592 0.0330 -7.85 <0.0001
insurance=Medicaid -0.1802 0.0388 -4.64 <0.0001
insurance=Self_pay -0.2475 0.0655 -3.78 0.0002
insurance=Other -0.2250 0.0707 -3.18 0.0015
patient_loc=Fringe -0.0032 0.0313 -0.10 0.9186
patient_loc=metro>250K 0.0436 0.0319 1.37 0.1718
patient_loc=metro>50K 0.0330 0.0418 0.79 0.4293
patient_loc=micro 0.0681 0.0528 1.29 0.1968
patient_loc=Other 0.0643 0.0542 1.18 0.2361
region=EastNorth_Central 0.0077 0.0364 0.21 0.8320
region=Middle_Atlantic -0.1483 0.0384 -3.86 0.0001
region=Pacific 0.1483 0.0397 3.73 0.0002
region=WestSouth_Central 0.1579 0.0402 3.92 <0.0001
region=WestNorth_Central -0.0112 0.0497 -0.22 0.8224
region=EastSouth_Central 0.0390 0.0469 0.83 0.4060
region=Mountain 0.0580 0.0509 1.14 0.2544
region=NewEngland -0.0515 0.0545 -0.95 0.3444
zipinc_qrtl=48-61K -0.0062 0.0291 -0.21 0.8301
zipinc_qrtl=61-82K -0.0070 0.0317 -0.22 0.8252
zipinc_qrtl=82K+ -0.0345 0.0360 -0.96 0.3375
ED_record=yes 0.6351 0.0343 18.50 <0.0001
elective_admin=yes -1.0106 0.0480 -21.07 <0.0001
tran_in=acute_care 0.4042 0.0483 8.37 <0.0001
tran_in=other 0.5857 0.0483 12.13 <0.0001
aweekend=yes 0.0926 0.0249 3.72 0.0002
amonth -0.0159 0.0031 -5.20 <0.0001
hosp_bedsize=medium -0.0156 0.0305 -0.51 0.6090
hosp_bedsize=large -0.0685 0.0286 -2.40 0.0165
hosp_locteach=urban_nonteaching -0.0657 0.0546 -1.20 0.2289
hosp_locteach=urban_teaching -0.0705 0.0510 -1.38 0.1671
h_contrl=private_notprofit 0.0805 0.0360 2.24 0.0253
h_contrl=Private_profit -0.0713 0.0445 -1.60 0.1093
Below are the key fit summary statistics like the Nagelkerke R-square and the area under the ROC curve as they are presented in the lrm output for mod1
The r square is very low (0.132) as well as the C statistic (0.703).
Logistic Regression Model
lrm(formula = oi ~ ms_f + age + sex + race + insurance + patient_loc +
region + zipinc_qrtl + ED_record + elective_admin + tran_in +
aweekend + amonth + hosp_bedsize + hosp_locteach + h_contrl +
hf + kidney + dm + copd + ibd, data = ms_test, x = TRUE,
y = TRUE)
Model Likelihood Discrimination Rank Discrim.
Ratio Test Indexes Indexes
Obs 64591 LR chi2 5451.60 R2 0.132 C 0.703
0 52879 d.f. 49 g 1.022 Dxy 0.405
1 11712 Pr(> chi2) <0.0001 gr 2.780 gamma 0.405
max |deriv| 6e-11 gp 0.124 tau-a 0.120
Brier 0.137
Coef S.E. Wald Z Pr(>|Z|)
Intercept -2.4439 0.1148 -21.29 <0.0001
ms_f=yes 0.7908 0.0329 24.00 <0.0001
age 0.0142 0.0008 17.28 <0.0001
sex=female -0.0652 0.0218 -2.99 0.0028
race=Black -0.1330 0.0338 -3.94 <0.0001
race=Hispanic -0.0157 0.0394 -0.40 0.6896
race=Other -0.0042 0.0685 -0.06 0.9509
race=Asian -0.0908 0.0746 -1.22 0.2237
race=NativeA 0.1615 0.1414 1.14 0.2535
insurance=Private -0.2138 0.0332 -6.44 <0.0001
insurance=Medicaid -0.1370 0.0389 -3.52 0.0004
insurance=Self_pay -0.1787 0.0658 -2.72 0.0066
insurance=Other -0.1770 0.0709 -2.50 0.0126
patient_loc=Fringe -0.0093 0.0314 -0.30 0.7665
patient_loc=metro>250K 0.0374 0.0320 1.17 0.2421
patient_loc=metro>50K 0.0305 0.0419 0.73 0.4674
patient_loc=micro 0.0578 0.0529 1.09 0.2744
patient_loc=Other 0.0536 0.0543 0.99 0.3238
region=EastNorth_Central -0.0060 0.0365 -0.17 0.8689
region=Middle_Atlantic -0.1446 0.0385 -3.76 0.0002
region=Pacific 0.1440 0.0398 3.61 0.0003
region=WestSouth_Central 0.1532 0.0404 3.80 0.0001
region=WestNorth_Central -0.0130 0.0498 -0.26 0.7935
region=EastSouth_Central 0.0345 0.0470 0.73 0.4627
region=Mountain 0.0660 0.0510 1.29 0.1959
region=NewEngland -0.0459 0.0546 -0.84 0.4008
zipinc_qrtl=48-61K -0.0020 0.0291 -0.07 0.9441
zipinc_qrtl=61-82K 0.0050 0.0318 0.16 0.8742
zipinc_qrtl=82K+ -0.0109 0.0362 -0.30 0.7621
ED_record=yes 0.6166 0.0344 17.90 <0.0001
elective_admin=yes -0.9886 0.0481 -20.56 <0.0001
tran_in=acute_care 0.3925 0.0484 8.11 <0.0001
tran_in=other 0.5756 0.0484 11.89 <0.0001
aweekend=yes 0.0962 0.0249 3.86 0.0001
amonth -0.0157 0.0031 -5.12 <0.0001
hosp_bedsize=medium -0.0183 0.0305 -0.60 0.5485
hosp_bedsize=large -0.0699 0.0286 -2.44 0.0146
hosp_locteach=urban_nonteaching -0.0772 0.0547 -1.41 0.1584
hosp_locteach=urban_teaching -0.0807 0.0512 -1.58 0.1149
h_contrl=private_notprofit 0.0716 0.0360 1.99 0.0468
h_contrl=Private_profit -0.0764 0.0446 -1.71 0.0868
hf=y -0.0163 0.0284 -0.57 0.5657
kidney=no -0.2357 0.0512 -4.60 <0.0001
kidney=stage1_2 -0.0810 0.1071 -0.76 0.4495
kidney=stage3 -0.0542 0.0603 -0.90 0.3690
kidney=stage4 -0.1815 0.0844 -2.15 0.0315
kidney=stage5_ESR 0.1973 0.0686 2.88 0.0040
dm=y 0.1889 0.0238 7.93 <0.0001
copd=y 0.0872 0.0283 3.08 0.0021
ibd=y 0.4656 0.1501 3.10 0.0019
8 Fitting the propensity score model
I will now fit the propensity score, which predicts MS status based on these 30 available covariates:
discwt
, age
, sex
, race
, insurance
, patient_loc
, region
, zipinc_qrtl
, obese
,htn
, cad
, hf
, hld
, pvd
, dm2
, dm1
, kidney
, copd
, asthma
, osteoporosis
, ra
, fibromyalgia
, glaucoma
, depression
, anxiety
, bipolar
, epilepsy
, migraine
, ibd
, ibs
, hashimoto
, lupus
, psoriasis
, scd
, hiv
, ED_record
, elective_admin
, tran_in
, aweekend
, amonth
, hosp_bedsize
, hosp_locteach
, h_contrl
[1] "key_nis, year, ms, ms_f, oi, oi_f, discwt, age, sex, race, insurance, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, pvd, dm, kidney, copd, asthma, osteoporosis, ra, fibromyalgia, glaucoma, depression, anxiety, bipolar, epilepsy, migraine, ibd, ibs, hashimoto, lupus, psoriasis, scd, hiv, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl, oi_type, rec_pneumonia, inv_gbs, invasive_enterobacteriaceae, dissem_tb, bacteremia_meningitis, disseminated_bartonella, legionella, m_aviuum, candida_severe, invasive_aspergillus, pcp, bartonella, crypto_extrapul, coccidioidomycosis, histoplasmosis, mucormycosis, cmv_pneumonia, cmv_pancreatitis, other_severe_cmv, cmv_other, ebv, hsv_encephalitis, varicella_systemic, zoster, rsv, hpn, hhv6_7, pml, enteroviral_meningitis, parvovirus, babesia, toxoplasma, visceral_leishmaniasis, acanthamoeba, naegleriasis, strongyloidiasis, taenia, flu, nosocomial, vap, catheter_inf, surgical_inf, bloodstream_catheter, c_diff"
We’ll use the f.build
tool from the cobalt
package here.
ms_test <- ms_test %>%
mutate(treat = as.logical(ms_f == "yes"))
covs_1 <- ms_test %>%
select(discwt, age, sex, race, insurance, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, pvd, dm, kidney, copd, asthma, osteoporosis, ra, fibromyalgia, glaucoma, depression, anxiety, bipolar, epilepsy, migraine, ibd, ibs, hashimoto, lupus, psoriasis, scd, hiv, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl)
prop_model <- glm(f.build("treat", covs_1), data = ms_test,
family = binomial)
tidy(prop_model, conf.int = TRUE) %>%
select(term, estimate, std.error, conf.low, conf.high, p.value) %>%
knitr::kable(digits = 3)
term | estimate | std.error | conf.low | conf.high | p.value |
---|---|---|---|---|---|
(Intercept) | -194.057 | 755.968 | -1698.768 | 1254.074 | 0.797 |
discwt | 38.451 | 151.194 | -251.177 | 339.394 | 0.799 |
age | -0.018 | 0.001 | -0.020 | -0.016 | 0.000 |
sexfemale | 0.573 | 0.033 | 0.509 | 0.637 | 0.000 |
raceBlack | 0.126 | 0.044 | 0.039 | 0.212 | 0.004 |
raceHispanic | -0.609 | 0.062 | -0.732 | -0.489 | 0.000 |
raceOther | -0.391 | 0.099 | -0.590 | -0.202 | 0.000 |
raceAsian | -1.311 | 0.158 | -1.636 | -1.013 | 0.000 |
raceNativeA | -0.839 | 0.270 | -1.412 | -0.346 | 0.002 |
insurancePrivate | -0.868 | 0.044 | -0.955 | -0.782 | 0.000 |
insuranceMedicaid | -1.170 | 0.053 | -1.274 | -1.067 | 0.000 |
insuranceSelf_pay | -1.550 | 0.110 | -1.771 | -1.340 | 0.000 |
insuranceOther | -0.978 | 0.101 | -1.181 | -0.783 | 0.000 |
patient_locFringe | 0.010 | 0.041 | -0.070 | 0.090 | 0.802 |
patient_locmetro>250K | -0.092 | 0.044 | -0.178 | -0.006 | 0.036 |
patient_locmetro>50K | -0.081 | 0.057 | -0.193 | 0.030 | 0.156 |
patient_locmicro | -0.061 | 0.072 | -0.203 | 0.079 | 0.397 |
patient_locOther | -0.051 | 0.076 | -0.201 | 0.096 | 0.500 |
regionEastNorth_Central | 0.321 | 0.048 | 0.228 | 0.415 | 0.000 |
regionMiddle_Atlantic | 0.249 | 0.050 | 0.152 | 0.346 | 0.000 |
regionPacific | 0.087 | 0.057 | -0.024 | 0.198 | 0.123 |
regionWestSouth_Central | -0.058 | 0.060 | -0.176 | 0.058 | 0.329 |
regionWestNorth_Central | 0.024 | 0.067 | -0.107 | 0.153 | 0.717 |
regionEastSouth_Central | -0.096 | 0.068 | -0.229 | 0.035 | 0.156 |
regionMountain | 0.263 | 0.067 | 0.130 | 0.394 | 0.000 |
regionNewEngland | 0.117 | 0.072 | -0.024 | 0.257 | 0.101 |
zipinc_qrtl48-61K | 0.186 | 0.041 | 0.106 | 0.265 | 0.000 |
zipinc_qrtl61-82K | 0.216 | 0.044 | 0.131 | 0.302 | 0.000 |
zipinc_qrtl82K+ | 0.256 | 0.049 | 0.160 | 0.351 | 0.000 |
obesey | 0.131 | 0.039 | 0.053 | 0.207 | 0.001 |
htny | 0.011 | 0.036 | -0.060 | 0.082 | 0.765 |
cady | -0.253 | 0.046 | -0.344 | -0.163 | 0.000 |
hfy | -0.538 | 0.054 | -0.646 | -0.432 | 0.000 |
hldy | -0.154 | 0.036 | -0.226 | -0.083 | 0.000 |
pvdy | -0.031 | 0.095 | -0.221 | 0.151 | 0.744 |
dmy | -0.138 | 0.038 | -0.212 | -0.064 | 0.000 |
kidneyno | 0.130 | 0.087 | -0.038 | 0.304 | 0.136 |
kidneystage1_2 | -0.187 | 0.189 | -0.569 | 0.173 | 0.323 |
kidneystage3 | -0.307 | 0.107 | -0.516 | -0.095 | 0.004 |
kidneystage4 | -0.532 | 0.170 | -0.875 | -0.206 | 0.002 |
kidneystage5_ESR | -0.961 | 0.144 | -1.248 | -0.681 | 0.000 |
copdy | -0.336 | 0.046 | -0.427 | -0.246 | 0.000 |
asthmay | -0.085 | 0.053 | -0.190 | 0.019 | 0.112 |
osteoporosisy | 0.236 | 0.073 | 0.091 | 0.377 | 0.001 |
ray | -0.304 | 0.108 | -0.522 | -0.097 | 0.005 |
fibromyalgiay | 0.682 | 0.079 | 0.525 | 0.836 | 0.000 |
glaucomay | -0.019 | 0.113 | -0.246 | 0.198 | 0.869 |
depressiony | 0.657 | 0.037 | 0.584 | 0.728 | 0.000 |
anxietyy | -0.059 | 0.040 | -0.138 | 0.020 | 0.144 |
bipolary | 0.039 | 0.074 | -0.108 | 0.183 | 0.595 |
epilepsyy | 0.667 | 0.055 | 0.559 | 0.773 | 0.000 |
migrainey | 0.560 | 0.073 | 0.415 | 0.703 | 0.000 |
ibdy | -0.143 | 0.221 | -0.601 | 0.269 | 0.517 |
ibsy | 0.128 | 0.113 | -0.097 | 0.344 | 0.257 |
hashimotoy | -0.147 | 0.327 | -0.835 | 0.455 | 0.653 |
lupusy | 0.170 | 0.149 | -0.130 | 0.454 | 0.254 |
psoriasisy | -0.019 | 0.180 | -0.388 | 0.321 | 0.915 |
scdy | -0.576 | 0.205 | -0.997 | -0.192 | 0.005 |
hivy | -0.643 | 0.335 | -1.365 | -0.040 | 0.055 |
ED_recordyes | 0.602 | 0.044 | 0.516 | 0.689 | 0.000 |
elective_adminyes | -0.266 | 0.052 | -0.368 | -0.164 | 0.000 |
tran_inacute_care | 0.280 | 0.066 | 0.149 | 0.408 | 0.000 |
tran_inother | 0.539 | 0.064 | 0.412 | 0.663 | 0.000 |
aweekendyes | -0.017 | 0.035 | -0.085 | 0.051 | 0.630 |
amonth | -0.008 | 0.004 | -0.016 | 0.001 | 0.068 |
hosp_bedsizemedium | 0.041 | 0.042 | -0.040 | 0.123 | 0.323 |
hosp_bedsizelarge | 0.084 | 0.039 | 0.009 | 0.160 | 0.030 |
hosp_locteachurban_nonteaching | -0.154 | 0.075 | -0.301 | -0.006 | 0.042 |
hosp_locteachurban_teaching | -0.058 | 0.070 | -0.195 | 0.080 | 0.411 |
h_contrlprivate_notprofit | 0.078 | 0.050 | -0.020 | 0.178 | 0.120 |
h_contrlPrivate_profit | -0.036 | 0.063 | -0.160 | 0.088 | 0.567 |
# A tibble: 1 x 8
null.deviance df.null logLik AIC BIC deviance df.residual nobs
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 38892. 64590 -17531. 35204. 35848. 35062. 64520 64591
From table 1, I will remove variables that have <5% in a category
# msc <- ms1718c %>%
# mutate(race_c =fct_collapse(factor(race),
# Other = c("Other", "NativeA","Asian"))) %>%
# mutate(insurance_c =fct_collapse(factor(insurance),
# Other = c("Other", "Self_pay"))) %>%
# mutate(CKD =fct_collapse(factor(kidney),
# mod_sev = c("stage3", "stage4", "stage5_ESR"),
# other = c("stage1_2", "CKDother"))) %>%
# select( -pvd, -race, -insurance, -dm1, -kidney, -ra, -glaucoma, -bipolar, -ibd, -ibs, - hashimoto, -psoriasis, -scd, -hiv,- lupus) %>%
#
# select(key_nis, year, ms, ms_f, oi, oi_f, discwt, age, race_c, sex, insurance_c, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, dm2, CKD, copd, asthma, osteoporosis, fibromyalgia, depression, anxiety, epilepsy, migraine, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl, oi_type, rec_pneumonia, inv_gbs, invasive_enterobacteriaceae, dissem_tb, bacteremia_meningitis, disseminated_bartonella, legionella, m_aviuum, candida_severe, invasive_aspergillus, pcp, bartonella, crypto_extrapul, coccidioidomycosis, histoplasmosis, mucormycosis, cmv_pneumonia, cmv_pancreatitis, other_severe_cmv, cmv_other, ebv, hsv_encephalitis, varicella_systemic, zoster, rsv, hpn, hhv6_7, pml, enteroviral_meningitis, parvovirus, babesia, toxoplasma, visceral_leishmaniasis, acanthamoeba, naegleriasis, strongyloidiasis, taenia, flu, nosocomial, vap, catheter_inf, surgical_inf, bloodstream_catheter, c_diff)
# set.seed(1)
# ms_split <- rsample::initial_split(msc, prop = 0.7,
# strata = oi_f, ms_f)
# ms_train <- rsample::training(ms_split)
# ms_test <- rsample::testing(ms_split)
# msc <- msc %>%
# mutate(treat = as.logical(ms == "yes"))
#
# covs_1 <- msc %>%
# select(discwt, age, race_c, sex, insurance_c, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, dm2, CKD, copd, asthma, osteoporosis, fibromyalgia, depression, anxiety, epilepsy, migraine, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl)
#
# prop_model <- glm(f.build("treat", covs_1), data = msc,
# family = binomial)
#
# tidy(prop_model, conf.int = TRUE) %>%
# select(term, estimate, std.error, conf.low, conf.high, p.value) %>%
# knitr::kable(digits = 3)
8.0.1 Storing the Propensity Scores
ms_test <- ms_test %>%
mutate(ps = prop_model$fitted,
linps = prop_model$linear.predictors)
ggplot(ms_test, aes(x = ms_f, y = linps)) +
geom_violin() +
geom_boxplot(width = 0.3)
The density plot below shows us that we have a substantial number of MS patients who do not have overlapping PS scores with non-MS patients.
9 match_4
Caliper Matching (1:1 without replacement) with the Matching
package
I will use the
Match
function to specify a caliper of 0.2 (fromMatching
package)Here, subjects will only be matched if they fall within the maximum distance of 0.2 standard deviations of the logit of the propensity score.
If they do not fall within this range, those subjects will be dropped.
The matching design will be 1:1 without replacement
match_4 <- Match(Tr = ms_test$treat, X = ms_test$linps,
M = 1, replace = FALSE, ties = FALSE,
caliper = 0.2, estimand = "ATT")
summary(match_4)
Estimate... 0
SE......... 0
T-stat..... NaN
p.val...... NA
Original number of observations.............. 64591
Original number of treated obs............... 5772
Matched number of observations............... 5763
Matched number of observations (unweighted). 5763
Caliper (SDs)........................................ 0.2
Number of obs dropped by 'exact' or 'caliper' 9
Note that we have now dropped 9 of the MS=yes subjects, and reduced our sample to the 5763 remaining MS=yes subjects, who are paired with 5763 unique matched MS = no subjects in our matched sample.
9.1 Obtaining the Matched Sample
Here I am storing the matched sample
match4_matches <- factor(rep(match_4$index.treated, 2))
ms_matched4 <- cbind(match4_matches,
ms_test[c(match_4$index.control,
match_4$index.treated),])
How many unique subjects are in our matched sample?
[1] 11521
This match includes 5763 pairs so 11521 subjects, since we’ve done matching without replacement.
ms_f n
1 no 5763
2 yes 5763
9.2 Checking Covariate Balance for our 1:1 Caliper Match
9.2.1 Using bal.tab
to obtain a balance table
covs_4plus <- ms_test %>%
select(discwt, age, sex, race, insurance, patient_loc, region, zipinc_qrtl, obese, htn, cad, hf, hld, pvd, dm, kidney, copd, asthma, osteoporosis, ra, fibromyalgia, glaucoma, depression, anxiety, bipolar, epilepsy, migraine, ibd, ibs, hashimoto, lupus, psoriasis, scd, hiv, ED_record, elective_admin, tran_in, aweekend, amonth, hosp_bedsize, hosp_locteach, h_contrl, ps, linps)
bal4 <- bal.tab(match_4,
treat = ms_test$ms_f,
covs = covs_4plus, quick = FALSE,
data = ms_test, stats = c("m", "v"),
un = TRUE, disp.v.ratio = TRUE)
bal4
Balance Measures
Type Diff.Un V.Ratio.Un Diff.Adj V.Ratio.Adj
discwt Contin. 0.0074 0.8282 0.0079 0.8901
age Contin. -0.1163 0.5196 0.0314 0.5044
sex_female Binary 0.1448 . 0.0066 .
race_White Binary 0.0683 . -0.0017 .
race_Black Binary 0.0141 . -0.0014 .
race_Hispanic Binary -0.0518 . 0.0019 .
race_Other Binary -0.0081 . 0.0010 .
race_Asian Binary -0.0194 . -0.0005 .
race_NativeA Binary -0.0031 . 0.0007 .
insurance_Medicare Binary 0.1099 . 0.0163 .
insurance_Private Binary -0.0310 . -0.0111 .
insurance_Medicaid Binary -0.0477 . -0.0024 .
insurance_Self_pay Binary -0.0211 . -0.0036 .
insurance_Other Binary -0.0101 . 0.0009 .
patient_loc_Central Binary -0.0021 . -0.0028 .
patient_loc_Fringe Binary 0.0322 . 0.0000 .
patient_loc_metro>250K Binary -0.0149 . -0.0005 .
patient_loc_metro>50K Binary -0.0026 . -0.0045 .
patient_loc_micro Binary -0.0033 . 0.0016 .
patient_loc_Other Binary -0.0093 . 0.0062 .
region_South_Atlantic Binary -0.0201 . 0.0064 .
region_EastNorth_Central Binary 0.0401 . -0.0059 .
region_Middle_Atlantic Binary 0.0329 . -0.0026 .
region_Pacific Binary -0.0197 . 0.0000 .
region_WestSouth_Central Binary -0.0338 . 0.0009 .
region_WestNorth_Central Binary -0.0002 . 0.0052 .
region_EastSouth_Central Binary -0.0126 . -0.0003 .
region_Mountain Binary 0.0053 . -0.0009 .
region_NewEngland Binary 0.0082 . -0.0028 .
zipinc_qrtl_<48K Binary -0.0522 . 0.0029 .
zipinc_qrtl_48-61K Binary 0.0057 . -0.0035 .
zipinc_qrtl_61-82K Binary 0.0198 . 0.0045 .
zipinc_qrtl_82K+ Binary 0.0266 . -0.0040 .
obese_y Binary 0.0129 . 0.0026 .
htn_y Binary 0.0463 . 0.0010 .
cad_y Binary -0.0776 . 0.0012 .
hf_y Binary -0.0827 . -0.0016 .
hld_y Binary -0.0519 . -0.0028 .
pvd_y Binary -0.0055 . -0.0024 .
dm_y Binary -0.0537 . -0.0040 .
kidney_CKDother Binary -0.0069 . -0.0002 .
kidney_no Binary 0.0736 . -0.0036 .
kidney_stage1_2 Binary -0.0033 . 0.0000 .
kidney_stage3 Binary -0.0282 . 0.0035 .
kidney_stage4 Binary -0.0107 . 0.0002 .
kidney_stage5_ESR Binary -0.0244 . 0.0002 .
copd_y Binary -0.0362 . -0.0023 .
asthma_y Binary 0.0146 . -0.0026 .
osteoporosis_y Binary 0.0171 . 0.0017 .
ra_y Binary -0.0007 . -0.0017 .
fibromyalgia_y Binary 0.0302 . -0.0012 .
glaucoma_y Binary 0.0015 . 0.0014 .
depression_y Binary 0.1337 . -0.0064 .
anxiety_y Binary 0.0623 . -0.0078 .
bipolar_y Binary 0.0102 . -0.0049 .
epilepsy_y Binary 0.0531 . -0.0038 .
migraine_y Binary 0.0317 . -0.0002 .
ibd_y Binary 0.0002 . 0.0007 .
ibs_y Binary 0.0090 . -0.0014 .
hashimoto_y Binary 0.0006 . -0.0003 .
lupus_y Binary 0.0047 . -0.0007 .
psoriasis_y Binary 0.0012 . -0.0012 .
scd_y Binary -0.0004 . -0.0019 .
hiv_y Binary -0.0016 . -0.0003 .
ED_record_yes Binary 0.1115 . -0.0052 .
elective_admin_yes Binary -0.0825 . 0.0040 .
tran_in_not_transferred Binary -0.0208 . 0.0003 .
tran_in_acute_care Binary -0.0018 . -0.0007 .
tran_in_other Binary 0.0226 . 0.0003 .
aweekend_yes Binary 0.0174 . -0.0095 .
amonth Contin. -0.0182 1.0038 0.0065 1.0094
hosp_bedsize_small Binary -0.0057 . -0.0104 .
hosp_bedsize_medium Binary -0.0029 . 0.0082 .
hosp_bedsize_large Binary 0.0087 . 0.0023 .
hosp_locteach_rural Binary -0.0008 . 0.0040 .
hosp_locteach_urban_nonteaching Binary -0.0180 . -0.0026 .
hosp_locteach_urban_teaching Binary 0.0189 . -0.0014 .
h_contrl_gov_nonfed Binary -0.0182 . -0.0049 .
h_contrl_private_notprofit Binary 0.0482 . 0.0024 .
h_contrl_Private_profit Binary -0.0301 . 0.0024 .
ps Contin. 0.6431 2.7834 0.0001 0.9999
linps Contin. 0.8364 1.0691 0.0001 0.9999
Sample sizes
no yes
All 58819 5772
Matched 5763 5763
Unmatched 53056 0
Discarded 0 9
9.2.2 Checking Rubin’s Rules 1 and 2
Below is a table of Rubin’s Rules 1 & 2 before and after our 1:1 caliper match is applied
covs_for_rubin <- ms_test %>%
select(linps)
rubin_m4 <- bal.tab(match_4,
treat = ms_test$treat,
covs = covs_for_rubin,
data = ms_test, stats = c("m", "v"),
un = TRUE, disp.v.ratio = TRUE)[1]
rubin_report_m4 <- tibble(
status = c("Rule1", "Rule2"),
Unmatched = c(rubin_m4$Balance$Diff.Un,
rubin_m4$Balance$V.Ratio.Un),
Matched = c(rubin_m4$Balance$Diff.Adj,
rubin_m4$Balance$V.Ratio.Adj))
rubin_report_m4 %>% knitr::kable(digits = 2)
status | Unmatched | Matched |
---|---|---|
Rule1 | 0.84 | 0 |
Rule2 | 1.07 | 1 |
- Rubin Rule 1: We want the standardized differences of the propensity scores of the two groups to be as close to zero as possible.
- Before matching we have a bias of 83.637686%, and this is reduced to 0.008155% after 1:1 caliper matching.
- Rubin Rule 2: we want the variances of the ratio of the lin PS to be within 4/5 to 5/4.
- Before matching we have a variance ratio of 106.9120203%, and this becomes 99.9921849% after 1:1 caliper matching.
In summary, we can see that caliper matching produced an exceptionally strong match. Rubin’s Rule 1 is 0 and Rule 2 is essentially 1. However, this cost us 9 patients in the MS group who were the “hardest to match”, which is quite sad.
9.2.3 Using bal.plot
from cobalt
We can graphically compare the distribution of PS in both groups with mirrored histograms, before and after matching
bal.plot(match_4,
treat = ms_test$ms_f,
covs = covs_4plus,
var.name = "ps",
which = "both",
sample.names =
c("Unmatched Sample", "match_4 Sample"),
type = "histogram", mirror = TRUE)
After matching, the mirrored histogram looks the same! So cool!!
9.2.4 Using love.plot
to look at Standardized Differences
The love plot below helps us to look at the balance of each of the covariates.
We can see that we got all of the covariates in the desired range of -10% to +10%, except for age. However its a whole lot better than it was before.
9.2.5 Using love.plot
to look at Variance Ratios
We only had three continuous variables.
Most importantly, we can visually see that we have met Rubin’s Rule #2 because the variance of the linear PS=1