Prevalence of AIDS-defining illness in HIV patients with Substance Abuse


Lindsay Petrenchik


January 13, 2023

library(knitr); library(rmdformats)

## Global options
library(janitor); library(naniar)
# mice = multiple imputation through chained equations
library(broom); library(yardstick)
library(tidyverse)  # needed for %>%, mutate(), and the fct_*() helpers used below
# library(cutpointr)
# library(OptimalCutpoints)


It is estimated that almost half of people living with HIV in the US have a substance use disorder. This is a major public health concern because people living with HIV and a comorbid substance use disorder (SUD) have lower retention in care and decreased medication adherence. Several previous studies have demonstrated that the incidence of opportunistic infections, a marker of disease progression, is higher in this population. There have been no recent studies that evaluate the relationship between a SUD and the presence of an opportunistic infection (i.e., an AIDS-defining illness) in people living with HIV. Furthermore, no study has yet evaluated this syndemic nationally.

# Research Questions

1. In people living with HIV (PLWH) in 2018, how does hospitalization due to an AIDS-defining illness in those with a SUD compare to those without a SUD?

2. In PLWH in 2018, how does the length of hospital stay for people with a SUD compare to those without a SUD?

# My Data

The Healthcare Cost and Utilization Project, Nationwide Inpatient Sample (HCUP-NIS) is the largest available all-payer inpatient healthcare administrative data set. It approximates a 20-percent stratified sample of all discharges from United States hospitals. It comprises data from 48 states and 10,000 community hospitals, representing 95% of the United States population. Each record contains information on patient demographics, diagnoses, procedures, and other details associated with a hospital admission.

The data can be purchased by the public at the following link:

**Strengths of how this data set relates to my research question**:

- Nationally representative

- Inpatient hospitalizations

- Collects information on sociodemographic factors which I can adjust for

- Collects information on up to 40 diagnoses, thus I can capture both the exposure and outcome

- Collects information on length of stay

**Limitations of the data set**:

- The data quality of secondary databases is not perfect, as the diagnosis codes may not necessarily be accurate, granular, or complete

  • The HIV population tends to have many co-occurring conditions, and thus it is possible that not all SUD conditions were recorded. Therefore, there may be some people in the unexposed group who should be in the exposed group

- The latest data available are from 2018. Drug therapy has changed dramatically, with integrase inhibitors becoming first line and newer drugs offering longer half-lives and regimens that are easier to adhere to. This is especially important for the SUD population, who are less likely to be adherent. Thus, with all of the advances in care, the 2018 analysis may not actually reflect 2021's gaps in care.
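To make the misclassification concern concrete, here is a small arithmetic sketch with made-up counts (not NIS data). It assumes that 20% of patients who truly have a SUD carry no SUD code, regardless of outcome, and that nobody without a SUD is coded as having one. All numbers and object names are hypothetical.

```r
# Hypothetical true 2x2 counts (illustration only, not NIS data)
true_tab <- matrix(c(300, 700,    # SUD:    cases, non-cases
                     200, 800),   # no SUD: cases, non-cases
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("sud", "no_sud"), c("case", "noncase")))

odds_ratio <- function(tab) {
  (tab["sud", "case"] / tab["sud", "noncase"]) /
    (tab["no_sud", "case"] / tab["no_sud", "noncase"])
}

# Assumed: 80% of true SUD patients have a SUD code (sensitivity = 0.8),
# the same for cases and non-cases; specificity is taken to be perfect.
sens  <- 0.8
moved <- true_tab["sud", ] * (1 - sens)   # exposed patients recorded as unexposed

obs_tab <- true_tab
obs_tab["sud", ]    <- true_tab["sud", ] - moved
obs_tab["no_sud", ] <- true_tab["no_sud", ] + moved

true_or <- odds_ratio(true_tab)
obs_or  <- odds_ratio(obs_tab)
round(c(true = true_or, observed = obs_or), 2)
```

Under these assumptions the odds ratio moves from about 1.71 to about 1.55, so under-recorded SUD codes of this kind would tend to pull the estimate toward no association rather than away from it.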

# Data Ingest

Below I ingest my data as `hiv_raw`.

hiv_raw <- read.csv("hiv_oi.csv")

hiv_raw <- hiv_raw %>% 
  haven::zap_label()  %>% 
  mutate(key_nis = as.character(key_nis))

dim(hiv_raw)

[1] 24203    20
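A quick look at dimensions and missingness right after ingest is a cheap sanity check. The sketch below uses a tiny made-up data frame in place of `hiv_raw`; the values and the name `toy_raw` are invented for illustration, and base R is used here even though `naniar` offers richer summaries.

```r
# Made-up stand-in for hiv_raw (values are illustrative only)
toy_raw <- data.frame(
  key_nis = c(101, 102, 103, 104),
  age     = c(45, NA, 61, 38),
  los     = c(3, 7, NA, NA)
)

dim(toy_raw)                          # number of rows, number of columns
n_missing <- colSums(is.na(toy_raw))  # count of NA values in each column
n_missing
```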

# Tidying, Data Cleaning and Data Management

Below I am cleaning the data according to the HCUP-NIS code book found at

In summary I have:

1.) converted all variables to factors except for `age` and `key_nis`

2.) coded the variable levels for factors with more descriptive names rather than numbers

3.) reordered according to frequency with `fct_infreq`

4.) selected only the variables that I will use

5.) converted `los` to a number
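Steps 2 and 3 can be seen in isolation on a toy vector. This is a minimal sketch assuming the `forcats` package is installed; `race_raw` and its values are made up and stand in for a numerically coded NIS column.

```r
library(forcats)

# Toy stand-in for a numerically coded column (values are made up)
race_raw <- c(2, 1, 2, 3, 2, 1)

# Step 2: give the numeric codes descriptive labels (new name = old level)
race <- fct_recode(factor(race_raw),
                   "White" = "1", "Black" = "2", "Hispanic" = "3")
levels(race)   # still in code order: White, Black, Hispanic

# Step 3: reorder the levels so the most frequent one comes first
race <- fct_infreq(race)
levels(race)   # Black (3 rows), White (2 rows), Hispanic (1 row)
```

Ordering by frequency makes the most common category the first level, which R then uses as the reference category in later models and as the first row in tables.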

hiv <- hiv_raw %>% 
  mutate(female = as.numeric(female)) %>% 
  mutate(sex = fct_recode(factor(female),
                          "male" = "0", "female" = "1"),
         race = fct_recode(factor(race),
                           "White" = "1", 
                           "Black" = "2", 
                           "Hispanic" = "3",
                           "Asian" = "4",
                           "NativeA" = "5",
                           "Other" = "6"),
         race = fct_infreq(race), 
         zipinc_qrtl = fct_recode(factor(zipinc_qrtl),
                                  "<48K" = "1",
                                  "48-61K" = "2",
                                  "61-82K" = "3",
                                  "82K+" = "4")) %>% 
  mutate(pay1 = as.numeric(pay1)) %>% 
  mutate(insurance = fct_recode(factor(pay1),
                                "Medicare" = "1",
                                "Medicaid" = "2",
                                "Private" = "3",
                                "Self_pay" = "4",
                                "Other" = "5",
                                "Other" = "6"),
         insurance = fct_infreq(insurance), 
         patient_loc = fct_recode(factor(pl_nchs),
                                  "Central" = "1",
                                  "Fringe" = "2",
                                  "metro>250K" = "3",
                                  "metro>50K" = "4",
                                  "micro" = "5",
                                  "Other" = "6"),
         patient_loc = fct_infreq(patient_loc)) %>% 
  mutate(region = fct_recode(factor(hosp_division),
                             "Northeast" = "1",
                             "Northeast" = "2",
                             "Midwest" = "3",
                             "Midwest" = "4",
                             "South_Atlantic" = "5",
                             "South" = "6",
                             "South" = "7",
                             "West" = "8",
                             "West" = "9"),
         region = fct_infreq(region)) %>% 
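  mutate(ED_record = fct_recode(factor(hcup_ed),
                                "no" = "0",
                                "yes" = "1", "yes" = "2", "yes" = "3", "yes" = "4")) %>% 
  # The opening line of this call is missing from the source; the raw column
  # name inside factor() is assumed to be `subst_abuse`.
  mutate(subst_abuse = fct_recode(factor(subst_abuse),
                                  "yes" = "1",
                                  "no" = "0"),
         subst_abuse = fct_relevel(subst_abuse, "no")) %>% 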
  mutate(los = as.numeric(los)) %>% 
  mutate(AIDS_f = fct_recode(factor(oi),
                             "yes" = "1",
                             "no" = "0")) %>% 
  mutate(AIDS_f = fct_relevel(AIDS_f, "no")) %>% 
  rename(AIDS = oi) %>% 
  select(key_nis, subst_abuse, AIDS, AIDS_f, los, age, sex, race, region,
         zipinc_qrtl, insurance, patient_loc, ED_record)
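Releveling `subst_abuse` and `AIDS_f` so that "no" comes first matters for later modeling, because R treats the first factor level as the reference. The sketch below uses simulated data (not the NIS extract) and assumes `forcats` is installed. The names `toy` and `fit` are hypothetical, and the logistic model is only an assumption about how the first research question might be approached, not the analysis stated in this report.

```r
library(forcats)
set.seed(2023)

# Simulated stand-in data (not NIS): an exposure with "yes" as the first level
n <- 500
toy <- data.frame(
  subst_abuse = factor(sample(c("yes", "no"), n, replace = TRUE),
                       levels = c("yes", "no"))
)
# Made-up outcome probabilities for exposed and unexposed rows
p <- ifelse(toy$subst_abuse == "yes", 0.35, 0.20)
toy$AIDS <- rbinom(n, 1, p)

# Move "no" to the front so it becomes the reference level
toy$subst_abuse <- fct_relevel(toy$subst_abuse, "no")
levels(toy$subst_abuse)

# Logistic regression: the coefficient is named after the non-reference level
fit <- glm(AIDS ~ subst_abuse, data = toy, family = binomial())
exp(coef(fit))   # second entry is the odds ratio for "yes" versus "no"
```

With "no" as the reference, the coefficient `subst_abuseyes` reads directly as the effect of having a SUD code, which matches how the research questions are phrased.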