Data Analysis Examples
Contact me if you are interested in seeing examples of Stata / SAS code I’ve written.
(Note: For all of the examples below, the research questions were provided by the respective researchers; I was responsible for pulling the dataset and running the analyses. In most cases, the results were published as a manuscript or abstract or presented at a conference as a poster. See Publications.)
Abbreviations:
HCUP-NIS, Healthcare Cost and Utilization Project-National Inpatient Sample
HCUP-KID, Healthcare Cost and Utilization Project-Kids’ Inpatient Database
NSQIP-PUF, National Surgical Quality Improvement Program-Participant Use File
PHC4, Pennsylvania Health Care Cost Containment Council
Analysis of patient outcomes and costs associated with nephrectomy based on surgical approach
- Source of Data: HCUP-NIS
- Stratification: Surgical approach (open, laparoscopic, or robotic-assisted laparoscopic) and race/ethnicity
- Outcomes of interest: Hospital LOS, mortality, total costs
- Statistical Analysis: ANOVA, chi2, generalized linear models, logistic regression
- Software Used: SAS, Stata
Analysis of outcomes among patients who underwent coronary artery bypass graft (CABG) surgery based on use of cardiopulmonary bypass (CPB) support
- Source of Data: HCUP-NIS
- Stratification: Use of cardiopulmonary bypass (CPB) support during surgery (used [“on-pump”] or not used [“off-pump”])
- Outcomes of interest: Hospital LOS, mortality, total costs, postoperative complications (stroke, surgical site infection, or sepsis), annual trends in use of on-pump and off-pump approaches
- Statistical Analysis: t-test, chi2, linear regression, logistic regression
- Software Used: SAS, Stata
Analysis of costs and outcomes associated with peritonitis among patients who received non-routine, inpatient dialysis
- Source of Data: HCUP-NIS
- Stratification: Geographical location in the US (Midwest, Northeast, South, or West)
- Outcomes of interest: Hospital LOS, mortality, total costs
- Statistical Analysis: t-test, chi2, logistic regression, generalized linear models
- Software Used: SAS, Stata
Analysis of patient outcomes following cerebellar stroke
- Source of Data: HCUP-NIS
- Stratification: In-hospital mortality
- Outcomes of interest: Hospital LOS, in-hospital mortality, total costs
- Statistical Analysis: t-test, chi2, logistic regression, linear regression
- Software Used: SAS, Stata
Analysis of patients of Hispanic ethnicity hospitalized for any reason who left the hospital against medical advice
- Source of Data: HCUP-NIS
- Stratification: Whether the patient left the hospital against medical advice
- Outcomes of interest: Hospital LOS, left against medical advice
- Statistical Analysis: t-test, chi2, logistic regression
- Software Used: SAS, Stata
Analysis of patient outcomes and costs among children who underwent tonsillectomy
- Source of Data: HCUP-KID
- Stratification: Presence or absence of autism spectrum disorder
- Outcomes of interest: Post-surgical complications, hospital LOS, and total costs
- Statistical Analysis: t-test, chi2, linear regression, logistic regression, propensity score matching
- Software Used: SAS, Stata
Analysis of hospital length of stay and costs among children who underwent surgery for treatment of cleft lip / cleft palate at children’s versus non-children’s hospitals
- Source of Data: HCUP-KID
- Stratification: Location of surgery (children’s hospital vs non-children’s hospital)
- Outcomes of interest: Hospital LOS, total costs
- Statistical Analysis: t-test, chi2, linear regression
- Software Used: SAS, Stata
Analysis of patient outcomes and costs among children who underwent surgery for craniosynostosis at children’s versus non-children’s hospitals
- Source of Data: HCUP-KID
- Stratification: Location of surgery (children’s hospital vs non-children’s hospital)
- Outcomes of interest: Hospital LOS, mortality, total costs
- Statistical Analysis: t-test, chi2, linear regression
- Software Used: SAS, Stata
Analysis of patient outcomes and costs associated with Chiari I malformation surgery at children’s versus non-children’s hospitals
- Source of Data: HCUP-KID
- Stratification: Location of surgery (children’s hospital vs non-children’s hospital)
- Outcomes of interest: Hospital LOS, mortality, total costs
- Statistical Analysis: t-test, chi2, linear regression, propensity score matching
- Software Used: SAS, Stata
Analysis of geographical differences in mechanism of injury related to pediatric gunshot wounds
- Source of Data: HCUP-KID
- Stratification: Hospital location (urban versus rural), minority status, payer type (as a surrogate for socioeconomic status)
- Outcomes of interest: Mechanism of injury (accidental or violence-related)
- Statistical Analysis: t-test, chi2, logistic regression
- Software Used: SAS, Stata
Analysis of patient outcomes among those who underwent surgical repair for branchial cleft cyst
- Source of Data: NSQIP-PUF
- Stratification: Age of patient (pediatric versus adult)
- Outcomes of interest: Post-surgical complications, return to operating room, cause of readmission, time to readmission, operative time, hospital LOS, time to hospital discharge, mortality
- Statistical Analysis: t-test, chi2, linear regression, logistic regression
- Software Used: Stata
Analysis of patient outcomes among those who underwent surgery for bladder tumor
- Source of Data: NSQIP-PUF
- Stratification: Bladder tumor size (CPT code: small, medium, large)
- Outcomes of interest: Post-surgical complications, hospital LOS, re-operation rate, 30-day admission rate, mortality
- Statistical Analysis: ANOVA, chi2, linear regression, logistic regression
- Software Used: SAS, Stata
Analysis of patient outcomes among those who underwent emergency general surgery
- Source of Data: NSQIP-PUF
- Stratification: Faculty type (dedicated service line faculty vs nondedicated faculty), based on year (time periods where there were 100% dedicated faculty vs time periods with nondedicated faculty)
- Outcomes of interest: Post-surgical complications, hospital LOS, re-operation rate, 30-day admission rate, mortality
- Statistical Analysis: t-test, chi2, linear regression, logistic regression
- Software Used: SAS, Stata
Association of preoperative patient risk factors with patient outcomes among those who underwent inpatient parathyroidectomy for primary hyperparathyroidism
- Source of Data: NSQIP-PUF
- Stratification: Anesthesia type received (general anesthesia or monitored anesthesia care using local anesthesia)
- Outcomes of interest: Operative time, post-surgical complications, hospital LOS, and mortality
- Statistical Analysis: t-test, chi2, linear regression, logistic regression
- Software Used: SAS, Stata
Analysis of patient outcomes among those who underwent adrenalectomy based on physician specialty
- Source of Data: NSQIP-PUF
- Stratification: Physician specialty (general surgeon versus urologist)
- Outcomes of interest: Post-surgical complications, hospital LOS, re-operation rate, 30-day admission rate, mortality
- Statistical Analysis: t-test, chi2, linear regression, logistic regression
- Software Used: SAS, Stata
Analysis of patient outcomes among those who underwent nephroureterectomy based on surgical approach
- Source of Data: NSQIP-PUF
- Stratification: Surgical approach (open versus laparoscopic)
- Outcomes of interest: Post-surgical complications, hospital LOS, 30-day admission rate, mortality
- Statistical Analysis: t-test, chi2, linear regression, logistic regression, propensity score matching
- Software Used: SAS, Stata
Analysis of 30-day readmission rates among those who underwent hysterectomy in Pennsylvania between 2011 and 2014
- Source of Data: PHC4
- Stratification: 30-day readmission
- Outcomes of interest: Surgical route (open or minimally invasive), hospital LOS
- Statistical Analysis: t-test, chi2, logistic regression, generalized linear model
- Software Used: Stata
Analysis of 30-day readmission rates among those who underwent colectomy in Pennsylvania between 2011 and 2014 based on discharge to a skilled nursing facility (SNF)
- Source of Data: PHC4
- Stratification: Discharge location (SNF versus other location)
- Outcomes of interest: Readmission at 30 days, mean days to readmission
- Statistical Analysis: t-test, chi2, logistic regression, propensity score matching analysis
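- Software Used: SAS, Stata
Analysis of 30-day readmission rates among patients who were admitted for acute myocardial infarction (AMI), pneumonia (PNA), congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD), stroke, or CABG based on time period and hospital type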
- Source of Data: PHC4
- Stratification: Time period relative to implementation date of CMS’s Hospital Readmissions Reduction Program (HRRP) (pre-HRRP, post-implementation year 1, 2, and 3); type of hospital (critical access hospital vs non-critical access hospital); age of patient (Medicare-eligible or not Medicare-eligible)
- Outcomes of interest: Readmission at 30 days
- Statistical Analysis: t-test, chi2, logistic regression
- Software Used: SAS, Stata
Costs and outcomes associated with liver transplant at a tertiary teaching hospital
- Source of Data: Medical charts / hospital accounting records
- Stratification: Model for End-Stage Liver Disease (MELD) score at time of transplantation (grouped into 3 cohorts)
- Outcomes of interest: Fully-loaded operating costs (fixed plus variable costs), hospital LOS for transplant hospitalization, discharge destination, 1-year survival
- Statistical Analysis: t-test, chi2, linear regression, logistic regression, Kaplan-Meier survival analysis, break-even LOS
- Software Used: Stata
Patient outcomes among those hospitalized for a blunt trauma event according to oral anticoagulant being taken at time of event
- Source of Data: Medical charts
- Stratification: Type of oral anticoagulant being taken at time of event (none, warfarin, or novel oral anticoagulants [NOACs = apixaban, dabigatran, or rivaroxaban])
- Outcomes of interest: Hospital LOS, ICU LOS, requiring an ICU stay, requiring a ventilator, number of ventilator days, mortality
- Statistical Analysis: t-test, chi2, linear regression, logistic regression, Cox proportional hazards
- Software Used: Stata
Analysis of total medical spending for patients grouped by presence or absence of at least 1 claim for anti-diabetic prescription medications
- Source of Data: Administrative claims
- Stratification: >=1 claim for selected anti-diabetic prescriptions
- Outcomes of interest: Total annual medical spending
- Statistical Analysis: Descriptive statistics (mean, median, min, max)
- Software Used: SAS, Stata
- Other Notes: NDC codes were of varying lengths (7, 8, 9, 10, or 11 digits) and required conversion to 11-digit format, followed by mapping to the name of the drug
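The NDC normalization step noted above can be sketched roughly as follows. This is a minimal illustration in Python, not the SAS/Stata code used for the project. It assumes the codes arrive hyphenated as labeler-product-package (the 4-4-2, 5-3-2, and 5-4-1 layouts), which the page does not state; codes stored without hyphens cannot be padded unambiguously without knowing their source layout. The function name `ndc_to_11` is hypothetical.

```python
def ndc_to_11(ndc: str) -> str:
    """Zero-pad a hyphenated NDC to the 11-digit 5-4-2 layout.

    Assumption: the input has three hyphen-separated numeric segments
    (labeler-product-package). Anything else is rejected, not guessed at.
    """
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected labeler-product-package, got {ndc!r}")

    labeler, product, package = parts
    if len(labeler) > 5 or len(product) > 4 or len(package) > 2:
        raise ValueError(f"segment too long for 5-4-2 layout: {ndc!r}")

    # Left-pad each segment with zeros to its 5-4-2 width and concatenate.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


if __name__ == "__main__":
    for code in ("1234-5678-90", "12345-678-90", "12345-6789-0"):
        print(code, "->", ndc_to_11(code))
```

Once every code is in the same 11-digit form, it can be joined against a drug-name lookup table; that lookup is not shown here because its source is not described above.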
Analysis of sleep quality survey results from a cohort of patients with Ehlers-Danlos syndrome (EDS)
- Source of Data: Patient survey results
- Stratification: EDS subtype (classic, hypermobile, vascular, other)
- Outcomes of interest: Pittsburgh Sleep Quality Index (PSQI) total score and sub-scores, Epworth Sleepiness Scale (ESS) total score and sub-scores, Multidimensional Fatigue Inventory-20 (MFI-20) total score and sub-scores, Sleep Medicine Associates of Maryland (SMAM) Inventory selected items
- Statistical Analysis: ANOVA, chi2
- Software Used: Stata
Analysis of postoperative outcomes among pediatric patients who underwent tonsillectomy for obstructive sleep apnea
- Source of Data: Medical charts
- Stratification: Timing of morphine receipt (intra-operatively, postoperatively, or both)
- Outcomes of interest: Hospital LOS, length of time in recovery room [PACU] (hours), postoperative bleeding, readmission at 2 weeks
- Statistical Analysis: ANOVA, chi2
- Software Used: Stata
Analysis of postoperative outcomes among patients who underwent laparoscopic adjustable gastric banding (LAGB) for weight loss
- Source of Data: Medical charts
- Stratification: Requirement of revision or removal
- Outcomes of interest: Re-operation due to band complication, band removal, conversion to alternate bariatric procedure, times from LAGB to first re-operation or any conversion surgery, mean weight loss at times of re-operation / removal, and percentage of conversion surgery
- Statistical Analysis: t-test, chi2, Kaplan-Meier survival analysis
- Software Used: Stata
Costs of care at 1 year for patients who underwent complex ventral hernia repair (cVHR)
- Source of Data: Medical charts / hospital accounting records
- Stratification: Presence or absence of 1 or more complications
- Outcomes of interest: Cumulative costs of care at 1 year
- Statistical Analysis: t-test, chi2, linear regression
- Software Used: Stata
Analysis of pain burden for patients who underwent complex ventral hernia repair (cVHR)
- Source of Data: Medical charts
- Stratification: n/a
- Outcomes of interest: Presence or absence of ongoing subjective pain complaints
- Statistical Analysis: logistic regression
- Software Used: Stata
Hospital-level analysis of 30-day readmission rates and reimbursement penalties following total joint arthroplasty
- Source of Data: Medical charts / hospital accounting records
- Stratification: Receipt of a Hospital-Acquired Condition Reduction Program (HACRP) penalty in FY2018
- Outcomes of interest: Receipt of a penalty in FY2018, penalty amounts, associated revenue loss amounts, 30-day readmission rates following total joint arthroplasty
- Statistical Analysis: t-test, chi2, logistic regression, linear regression
- Software Used: Stata
Analysis of claims paid based on time elapsed since patient outreach
- Source of Data: Administrative claims
- Stratification: Comorbidities, days since patient activation / outreach
- Outcomes of interest: Cumulative costs
- Statistical Analysis: t-test, chi2, linear regression, logistic regression, propensity score matching
- Software Used: SAS, Stata, R (graphs)
What is the most cost-effective strategy for managing a thyroid nodule with indeterminate cytology: repeat fine needle aspiration (FNA), diagnostic lobectomy, or Thyroseq molecular testing?
- Source of Data: Literature (disease prevalence, sensitivities and specificities of the tests, and total costs)
- Outcome of interest: Incremental cost-effectiveness ratios (ICERs)
- Software Used: TreeAge
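An ICER for a comparison like the one above is conventionally the difference in cost between two strategies divided by the difference in their effectiveness. The sketch below shows only that arithmetic in Python; the actual analysis was run in TreeAge. The function name `icer` and the numbers are hypothetical and not drawn from the study, and the effectiveness unit is an assumption because the page does not specify one.

```python
def icer(cost_new: float, effect_new: float,
         cost_ref: float, effect_ref: float) -> float:
    """Incremental cost-effectiveness ratio of a new strategy vs a reference.

    ICER = (cost_new - cost_ref) / (effect_new - effect_ref)
    Effectiveness may be in any consistent unit (e.g. correct diagnoses or
    QALYs); the unit here is an assumption.
    """
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("strategies are equally effective; ICER is undefined")
    return (cost_new - cost_ref) / delta_effect


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    ratio = icer(cost_new=12000.0, effect_new=0.75,
                 cost_ref=10000.0, effect_ref=0.5)
    print(f"ICER: {ratio:.0f} per additional unit of effect")
```

A negative ratio needs care in interpretation: it can mean the new strategy is both cheaper and more effective, or both costlier and less effective, so the sign alone does not say which.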
Analysis of costs associated with different sequences of tests for diagnosing the location of a confirmed but unlocalized CSF leak (CSF rhinorrhea)
- Source of Data: Literature (disease prevalence, sensitivities and specificities of the tests, and total costs)
- Outcome of interest: Costs (payer perspective)
- Software Used: TreeAge
- Other Notes: A literature search was performed to parameterize the tree
Costs of adverse events associated with emergent intubations as a function of the number of intubation attempts required for successful placement of an endotracheal (ET) tube
- Source of Data: Expert Opinion; Literature (first-, second-, and third-pass success rates; complication rates; costs associated with each complication)
- Outcome of interest: Costs (provider perspective)
- Software Used: TreeAge