Examples

Data Analysis Examples

Contact me if you are interested in seeing examples of Stata / SAS code I’ve written.

(Note: For all of the examples below, the research questions were provided by the respective researcher, and I was responsible for pulling the dataset and running the analyses. In most cases, the results were published as a manuscript or abstract or presented as a poster at a conference; see Publications.)


Abbreviations:

HCUP-NIS, Healthcare Cost and Utilization Project-National Inpatient Sample

HCUP-KID, Healthcare Cost and Utilization Project-Kids’ Inpatient Database

NSQIP-PUF, National Surgical Quality Improvement Program-Participant Use File

PHC4, Pennsylvania Health Care Cost Containment Council

LOS, length of stay


Analysis of patient outcomes and costs associated with nephrectomy based on surgical approach

  • Source of Data: HCUP-NIS
  • Stratification: Surgical approach (open, laparoscopic, or robotic-assisted laparoscopic) and race/ethnicity
  • Outcomes of interest: Hospital LOS, mortality, total costs
  • Statistical Analysis: ANOVA, chi2, generalized linear models, logistic regression
  • Software Used: SAS, Stata


Analysis of outcomes among patients who underwent coronary artery bypass graft (CABG) surgery based on use of cardiopulmonary bypass (CPB) support 

  • Source of Data: HCUP-NIS
  • Stratification: Use of cardiopulmonary bypass (CPB) support during surgery (used [“on-pump”] or not used [“off-pump”])
  • Outcomes of interest: Hospital LOS, mortality, total costs, postoperative complications (stroke, surgical site infection, or sepsis), annual trends in use of on-pump and off-pump approaches 
  • Statistical Analysis: t-test, chi2, linear regression, logistic regression
  • Software Used: SAS, Stata


Analysis of costs and outcomes associated with peritonitis among patients who received non-routine inpatient dialysis

  • Source of Data: HCUP-NIS
  • Stratification: Geographical location in the US (Midwest, Northeast, South, or West)
  • Outcomes of interest: Hospital LOS, mortality, total costs
  • Statistical Analysis: t-test, chi2, logistic regression, generalized linear models
  • Software Used: SAS, Stata


Analysis of patient outcomes following cerebellar stroke 

  • Source of Data: HCUP-NIS
  • Stratification: In-hospital mortality
  • Outcomes of interest: Hospital LOS, in-hospital mortality, total costs
  • Statistical Analysis: t-test, chi2, logistic regression, linear regression 
  • Software Used: SAS, Stata


Analysis of patients of Hispanic ethnicity hospitalized for any reason who left the hospital against medical advice 

  • Source of Data: HCUP-NIS
  • Stratification: Whether the patient left the hospital against medical advice 
  • Outcomes of interest: Hospital LOS, left against medical advice 
  • Statistical Analysis: t-test, chi2, logistic regression 
  • Software Used: SAS, Stata


Analysis of patient outcomes and costs among children who underwent tonsillectomy

  • Source of Data: HCUP-KID
  • Stratification: Presence or absence of autism spectrum disorder
  • Outcomes of interest: Post-surgical complications, hospital LOS, and total costs
  • Statistical Analysis: t-test, chi2, linear regression, logistic regression, propensity score matching
  • Software Used: SAS, Stata
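Propensity score matching recurs in several of these analyses. As a rough illustration only (not the Stata/SAS code used in the projects), the sketch below performs greedy 1:1 nearest-neighbor matching without replacement on propensity scores that are assumed to have been estimated already, for example by logistic regression. The function name `greedy_match`, the processing order (treated patients sorted by score), and the 0.05 caliper on the raw score are illustrative assumptions; in practice the caliper is often set on the logit of the score instead.

```python
# Hypothetical sketch: greedy 1:1 nearest-neighbor propensity score matching
# without replacement. Scores are assumed to be pre-estimated elsewhere.

def greedy_match(treated: dict[str, float], controls: dict[str, float],
                 caliper: float = 0.05) -> list[tuple[str, str]]:
    """Pair each treated unit with the closest unused control within `caliper`.

    treated, controls: mapping of patient ID -> propensity score.
    Returns (treated_id, control_id) pairs; treated units with no control
    inside the caliper are left unmatched.
    """
    available = dict(controls)  # copy so matched controls can be removed
    pairs = []
    # Greedy matching is order-dependent; treated units are processed in
    # ascending score order here (an assumption; random order is also common).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        best_id, best_dist = None, None
        for c_id, c_score in available.items():
            dist = abs(t_score - c_score)
            if dist <= caliper and (best_dist is None or dist < best_dist):
                best_id, best_dist = c_id, dist
        if best_id is not None:
            pairs.append((t_id, best_id))
            del available[best_id]  # without replacement
    return pairs


pairs = greedy_match({"t1": 0.30, "t2": 0.62, "t3": 0.90},
                     {"c1": 0.31, "c2": 0.60, "c3": 0.65, "c4": 0.10})
# pairs == [("t1", "c1"), ("t2", "c2")]; t3 has no control within the caliper
```

Unmatched treated patients are simply dropped, which is why matched analyses report how many patients were retained after matching.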


Analysis of hospital length of stay and costs among children who underwent surgery for treatment of cleft lip / cleft palate at children’s versus non-children’s hospitals 

  • Source of Data: HCUP-KID
  • Stratification: Location of surgery (children’s hospital vs non-children’s hospital)
  • Outcomes of interest: Hospital LOS, total costs
  • Statistical Analysis: t-test, chi2, linear regression
  • Software Used: SAS, Stata


Analysis of patient outcomes and costs among children who underwent surgery for craniosynostosis at children’s versus non-children’s hospitals

  • Source of Data: HCUP-KID
  • Stratification: Location of surgery (children’s hospital vs non-children’s hospital)
  • Outcomes of interest: Hospital LOS, mortality, total costs
  • Statistical Analysis: t-test, chi2, linear regression
  • Software Used: SAS, Stata


Analysis of patient outcomes and costs associated with Chiari I malformation surgery at children’s versus non-children’s hospitals

  • Source of Data: HCUP-KID
  • Stratification: Location of surgery (children’s hospital vs non-children’s hospital)
  • Outcomes of interest: Hospital LOS, mortality, total costs
  • Statistical Analysis: t-test, chi2, linear regression, propensity score matching
  • Software Used: SAS, Stata


Analysis of geographical differences in mechanism of injury related to pediatric gunshot wounds

  • Source of Data: HCUP-KID
  • Stratification: Hospital location (urban versus rural), minority status, payer type (as a surrogate for socioeconomic status)
  • Outcomes of interest: Mechanism of injury (accidental or violence-related)
  • Statistical Analysis: t-test, chi2, logistic regression
  • Software Used: SAS, Stata


Analysis of patient outcomes among those who underwent surgical repair of a branchial cleft cyst

  • Source of Data: NSQIP-PUF
  • Stratification: Age of patient (pediatric versus adult)
  • Outcomes of interest: Post-surgical complications, return to operating room, cause of readmission, time to readmission, operative time, hospital LOS, time to hospital discharge, mortality
  • Statistical Analysis: t-test, chi2, linear regression, logistic regression
  • Software Used: Stata


Analysis of patient outcomes among those who underwent surgery for a bladder tumor

  • Source of Data: NSQIP-PUF
  • Stratification: Bladder tumor size as defined by CPT code (small, medium, or large)
  • Outcomes of interest: Post-surgical complications, hospital LOS, re-operation rate, 30-day readmission rate, mortality
  • Statistical Analysis: ANOVA, chi2, linear regression, logistic regression
  • Software Used: SAS, Stata


Analysis of patient outcomes among those who underwent emergency general surgery 

  • Source of Data: NSQIP-PUF
  • Stratification: Faculty type (dedicated service-line faculty vs nondedicated faculty), defined by time period (periods staffed entirely by dedicated faculty vs periods with nondedicated faculty)
  • Outcomes of interest: Post-surgical complications, hospital LOS, re-operation rate, 30-day readmission rate, mortality
  • Statistical Analysis: t-test, chi2, linear regression, logistic regression
  • Software Used: SAS, Stata


Association of preoperative patient risk factors with patient outcomes among those who underwent inpatient parathyroidectomy for primary hyperparathyroidism

  • Source of Data: NSQIP-PUF
  • Stratification: Anesthesia type received (general anesthesia or monitored anesthesia care using local anesthesia)
  • Outcomes of interest: Operative time, post-surgical complications, hospital LOS, and mortality
  • Statistical Analysis: t-test, chi2, linear regression, logistic regression
  • Software Used: SAS, Stata


Analysis of patient outcomes among those who underwent adrenalectomy based on physician specialty

  • Source of Data: NSQIP-PUF
  • Stratification: Physician specialty (general surgeon versus urologist)
  • Outcomes of interest: Post-surgical complications, hospital LOS, re-operation rate, 30-day readmission rate, mortality
  • Statistical Analysis: t-test, chi2, linear regression, logistic regression
  • Software Used: SAS, Stata


Analysis of patient outcomes among those who underwent nephroureterectomy based on surgical approach 

  • Source of Data: NSQIP-PUF
  • Stratification: Surgical approach (open versus laparoscopic)
  • Outcomes of interest: Post-surgical complications, hospital LOS, 30-day readmission rate, mortality
  • Statistical Analysis: t-test, chi2, linear regression, logistic regression, propensity score matching
  • Software Used: SAS, Stata


Analysis of 30-day readmission rates among those who underwent hysterectomy in Pennsylvania between 2011 and 2014  

  • Source of Data: PHC4
  • Stratification: 30-day readmission
  • Outcomes of interest: Surgical route (open or minimally invasive), hospital LOS
  • Statistical Analysis: t-test, chi2, logistic regression, generalized linear model
  • Software Used: Stata
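Several of the PHC4 projects depend on a 30-day readmission flag. One common way to derive it from discharge-level records, sketched below under simplifying assumptions, is to sort each patient's stays by admission date and mark a stay as followed by a readmission when that patient's next admission begins within 30 days of the discharge. The `Stay` record and the `flag_30day_readmissions` function are hypothetical; real readmission definitions typically also handle transfers, planned readmissions, and in-hospital deaths, which this sketch ignores.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Stay:
    """One hospitalization (hypothetical minimal record)."""
    patient_id: str
    admit: date
    discharge: date


def flag_30day_readmissions(stays: list[Stay], window_days: int = 30) -> list[bool]:
    """Return one flag per input stay: True if the same patient's next
    admission starts within `window_days` days of this stay's discharge."""
    flags = [False] * len(stays)
    by_patient = defaultdict(list)
    for i, stay in enumerate(stays):
        by_patient[stay.patient_id].append(i)
    for idxs in by_patient.values():
        idxs.sort(key=lambda i: stays[i].admit)  # chronological order per patient
        for cur, nxt in zip(idxs, idxs[1:]):
            gap = (stays[nxt].admit - stays[cur].discharge).days
            flags[cur] = 0 <= gap <= window_days
    return flags
```

Flags are returned in the original input order, so they can be attached back to the source records without re-sorting.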


Analysis of 30-day readmission rates among those who underwent colectomy in Pennsylvania between 2011 and 2014 based on discharge to a skilled nursing facility (SNF)

  • Source of Data: PHC4
  • Stratification: Discharge location (SNF versus other location)
  • Outcomes of interest: Readmission at 30 days, mean days to readmission
  • Statistical Analysis: t-test, chi2, logistic regression, propensity score matching analysis
  • Software Used: SAS, Stata


Analysis of 30-day readmission rates among patients who were admitted for acute myocardial infarction (AMI), pneumonia (PNA), congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD), stroke, or CABG based on time period and hospital type

  • Source of Data: PHC4
  • Stratification: Time period relative to the implementation date of CMS’s Hospital Readmissions Reduction Program (HRRP) (pre-HRRP and post-implementation years 1, 2, and 3); type of hospital (critical access hospital vs noncritical access hospital); age of patient (Medicare-eligible or non-Medicare-eligible)
  • Outcomes of interest: Readmission at 30 days
  • Statistical Analysis: t-test, chi2, logistic regression
  • Software Used: SAS, Stata


Costs and outcomes associated with liver transplant at a tertiary teaching hospital

  • Source of Data: Medical charts / hospital accounting records
  • Stratification: Model for End-Stage Liver Disease (MELD) score at time of transplantation (grouped into 3 cohorts)
  • Outcomes of interest: Fully loaded operating costs (fixed plus variable costs), hospital LOS for the transplant hospitalization, discharge destination, 1-year survival
  • Statistical Analysis: t-test, chi2, linear regression, logistic regression, Kaplan-Meier survival analysis, break-even LOS analysis
  • Software Used: Stata


Patient outcomes among those hospitalized for a blunt trauma event according to the oral anticoagulant being taken at the time of the event

  • Source of Data: Medical charts
  • Stratification: Type of oral anticoagulant being taken at time of event (none, warfarin, or novel oral anticoagulants [NOACs = apixaban, dabigatran, or rivaroxaban])
  • Outcomes of interest: Hospital LOS, ICU LOS, requiring an ICU stay, requiring a ventilator, number of ventilator days, mortality 
  • Statistical Analysis: t-test, chi2, linear regression, logistic regression, Cox proportional hazards regression
  • Software Used: Stata


Analysis of total medical spending for patients grouped by presence or absence of at least 1 claim for anti-diabetic prescription medications

  • Source of Data: Administrative claims
  • Stratification: >=1 claim for selected anti-diabetic prescriptions  
  • Outcomes of interest: Total annual medical spending
  • Statistical Analysis: Descriptive statistics (mean, median, min, max)
  • Software Used: SAS, Stata
  • Other Notes: National Drug Codes (NDCs) were of varying lengths (7, 8, 9, 10, or 11 digits) and required conversion to the standard 11-digit format, followed by conversion to drug names
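The NDC conversion mentioned above can be sketched as follows. Ten-digit NDCs come in three hyphenated segment layouts (4-4-2, 5-3-2, and 5-4-1), and the 11-digit 5-4-2 format is obtained by left-padding the short segment with a zero. For unhyphenated codes shorter than 11 digits, the sketch assumes the missing digits are leading zeros lost when the code was stored as a number; that is an assumption, because an unhyphenated 10-digit code is ambiguous without its segment layout. The function name `normalize_ndc` is hypothetical, and the later step of mapping each code to a drug name (for example, by joining to the FDA NDC Directory) is not shown.

```python
# Hypothetical sketch: normalize National Drug Codes (NDCs) to 11-digit 5-4-2 form.

def normalize_ndc(raw: str) -> str:
    """Return an 11-digit NDC string (no hyphens) from a raw claims value."""
    s = raw.strip()
    if "-" in s:
        parts = s.split("-")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"unrecognized NDC: {raw!r}")
        labeler, product, package = parts
        if len(labeler) > 5 or len(product) > 4 or len(package) > 2:
            raise ValueError(f"segment too long: {raw!r}")
        # Left-pad each segment to the 5-4-2 layout; this covers the
        # 4-4-2, 5-3-2, and 5-4-1 ten-digit configurations.
        return labeler.zfill(5) + product.zfill(4) + package.zfill(2)
    if not s.isdigit() or len(s) > 11:
        raise ValueError(f"unrecognized NDC: {raw!r}")
    # Assumption: digits missing from an unhyphenated code are leading zeros
    # that were dropped when the value was stored as a number.
    return s.zfill(11)
```

Raising an error on malformed values, rather than guessing, makes it easy to count and review the codes that could not be converted.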


Analysis of sleep quality survey results from a cohort of patients with Ehlers-Danlos syndrome (EDS)

  • Source of Data: Patient survey results
  • Stratification: EDS subtype (classic, hypermobile, vascular, other)
  • Outcomes of interest: Pittsburgh Sleep Quality Index (PSQI) total score and subscores, Epworth Sleepiness Scale (ESS) total score and subscores, Multidimensional Fatigue Inventory-20 (MFI-20) total score and subscores, Sleep Medicine Associates of Maryland (SMAM) Inventory selected items
  • Statistical Analysis: ANOVA, chi2
  • Software Used: Stata 


Analysis of postoperative outcomes among pediatric patients who underwent tonsillectomy for obstructive sleep apnea 

  • Source of Data: Medical charts
  • Stratification: Timing of morphine administration (intraoperatively, postoperatively, or both)
  • Outcomes of interest: Hospital LOS, time in the post-anesthesia care unit (PACU; hours), postoperative bleeding, readmission at 2 weeks
  • Statistical Analysis: ANOVA, chi2
  • Software Used: Stata 


Analysis of postoperative outcomes among patients who underwent laparoscopic adjustable gastric banding (LAGB) for weight loss

  • Source of Data: Medical charts
  • Stratification: Requirement of revision or removal
  • Outcomes of interest: Re-operation due to band complication, band removal, conversion to an alternate bariatric procedure, time from LAGB to first re-operation or to any conversion surgery, mean weight loss at the time of re-operation/removal, and percentage of patients undergoing conversion surgery
  • Statistical Analysis: t-test, chi2, Kaplan-Meier survival analysis
  • Software Used: Stata 
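Kaplan-Meier analyses such as time from LAGB to first re-operation rest on the product-limit estimator: at each distinct event time t, the survival estimate is multiplied by (1 - d/n), where d is the number of events at t and n is the number of patients still at risk just before t. The sketch below is a plain-Python illustration of that estimator, not the Stata routine used in the analysis; the function name `kaplan_meier` is hypothetical, and patients without an event (for example, no re-operation by last follow-up) are treated as censored.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimates (illustrative sketch).

    times: follow-up time for each patient (e.g., months from LAGB).
    events: 1 if the event (e.g., re-operation) was observed, 0 if censored.
    Returns [(t, S(t)), ...] at each distinct time with at least one event.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = removed = 0
        # Group all patients sharing time t; patients censored at t are still
        # counted as at risk at t (the usual convention).
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            removed += 1
            i += 1
        if d:
            survival *= 1 - d / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps the estimate down to 0.8, 0.6, and 0.3 at times 1, 2, and 3 (up to floating-point rounding).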


Costs of care at 1 year for patients who underwent complex ventral hernia repair (cVHR)

  • Source of Data: Medical charts / hospital accounting records
  • Stratification: Presence or absence of 1 or more complications
  • Outcomes of interest: Cumulative costs of care at 1 year 
  • Statistical Analysis: t-test, chi2, linear regression
  • Software Used: Stata 


Analysis of pain burden for patients who underwent complex ventral hernia repair (cVHR)

  • Source of Data: Medical charts
  • Stratification: n/a
  • Outcomes of interest: Presence or absence of ongoing subjective pain complaints 
  • Statistical Analysis: logistic regression
  • Software Used: Stata 


Hospital-level analysis of 30-day readmission rates and reimbursement penalties following total joint arthroplasty 

  • Source of Data: Medical charts / hospital accounting records
  • Stratification: Receipt of a Hospital-Acquired Condition Reduction Program (HACRP) penalty in FY2018
  • Outcomes of interest: Receipt of a penalty in FY2018, penalty amounts, associated revenue loss amounts, 30-day readmission rates following total joint arthroplasty
  • Statistical Analysis: t-test, chi2, logistic regression, linear regression
  • Software Used: Stata


Analysis of claims paid based on time elapsed since patient outreach 

  • Source of Data: Administrative claims
  • Stratification: Comorbidities, days since patient activation / outreach 
  • Outcomes of interest: Cumulative costs
  • Statistical Analysis: t-test, chi2, linear regression, logistic regression, propensity score matching
  • Software Used: SAS, Stata, R (graphs)


What is the most cost-effective strategy for managing a thyroid nodule with indeterminate cytology: repeat fine needle aspiration (FNA), diagnostic lobectomy, or ThyroSeq molecular testing?

  • Source of Data: Literature (disease prevalence, sensitivities and specificities of the tests, and total costs)
  • Outcome of interest: Incremental cost-effectiveness ratios (ICERs)
  • Software Used: TreeAge 
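For reference, the ICER of a strategy compared with the next less costly non-dominated strategy is (C_new - C_ref) / (E_new - E_ref), where C is expected cost and E is expected effectiveness. The sketch below shows that calculation in plain Python, including the usual removal of strongly dominated strategies (costlier and no more effective) and extendedly dominated ones (whose ICER exceeds that of the next, more effective strategy). It illustrates the arithmetic that decision-analysis software such as TreeAge reports, not how TreeAge is implemented; the function names are hypothetical, and the strategies, costs, and effects in the example are made up.

```python
# Illustrative sketch: ICERs along the cost-effectiveness frontier.
# Each strategy is (name, expected_cost, expected_effect); values are hypothetical.

def icer(ref, new):
    """Incremental cost-effectiveness ratio of `new` versus `ref`."""
    return (new[1] - ref[1]) / (new[2] - ref[2])


def frontier_icers(strategies):
    """Return [(name, icer_or_None), ...] for non-dominated strategies, cheapest first."""
    if not strategies:
        return []
    ordered = sorted(strategies, key=lambda s: (s[1], -s[2]))
    # Strong dominance: drop a strategy if a cheaper (or equally costly)
    # one is at least as effective.
    frontier = []
    for s in ordered:
        if frontier and s[2] <= frontier[-1][2]:
            continue
        frontier.append(s)
    # Extended dominance: drop a strategy whose ICER exceeds that of the next,
    # more effective strategy; repeat until ICERs increase along the frontier.
    changed = True
    while changed:
        changed = False
        for i in range(1, len(frontier) - 1):
            if icer(frontier[i - 1], frontier[i]) > icer(frontier[i], frontier[i + 1]):
                del frontier[i]
                changed = True
                break
    results = [(frontier[0][0], None)]  # the cheapest strategy is the reference
    for prev, cur in zip(frontier, frontier[1:]):
        results.append((cur[0], icer(prev, cur)))
    return results


example = frontier_icers([("A", 1000, 100), ("B", 2000, 90),
                          ("C", 4000, 110), ("D", 5000, 130)])
# B is strongly dominated by A, C is extendedly dominated, and D is compared with A.
```

Each remaining ICER is then compared against a willingness-to-pay threshold to pick the preferred strategy.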


Analysis of costs associated with different sequences of tests for locating a confirmed but unlocalized cerebrospinal fluid (CSF) leak (CSF rhinorrhea)

  • Source of Data: Literature (disease prevalence, sensitivities and specificities of the tests, and total costs)
  • Outcome of interest: Costs (payer perspective)
  • Software Used: TreeAge
  • Other Notes: A literature search was performed to parameterize the decision tree


Costs of adverse events associated with emergent intubations as a function of the number of intubation attempts required for successful placement of an endotracheal (ET) tube 

  • Source of Data: Expert Opinion; Literature (first-, second-, and third-pass success rates; complication rates; costs associated with each complication)
  • Outcome of interest: Costs (provider perspective)
  • Software Used: TreeAge
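Decision trees like these are evaluated by "rolling back": the expected value of a chance node is the probability-weighted sum of its branches. The sketch below applies that to a simplified intubation tree in which each attempt succeeds with a conditional probability (given earlier failures) and success on a given attempt carries an expected adverse-event cost. The `Node` class, the `attempts_tree` helper, the failure branch, and all numbers in the example are hypothetical; the actual model was built in TreeAge with literature-based and expert-opinion inputs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Chance node (with branches) or terminal node (no branches)."""
    cost: float = 0.0  # cost incurred on reaching this node
    branches: list = field(default_factory=list)  # [(probability, Node), ...]


def expected_cost(node: Node) -> float:
    """Roll back the tree: node cost plus probability-weighted child costs."""
    if not node.branches:
        return node.cost
    if abs(sum(p for p, _ in node.branches) - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return node.cost + sum(p * expected_cost(child) for p, child in node.branches)


def attempts_tree(success_probs, outcome_costs, failure_cost):
    """Hypothetical tree: attempt k succeeds with conditional probability
    success_probs[k]; success on attempt k carries expected adverse-event
    cost outcome_costs[k]; if every attempt fails, failure_cost applies."""
    node = Node(cost=failure_cost)
    # Build from the last attempt backward so each failure branch
    # leads to the next attempt.
    for p, c in reversed(list(zip(success_probs, outcome_costs))):
        node = Node(branches=[(p, Node(cost=c)), (1 - p, node)])
    return node


tree = attempts_tree([0.8, 0.7, 0.9], [100, 500, 1500], failure_cost=5000)
print(expected_cost(tree))  # approximately 261 with these made-up inputs
```

Varying the first-pass success probability in such a tree shows directly how expected adverse-event costs rise as more attempts are needed.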

