# Load required libraries
library(tidyverse)
library(data.table)
library(scales)
library(knitr)
library(lubridate)
# Set theme for plots
theme_set(theme_minimal(base_size = 12))
The Hidden Cost of Social Determinants of Health
An Analysis of Healthcare Utilization and Costs Associated with Social Risk Factors
This report was generated using synthetic data! It is only used for demonstration purposes and does not reflect real patients or healthcare providers. The contents have not been reviewed or validated by medical professionals.
Executive Summary
This report examines the relationship between social determinants of health (SDOH) and healthcare costs using a synthetic dataset of 6,851 patients. The analysis shows strong associations between social risk factors (such as stress, social isolation, unemployment, and exposure to violence) and both healthcare utilization and total costs.
Key Findings:
- Patients with documented SDOH conditions have much higher average healthcare expenses than those without ($288,017 vs. $25,133)
- Stress and social isolation are the most frequently recorded SDOH findings (36,363 and 12,661 condition records, respectively)
- Chronic conditions are far more common among patients with SDOH conditions (84.8% vs. 9.2%)
- Average expenses rise with the number of SDOH categories, from $25,133 for patients with none to $309,489 for patients with three or more
- Average insurance coverage also differs widely between patients with and without SDOH conditions ($464,473 vs. $42,213)
Introduction
Why SDOH Matter for Healthcare Costs
Research has consistently shown that SDOH have a profound impact on health outcomes and healthcare costs. Individuals facing adverse social circumstances are more likely to:
- Delay seeking preventive care
- Experience higher rates of chronic disease
- Have more emergency department visits
- Face barriers to medication adherence
- Experience worse health outcomes overall
Research Questions
This analysis addresses the following questions:
- How do social factors (stress, isolation, unemployment, violence exposure) correlate with healthcare utilization and costs?
- Are patients with SDOH conditions more likely to develop chronic conditions?
- What is the financial burden difference between patients with and without SDOH conditions?
- Do multiple SDOH factors have a compounding effect on healthcare costs?
Data Overview
# Load all datasets from data-fixed directory
csv_files <- list.files("../data-fixed", pattern = "\\.csv$", full.names = TRUE)
csv_names <- tools::file_path_sans_ext(basename(csv_files))
for (i in seq_along(csv_files)) {
  assign(csv_names[i], data.table::fread(csv_files[i]), envir = .GlobalEnv)
}
# Verify key datasets are loaded
if (!exists("patients") || !exists("encounters") || !exists("conditions")) {
  stop("Failed to load required datasets (patients, encounters, conditions)")
}
The dataset contains comprehensive healthcare records for 6,851 patients, including:
- 958,789 healthcare encounters
- 569,790 diagnosed conditions
- 2,872,415 medical procedures
- 1,504,482 insurance claims
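The loading loop above assigns each table into the global environment. A possible alternative, shown here only as a sketch, keeps the tables in a named list so that missing tables can be reported in one place. The temporary directory, the toy files, and the names `tables`, `required`, and `missing_tables` are assumptions for illustration; base `read.csv()` stands in for `data.table::fread()` to keep the example dependency-free.

```r
# Write two toy CSV files to a temporary directory
# (hypothetical stand-ins for the files in ../data-fixed)
data_dir <- file.path(tempdir(), "data-fixed-demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(
  data.frame(id = c("p1", "p2")),
  file.path(data_dir, "patients.csv"),
  row.names = FALSE
)
write.csv(
  data.frame(patient = c("p1", "p1", "p2")),
  file.path(data_dir, "encounters.csv"),
  row.names = FALSE
)

# Read every CSV into one named list instead of separate global variables
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
tables <- lapply(csv_files, read.csv)
names(tables) <- tools::file_path_sans_ext(basename(csv_files))

# Report which required tables are missing, if any
required <- c("patients", "encounters", "conditions")
missing_tables <- setdiff(required, names(tables))
missing_tables
```

With a list, the check for required tables becomes a single `setdiff()` call, and individual tables are reached as `tables$patients` and so on.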
SDOH Conditions in the Dataset
# Define social determinant conditions
sdoh_conditions <- c(
  "Stress (finding)",
  "Social isolation (finding)",
  "Limited social contact (finding)",
  "Victim of intimate partner abuse (finding)",
  "Reports of violence in the environment (finding)",
  "Unemployed (finding)",
  "Not in labor force (finding)"
)
# Create categories for SDOH
sdoh_categories <- tribble(
  ~condition, ~category,
  "Stress (finding)", "Mental Health & Stress",
  "Social isolation (finding)", "Social Isolation",
  "Limited social contact (finding)", "Social Isolation",
  "Victim of intimate partner abuse (finding)", "Violence & Abuse",
  "Reports of violence in the environment (finding)", "Violence & Abuse",
  "Unemployed (finding)", "Economic Instability",
  "Not in labor force (finding)", "Economic Instability"
)
# Count occurrences of each SDOH condition
sdoh_counts <- conditions |>
  filter(description %in% sdoh_conditions) |>
  count(description, sort = TRUE) |>
  left_join(sdoh_categories, by = c("description" = "condition")) |>
  select(Category = category, Condition = description, Count = n)
kable(
  sdoh_counts,
  format.args = list(big.mark = ","),
  caption = "Prevalence of Social Determinant Conditions"
)
| Category | Condition | Count |
|---|---|---|
| Mental Health & Stress | Stress (finding) | 36,363 |
| Social Isolation | Social isolation (finding) | 12,661 |
| Social Isolation | Limited social contact (finding) | 12,453 |
| Economic Instability | Not in labor force (finding) | 12,122 |
| Violence & Abuse | Victim of intimate partner abuse (finding) | 7,851 |
| Violence & Abuse | Reports of violence in the environment (finding) | 7,115 |
| Economic Instability | Unemployed (finding) | 6,473 |
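The counts in this table are condition records, since `count(description)` tallies rows. A patient with the same finding recorded several times is therefore counted more than once, which is how stress can reach 36,363 records among 6,851 patients. A minimal sketch of a per-patient count, using toy data and the hypothetical names `conditions_toy` and `sdoh_patient_counts`:

```r
library(dplyr)

# Toy condition records (hypothetical); patient p1 has two stress records
conditions_toy <- tibble(
  patient = c("p1", "p1", "p2", "p3"),
  description = c(
    "Stress (finding)",
    "Stress (finding)",
    "Stress (finding)",
    "Social isolation (finding)"
  )
)

sdoh_patient_counts <- conditions_toy |>
  group_by(description) |>
  summarise(
    records = n(),                   # same number count(description) would give
    patients = n_distinct(patient),  # each patient counted once per condition
    .groups = "drop"
  ) |>
  arrange(desc(records))

sdoh_patient_counts
```

Reporting both columns side by side makes it clear whether a prevalence figure refers to records or to people.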
Analysis
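The code in the following subsections relies on a `patients_analysis` table with a `has_sdoh` flag and an `income_quartile` column, whose construction is not shown in this report. A minimal sketch of how such a table might be built, under stated assumptions: the toy tables, the `income` column, and the quartile labels are hypothetical and are not taken from the original analysis.

```r
library(dplyr)

# Toy stand-ins for the report's `patients` and `conditions` tables (hypothetical values)
patients_toy <- tibble(
  id = c("p1", "p2", "p3", "p4"),
  income = c(12000, 48000, 75000, 150000)
)
conditions_toy <- tibble(
  patient = c("p1", "p1", "p3"),
  description = c("Stress (finding)", "Unemployed (finding)", "Anemia (disorder)")
)
sdoh_conditions_toy <- c("Stress (finding)", "Unemployed (finding)")

# Patients with at least one SDOH condition record
sdoh_patients <- conditions_toy |>
  filter(description %in% sdoh_conditions_toy) |>
  distinct(patient) |>
  pull(patient)

patients_analysis_toy <- patients_toy |>
  mutate(
    has_sdoh = id %in% sdoh_patients,
    # Assumption: quartiles are derived from an `income` column
    income_quartile = factor(
      ntile(income, 4),
      levels = 1:4,
      labels = c("Q1", "Q2", "Q3", "Q4")
    )
  )

patients_analysis_toy
```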
4.2 Healthcare Utilization Patterns
# Join encounters with patient SDOH status
encounters_analysis <- encounters |>
  left_join(patients_analysis |> select(id, has_sdoh), by = c("patient" = "id"))
# Calculate encounter statistics
encounter_stats <- encounters_analysis |>
  group_by(has_sdoh) |>
  summarise(
    total_encounters = n(),
    patients = n_distinct(patient),
    encounters_per_patient = n() / n_distinct(patient),
    avg_encounter_cost = mean(base_encounter_cost, na.rm = TRUE),
    avg_total_claim = mean(total_claim_cost, na.rm = TRUE)
  )
kable(
  encounter_stats,
  digits = 2,
  col.names = c(
    "Has SDOH",
    "Total Encounters",
    "Patients",
    "Encounters per Patient",
    "Avg Encounter Cost ($)",
    "Avg Total Claim ($)"
  ),
  format.args = list(big.mark = ","),
  caption = "Healthcare Utilization by SDOH Status"
)
| Has SDOH | Total Encounters | Patients | Encounters per Patient | Avg Encounter Cost ($) | Avg Total Claim ($) |
|---|---|---|---|---|---|
| FALSE | 49,530 | 1,508 | 32.84 | 106.94 | 1,947.25 |
| TRUE | 909,259 | 5,343 | 170.18 | 102.80 | 3,839.81 |
# Encounter types by SDOH status
encounter_type_summary <- encounters_analysis |>
  group_by(has_sdoh, encounterclass) |>
  summarise(n = n(), .groups = "drop") |>
  group_by(has_sdoh) |>
  mutate(pct = n / sum(n) * 100) |>
  filter(
    encounterclass %in%
      c("wellness", "ambulatory", "emergency", "outpatient", "urgentcare")
  )
ggplot(
  encounter_type_summary,
  aes(x = encounterclass, y = pct, fill = has_sdoh)
) +
  geom_col(position = "dodge") +
  labs(
    title = "Healthcare Encounter Types by SDOH Status",
    x = "Encounter Type",
    y = "% of Encounters",
    fill = "Has SDOH"
  ) +
  scale_y_continuous(labels = label_percent(scale = 1)) +
  scale_fill_manual(
    values = c("#4ECDC4", "#FF6B6B"),
    labels = c("No SDOH", "Has SDOH")
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
4.3 Cost Analysis
# Comprehensive cost comparison
cost_comparison <- patients_analysis |>
  mutate(
    # Negative values mean recorded coverage exceeds recorded expenses
    out_of_pocket = healthcare_expenses - healthcare_coverage,
    coverage_rate = healthcare_coverage / healthcare_expenses * 100
  ) |>
  group_by(has_sdoh) |>
  summarise(
    n = n(),
    total_expenses = sum(healthcare_expenses, na.rm = TRUE),
    total_coverage = sum(healthcare_coverage, na.rm = TRUE),
    total_out_of_pocket = sum(out_of_pocket, na.rm = TRUE),
    avg_expenses = mean(healthcare_expenses, na.rm = TRUE),
    median_expenses = median(healthcare_expenses, na.rm = TRUE),
    avg_coverage = mean(healthcare_coverage, na.rm = TRUE),
    avg_out_of_pocket = mean(out_of_pocket, na.rm = TRUE)
  )
kable(
  cost_comparison,
  digits = 0,
  col.names = c(
    "Has SDOH",
    "N",
    "Total Expenses",
    "Total Coverage",
    "Total Out-of-Pocket",
    "Avg Expenses",
    "Median Expenses",
    "Avg Coverage",
    "Avg Out-of-Pocket"
  ),
  format.args = list(big.mark = ","),
  caption = "Healthcare Cost Comparison by SDOH Status"
)
| Has SDOH | N | Total Expenses | Total Coverage | Total Out-of-Pocket | Avg Expenses | Median Expenses | Avg Coverage | Avg Out-of-Pocket |
|---|---|---|---|---|---|---|---|---|
| FALSE | 1,508 | 37,900,643 | 63,657,117 | -25,756,474 | 25,133 | 15,802 | 42,213 | -17,080 |
| TRUE | 5,343 | 1,538,876,591 | 2,481,678,592 | -942,802,001 | 288,017 | 155,638 | 464,473 | -176,456 |
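Both out-of-pocket columns in this table are negative, meaning recorded coverage exceeds recorded expenses in each group, so the subtraction does not behave like a patient's out-of-pocket cost. A hedged sketch of a check that could precede the comparison follows; the toy values and the names `patients_toy` and `sign_check` are illustrative and not taken from the dataset.

```r
library(dplyr)

# Toy patient-level totals (hypothetical values)
patients_toy <- tibble(
  id = c("p1", "p2", "p3"),
  healthcare_expenses = c(1000, 5000, 200),
  healthcare_coverage = c(4000, 1500, 900)
)

# How often does coverage exceed expenses?
sign_check <- patients_toy |>
  summarise(
    n = n(),
    n_coverage_exceeds = sum(healthcare_coverage > healthcare_expenses, na.rm = TRUE),
    share_coverage_exceeds = mean(healthcare_coverage > healthcare_expenses, na.rm = TRUE)
  )

sign_check
```

A high share would suggest confirming how the two columns are defined in the source data before interpreting their difference.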
# Visualize cost differences
cost_viz_data <- cost_comparison |>
  select(has_sdoh, avg_expenses, avg_coverage, avg_out_of_pocket) |>
  pivot_longer(
    cols = -has_sdoh,
    names_to = "cost_type",
    values_to = "amount"
  ) |>
  mutate(
    cost_type = recode(
      cost_type,
      "avg_expenses" = "Total Expenses",
      "avg_coverage" = "Insurance Coverage",
      "avg_out_of_pocket" = "Out-of-Pocket"
    )
  )
ggplot(cost_viz_data, aes(x = cost_type, y = amount, fill = has_sdoh)) +
  geom_col(position = "dodge") +
  labs(
    title = "Average Healthcare Costs by SDOH Status",
    subtitle = "Comparison of total expenses, insurance coverage, and out-of-pocket costs",
    x = NULL,
    y = "Amount ($)",
    fill = "Has SDOH"
  ) +
  scale_y_continuous(labels = label_dollar()) +
  scale_fill_manual(
    values = c("#4ECDC4", "#FF6B6B"),
    labels = c("No SDOH", "Has SDOH")
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Key Finding: Patients with SDOH conditions have 1,046% higher average healthcare expenses ($288,017 vs. $25,133), roughly 11.5 times the average for patients without SDOH conditions.
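The percentage in this key finding can be reproduced from the two group averages in the cost comparison table. A short worked example follows; the variable names are chosen here for illustration.

```r
# Group averages taken from the cost comparison table
avg_with_sdoh <- 288017
avg_without_sdoh <- 25133

# "X% higher" is the difference relative to the baseline group
pct_higher <- (avg_with_sdoh - avg_without_sdoh) / avg_without_sdoh * 100

# The same comparison expressed as a ratio ("times as high")
ratio <- avg_with_sdoh / avg_without_sdoh

round(pct_higher)
round(ratio, 1)
```

The two figures differ by exactly 100 percentage points: a ratio of r corresponds to (r - 1) × 100 percent higher.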
Cost by Income Level
cost_by_income <- patients_analysis |>
  filter(!is.na(income_quartile)) |>
  group_by(income_quartile, has_sdoh) |>
  summarise(
    avg_expenses = mean(healthcare_expenses, na.rm = TRUE),
    avg_out_of_pocket = mean(
      healthcare_expenses - healthcare_coverage,
      na.rm = TRUE
    ),
    .groups = "drop"
  )
ggplot(
  cost_by_income,
  aes(x = income_quartile, y = avg_expenses, color = has_sdoh, group = has_sdoh)
) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  labs(
    title = "Average Healthcare Expenses by Income Quartile and SDOH Status",
    x = "Income Quartile",
    y = "Average Healthcare Expenses ($)",
    color = "Has SDOH"
  ) +
  scale_y_continuous(labels = label_dollar()) +
  scale_color_manual(
    values = c("#4ECDC4", "#FF6B6B"),
    labels = c("No SDOH", "Has SDOH")
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
4.4 Chronic Disease Correlation
# Define chronic conditions
chronic_conditions <- c(
  "Essential hypertension (disorder)",
  "Prediabetes (finding)",
  "Chronic pain (finding)",
  "Chronic sinusitis (disorder)",
  "Ischemic heart disease (disorder)",
  "Chronic low back pain (finding)",
  "Severe anxiety (panic) (finding)",
  "Anemia (disorder)"
)
# Find patients with chronic conditions
chronic_patients <- conditions |>
  filter(description %in% chronic_conditions) |>
  distinct(patient) |>
  pull(patient)
# Cross-tabulation
patients_analysis <- patients_analysis |>
  mutate(has_chronic = id %in% chronic_patients)
chronic_cross <- patients_analysis |>
  group_by(has_sdoh, has_chronic) |>
  summarise(n = n(), .groups = "drop") |>
  group_by(has_sdoh) |>
  mutate(pct = n / sum(n) * 100)
kable(
  chronic_cross,
  digits = 1,
  col.names = c(
    "Has SDOH",
    "Has Chronic Condition",
    "N Patients",
    "% of Group"
  ),
  format.args = list(big.mark = ","),
  caption = "Prevalence of Chronic Conditions by SDOH Status"
)
| Has SDOH | Has Chronic Condition | N Patients | % of Group |
|---|---|---|---|
| FALSE | FALSE | 1,370 | 90.8 |
| FALSE | TRUE | 138 | 9.2 |
| TRUE | FALSE | 810 | 15.2 |
| TRUE | TRUE | 4,533 | 84.8 |
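The association in this cross-tabulation can also be summarized as an odds ratio. The sketch below computes it directly from the four cell counts, with a Wald 95% confidence interval on the log scale. This is one common approach, chosen here for illustration rather than taken from the original analysis, and the variable names are hypothetical.

```r
# Cell counts from the cross-tabulation above
sdoh_chronic <- 4533        # has SDOH, has chronic condition
sdoh_no_chronic <- 810      # has SDOH, no chronic condition
no_sdoh_chronic <- 138      # no SDOH, has chronic condition
no_sdoh_no_chronic <- 1370  # no SDOH, no chronic condition

# Odds of a chronic condition in each group, and their ratio
odds_sdoh <- sdoh_chronic / sdoh_no_chronic
odds_no_sdoh <- no_sdoh_chronic / no_sdoh_no_chronic
odds_ratio <- odds_sdoh / odds_no_sdoh

# Wald 95% confidence interval, computed on the log-odds scale
se_log_or <- sqrt(
  1 / sdoh_chronic + 1 / sdoh_no_chronic +
    1 / no_sdoh_chronic + 1 / no_sdoh_no_chronic
)
ci_95 <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

odds_ratio
ci_95
```

As with the rest of the analysis, an odds ratio describes an association in this dataset and does not by itself indicate that SDOH conditions cause chronic disease.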
# Share of patients with a chronic condition, by SDOH status
chronic_by_sdoh <- patients_analysis |>
  count(has_sdoh, has_chronic) |>
  group_by(has_sdoh) |>
  mutate(pct = n / sum(n) * 100) |>
  filter(has_chronic)
ggplot(chronic_by_sdoh, aes(x = has_sdoh, y = pct, fill = has_sdoh)) +
  geom_col(width = 0.6) +
  geom_text(aes(label = paste0(round(pct, 1), "%")), vjust = -0.5, size = 5) +
  labs(
    title = "Prevalence of Chronic Conditions by SDOH Status",
    x = "Has SDOH Condition",
    y = "% with Chronic Condition"
  ) +
  scale_y_continuous(
    labels = label_percent(scale = 1),
    limits = c(0, max(chronic_by_sdoh$pct) * 1.15)
  ) +
  scale_fill_manual(
    values = c("#4ECDC4", "#FF6B6B"),
    labels = c("No SDOH", "Has SDOH")
  ) +
  scale_x_discrete(labels = c("No SDOH", "Has SDOH")) +
  theme(legend.position = "none")
Specific Chronic Condition Analysis
# Top chronic conditions by SDOH status
top_chronic <- conditions |>
  filter(description %in% chronic_conditions) |>
  left_join(
    patients_analysis |> select(id, has_sdoh),
    by = c("patient" = "id")
  ) |>
  group_by(has_sdoh, description) |>
  summarise(n = n(), .groups = "drop") |>
  group_by(has_sdoh) |>
  mutate(pct = n / sum(n) * 100) |>
  arrange(has_sdoh, desc(n))
ggplot(
  top_chronic,
  aes(x = reorder(description, pct), y = pct, fill = has_sdoh)
) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(
    title = "Distribution of Chronic Conditions by SDOH Status",
    x = NULL,
    y = "% of Chronic Conditions",
    fill = "Has SDOH"
  ) +
  scale_y_continuous(labels = label_percent(scale = 1)) +
  scale_fill_manual(
    values = c("#4ECDC4", "#FF6B6B"),
    labels = c("No SDOH", "Has SDOH")
  )
4.5 The Compounding Effect
# Count distinct SDOH categories per patient
sdoh_count <- conditions |>
  filter(description %in% sdoh_conditions) |>
  left_join(sdoh_categories, by = c("description" = "condition")) |>
  distinct(patient, category) |>
  count(patient, name = "n_sdoh_categories")
# Add to patient analysis
patients_analysis <- patients_analysis |>
  left_join(sdoh_count, by = c("id" = "patient")) |>
  mutate(
    n_sdoh_categories = replace_na(n_sdoh_categories, 0),
    sdoh_level = case_when(
      n_sdoh_categories == 0 ~ "None",
      n_sdoh_categories == 1 ~ "1 Category",
      n_sdoh_categories == 2 ~ "2 Categories",
      n_sdoh_categories >= 3 ~ "3+ Categories"
    ),
    sdoh_level = factor(
      sdoh_level,
      levels = c("None", "1 Category", "2 Categories", "3+ Categories")
    )
  )
# Cost by number of SDOH categories
compounding_effect <- patients_analysis |>
  group_by(sdoh_level) |>
  summarise(
    n = n(),
    avg_expenses = mean(healthcare_expenses, na.rm = TRUE),
    .groups = "drop"
  )
# Calculate encounters per patient for each group
encounter_counts <- encounters |>
  left_join(
    patients_analysis |> select(id, sdoh_level),
    by = c("patient" = "id")
  ) |>
  group_by(sdoh_level, patient) |>
  summarise(n_encounters = n(), .groups = "drop") |>
  group_by(sdoh_level) |>
  summarise(avg_encounters = mean(n_encounters))
compounding_effect <- compounding_effect |>
  left_join(encounter_counts, by = "sdoh_level")
kable(
  compounding_effect,
  digits = 1,
  col.names = c(
    "SDOH Level",
    "N Patients",
    "Avg Expenses ($)",
    "Avg Encounters"
  ),
  format.args = list(big.mark = ","),
  caption = "Compounding Effect of Multiple SDOH Factors"
)
| SDOH Level | N Patients | Avg Expenses ($) | Avg Encounters |
|---|---|---|---|
| None | 1,508 | 25,133.1 | 32.8 |
| 1 Category | 131 | 86,983.7 | 65.9 |
| 2 Categories | 468 | 126,632.2 | 74.4 |
| 3+ Categories | 4,744 | 309,489.4 | 182.5 |
ggplot(
  compounding_effect,
  aes(x = sdoh_level, y = avg_expenses, fill = sdoh_level)
) +
  geom_col() +
  geom_text(
    aes(label = dollar(round(avg_expenses, 0))),
    vjust = -0.5,
    size = 4
  ) +
  labs(
    title = "Compounding Effect: Healthcare Expenses Increase with Multiple SDOH Factors",
    subtitle = "Average healthcare expenses by number of SDOH categories",
    x = "Number of SDOH Categories",
    y = "Average Healthcare Expenses ($)"
  ) +
  scale_y_continuous(
    labels = label_dollar(),
    limits = c(0, max(compounding_effect$avg_expenses) * 1.15)
  ) +
  scale_fill_brewer(palette = "YlOrRd") +
  theme(legend.position = "none")
Key Finding: Patients with 3 or more SDOH categories have 1,131.4% higher average healthcare expenses than those with no SDOH factors ($309,489 vs. $25,133), roughly 12.3 times as much.
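To see how expenses change from one SDOH level to the next, the averages in the compounding table can be compared both against the "None" baseline and against the preceding level. This is a sketch using the reported averages; the names `compounding_reported` and `compounding_ratios` are chosen here for illustration.

```r
library(dplyr)

# Average expenses as reported in the compounding table
compounding_reported <- tibble(
  sdoh_level = c("None", "1 Category", "2 Categories", "3+ Categories"),
  avg_expenses = c(25133.1, 86983.7, 126632.2, 309489.4)
)

compounding_ratios <- compounding_reported |>
  mutate(
    # Percent higher than the "None" baseline (first row)
    pct_higher_than_none =
      (avg_expenses - first(avg_expenses)) / first(avg_expenses) * 100,
    # Ratio to the preceding level; NA for the first row
    ratio_to_previous = avg_expenses / lag(avg_expenses)
  )

compounding_ratios
```

Roughly constant step ratios would point to a multiplicative pattern, while uneven ratios would suggest describing the increase in less specific terms.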
Key Findings & Discussion
Summary of Main Insights
High Prevalence: About 78% of patients in the dataset (5,343 of 6,851) have at least one documented social determinant of health condition, with stress and social isolation being the most common.
Significant Cost Impact: Patients with SDOH conditions incur substantially higher healthcare expenses, averaging $288,017 compared with $25,133 for patients without SDOH conditions.
Increased Utilization: Patients with SDOH factors have more healthcare encounters overall (170.2 vs. 32.8 per patient), with a higher proportion of emergency and urgent care visits compared to wellness visits.
Chronic Disease Link: There is a strong association between SDOH conditions and chronic conditions, including hypertension, chronic pain, and anxiety disorders. Among patients with SDOH conditions, 84.8% have at least one of the chronic conditions analyzed, compared with 9.2% of those without.
Compounding Effect: Average expenses rise steeply with the number of SDOH categories, from $86,984 for one category to $309,489 for three or more, compared with $25,133 for patients with none.
Socioeconomic Disparities: Lower-income patients with SDOH conditions face disproportionate healthcare costs relative to their income, creating potential barriers to care.
Clinical and Policy Implications
For Healthcare Providers
- Screening Protocols: Implement systematic screening for SDOH during routine visits
- Care Coordination: Develop care pathways that address both medical and social needs
- Preventive Focus: Prioritize preventive care for high-risk patients with SDOH factors
For Policymakers
- Resource Allocation: Direct resources toward community-based interventions addressing SDOH
- Insurance Coverage: Ensure adequate coverage for services that address social needs
- Cross-Sector Collaboration: Foster partnerships between healthcare, social services, and community organizations
For Healthcare Systems
- Data Integration: Incorporate SDOH data into electronic health records
- Risk Stratification: Use SDOH information for population health management
- Value-Based Care: Design payment models that incentivize SDOH interventions
Limitations
- Causality: This analysis shows associations but cannot establish causal relationships
- Documentation Bias: SDOH conditions may be under-documented in clinical records
- Temporal Relationships: Limited ability to determine whether SDOH preceded health conditions
- Generalizability: Results are based on synthetic data and may not reflect real-world patterns
- Unmeasured Factors: Other confounding variables not captured in the dataset may influence the relationships observed
Recommendations
1. Implement Universal SDOH Screening
- Develop standardized screening tools for all patients
- Train clinical staff on trauma-informed approaches to collecting social history
- Integrate screening into routine workflows
2. Establish Community Partnerships
- Create referral networks with social service organizations
- Develop care navigation programs to connect patients with resources
- Implement community health worker programs
3. Develop Targeted Interventions
- Design programs specifically for high-risk groups (multiple SDOH factors)
- Focus on stress reduction and social connection interventions
- Address economic instability through employment and financial counseling programs
4. Enhance Data Systems
- Improve documentation of SDOH in clinical records
- Develop analytics capabilities to track SDOH impact over time
- Use predictive modeling to identify at-risk patients early
5. Advocate for Policy Change
- Support policies that address upstream social factors
- Advocate for reimbursement for SDOH screening and intervention
- Promote cross-sector funding for integrated health and social services
6. Conduct Further Research
- Longitudinal studies examining SDOH trajectories and health outcomes
- Intervention trials testing effectiveness of specific SDOH programs
- Cost-effectiveness analyses of SDOH interventions
- Qualitative research to understand patient experiences and preferences
Conclusion
This analysis shows strong associations between social determinants of health and healthcare costs and utilization in a synthetic dataset. The “hidden costs” of SDOH are substantial, not only in direct healthcare expenses but also in the human toll of preventable chronic disease and health disparities. Addressing SDOH is not just a matter of social justice but also a pragmatic strategy for improving population health and controlling healthcare costs.
Healthcare systems that invest in identifying and addressing social risk factors stand to benefit from improved patient outcomes, reduced acute care utilization, and more equitable care delivery. The path forward requires commitment from clinicians, administrators, policymakers, and communities to create a healthcare system that recognizes and responds to the full spectrum of factors influencing health.