Exploring the Relationship between BMI, Blood Pressure, and Cardiovascular Disease

An Exploratory Analysis of Synthetic Healthcare Data

Author

STA220: Example project

Published

March 10, 2026

Note

This report was generated using synthetic data! It is only used for demonstration purposes and does not reflect real patients or healthcare providers. The contents have not been reviewed or validated by medical professionals.

Introduction

Cardiovascular disease (CVD) remains one of the leading causes of mortality worldwide. Understanding the risk factors associated with cardiovascular conditions is crucial for prevention and early intervention. Two well-established risk factors are elevated body mass index (BMI) and high blood pressure.

Research Questions

This exploratory analysis investigates the following questions:

Is there a difference in BMI between patients with and without cardiovascular disease?
Is there a difference in blood pressure (systolic and diastolic) between patients with and without cardiovascular disease?
Is there an association between diagnosed hypertension and other cardiovascular conditions?
Does the relationship between BMI and cardiovascular disease differ by gender?

Data and Methods

Dataset Description

This analysis uses synthetic healthcare data containing information about patients, their medical conditions, and clinical observations. The dataset includes:

Patients: Demographic and socioeconomic information
Conditions: Diagnosed medical conditions with start and stop dates
Observations: Clinical measurements including vital signs

Variable Definitions

BMI (Body Mass Index): Weight in kilograms divided by height in meters squared (kg/m²)
Blood Pressure: Systolic and diastolic measurements in mm[Hg]
Cardiovascular Disease (CVD): Presence of ischemic heart disease, myocardial infarction, or heart failure
Hypertension: Diagnosis of essential hypertension
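
As a quick illustration of the BMI definition above, the following minimal sketch computes BMI from weight and height. The helper name `compute_bmi` and the example values are hypothetical; the dataset already supplies BMI directly, so this function is not used in the analysis.

```r
# Hypothetical helper illustrating the BMI definition (kg / m^2);
# not part of the analysis pipeline
compute_bmi <- function(weight_kg, height_m) {
  stopifnot(all(height_m > 0))
  weight_kg / height_m^2
}

# Example: 70 kg at 1.75 m
bmi_example <- compute_bmi(70, 1.75)
round(bmi_example, 1)  # 22.9
```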

Statistical Methods

We employ the following statistical approaches:

Descriptive statistics: Mean, median, standard deviation, and visual summaries
Normality testing: Shapiro-Wilk test to assess distribution assumptions
Two-sample comparisons: Welch's independent t-tests (R's default, which does not assume equal variances; appropriate for approximately normal data) and Mann-Whitney U tests (for non-normal data)
Association testing: Chi-square test for categorical variables
Significance level: α = 0.05
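
The choice between the parametric and non-parametric comparison listed above can be sketched as a small decision rule. This is only an illustration on simulated data: the helper `compare_groups` and the exponential samples are assumptions made for the example, and the report itself runs both tests rather than choosing one automatically.

```r
# Hypothetical helper: pick a two-sample test based on Shapiro-Wilk results
compare_groups <- function(x, y, alpha = 0.05) {
  # shapiro.test() accepts at most 5000 values; for simplicity this sketch
  # uses the first 5000 rather than a random subsample
  normal_x <- shapiro.test(head(x, 5000))$p.value >= alpha
  normal_y <- shapiro.test(head(y, 5000))$p.value >= alpha
  if (normal_x && normal_y) {
    result <- t.test(x, y)  # Welch's t-test is the default in R
    method <- "Welch t-test"
  } else {
    result <- wilcox.test(x, y)
    method <- "Mann-Whitney U"
  }
  list(method = method, p_value = result$p.value)
}

set.seed(1)
skewed_a <- rexp(200, rate = 1)    # strongly right-skewed, simulated
skewed_b <- rexp(200, rate = 0.5)  # same shape, larger scale
res <- compare_groups(skewed_a, skewed_b)
res$method
```

With strongly skewed inputs like these, the rule is expected to fall through to the Mann-Whitney branch.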

Data Preparation

Show code

library(tidyverse)
library(knitr)
library(scales)

# Set theme for plots
theme_set(theme_minimal(base_size = 12))

Show code

# Load datasets
patients <- read_csv("../data-fixed/patients.csv")
conditions <- read_csv("../data-fixed/conditions.csv")
observations <- read_csv("../data-fixed/observations.csv")

Extract Latest Clinical Measurements

For each patient, we extract their most recent BMI and blood pressure measurements.

Show code

# Extract latest BMI per patient
latest_bmi <- observations |>
  filter(code == "39156-5") |> # BMI [Ratio]
  mutate(bmi = as.numeric(value)) |>
  filter(!is.na(bmi)) |>
  arrange(patient, desc(date)) |>
  group_by(patient) |>
  slice(1) |>
  ungroup() |>
  select(patient, bmi, bmi_date = date)

# Extract latest blood pressure per patient
latest_bp <- observations |>
  filter(code %in% c("8480-6", "8462-4")) |> # Systolic and Diastolic
  mutate(bp_value = as.numeric(value)) |>
  filter(!is.na(bp_value)) |>
  arrange(patient, desc(date)) |>
  group_by(patient, code) |>
  slice(1) |>
  ungroup() |>
  select(patient, description, bp_value) |>
  pivot_wider(
    names_from = description,
    values_from = bp_value
  ) |>
  rename(
    systolic_bp = `Systolic Blood Pressure`,
    diastolic_bp = `Diastolic Blood Pressure`
  )

cat("Patients with BMI measurements:", nrow(latest_bmi), "\n")

Patients with BMI measurements: 6644

Show code

cat("Patients with BP measurements:", nrow(latest_bp), "\n")

Patients with BP measurements: 6851

Identify Cardiovascular Conditions

We identify patients with cardiovascular disease and hypertension.

Show code

# Identify patients with CVD (excluding hypertension for now)
cvd_conditions <- c(
  "Ischemic heart disease (disorder)",
  "Myocardial infarction (disorder)",
  "Acute non-ST segment elevation myocardial infarction (disorder)",
  "Acute ST segment elevation myocardial infarction (disorder)",
  "Chronic congestive heart failure (disorder)",
  "Heart failure (disorder)",
  "History of myocardial infarction (situation)"
)

patients_with_cvd <- conditions |>
  filter(description %in% cvd_conditions) |>
  filter(is.na(stop) | stop >= Sys.Date()) |> # Active only: no stop date, or a stop date in the future
  distinct(patient) |>
  mutate(has_cvd = TRUE)

# Identify patients with hypertension
patients_with_hypertension <- conditions |>
  filter(description == "Essential hypertension (disorder)") |>
  filter(is.na(stop) | stop >= Sys.Date()) |>
  distinct(patient) |>
  mutate(has_hypertension = TRUE)

cat("Patients with CVD:", nrow(patients_with_cvd), "\n")

Patients with CVD: 1403

Show code

cat("Patients with hypertension:", nrow(patients_with_hypertension), "\n")

Patients with hypertension: 1492

Create Analysis Dataset

We combine all data into a single analysis dataset.

Show code

# Create master analysis dataset
analysis_data <- patients |>
  select(patient = id, birthdate, gender, race, ethnicity, income) |>
  left_join(latest_bmi, by = "patient") |>
  left_join(latest_bp, by = "patient") |>
  left_join(patients_with_cvd, by = "patient") |>
  left_join(patients_with_hypertension, by = "patient") |>
  mutate(
    has_cvd = replace_na(has_cvd, FALSE),
    has_hypertension = replace_na(has_hypertension, FALSE),
    age = as.numeric(difftime(Sys.Date(), birthdate, units = "days")) / 365.25
  ) |>
  filter(!is.na(bmi) & !is.na(systolic_bp) & !is.na(diastolic_bp))

cat("Final analysis dataset size:", nrow(analysis_data), "patients\n")

Final analysis dataset size: 6644 patients

Show code

cat("Patients with CVD in analysis:", sum(analysis_data$has_cvd), "\n")

Patients with CVD in analysis: 1403

Show code

cat(
  "Patients with hypertension in analysis:",
  sum(analysis_data$has_hypertension),
  "\n"
)

Patients with hypertension in analysis: 1492

Show code

# Check for missing values
missing_summary <- analysis_data |>
  summarise(across(c(bmi, systolic_bp, diastolic_bp, age), ~ sum(is.na(.)))) |>
  pivot_longer(everything(), names_to = "Variable", values_to = "Missing")

kable(missing_summary, caption = "Missing values in key variables")

Missing values in key variables
Variable	Missing
bmi	0
systolic_bp	0
diastolic_bp	0
age	0

Exploratory Data Analysis

Sample Characteristics

Show code

# Overall descriptive statistics
desc_stats <- analysis_data |>
  summarise(
    N = n(),
    `Mean Age` = mean(age, na.rm = TRUE),
    `SD Age` = sd(age, na.rm = TRUE),
    `% Female` = mean(gender == "F") * 100,
    `Mean BMI` = mean(bmi, na.rm = TRUE),
    `SD BMI` = sd(bmi, na.rm = TRUE),
    `Mean Systolic BP` = mean(systolic_bp, na.rm = TRUE),
    `SD Systolic BP` = sd(systolic_bp, na.rm = TRUE),
    `Mean Diastolic BP` = mean(diastolic_bp, na.rm = TRUE),
    `SD Diastolic BP` = sd(diastolic_bp, na.rm = TRUE),
    `% with CVD` = mean(has_cvd) * 100,
    `% with Hypertension` = mean(has_hypertension) * 100
  ) |>
  pivot_longer(everything(), names_to = "Statistic", values_to = "Value")

kable(desc_stats, digits = 2, caption = "Overall sample characteristics")

Overall sample characteristics
Statistic	Value
N	6644.00
Mean Age	45.04
SD Age	25.38
% Female	50.56
Mean BMI	27.76
SD BMI	34.48
Mean Systolic BP	129.04
SD Systolic BP	19.30
Mean Diastolic BP	73.50
SD Diastolic BP	13.10
% with CVD	21.12
% with Hypertension	22.46

Distribution of Risk Factors

BMI Distribution

Show code

p1 <- ggplot(analysis_data, aes(x = bmi)) +
  geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
  geom_vline(
    xintercept = mean(analysis_data$bmi),
    color = "red",
    linetype = "dashed",
    linewidth = 1
  ) +
  labs(title = "Distribution of BMI", x = "BMI (kg/m²)", y = "Count") +
  annotate(
    "text",
    x = mean(analysis_data$bmi) + 5,
    y = Inf,
    vjust = 2,
    label = paste("Mean =", round(mean(analysis_data$bmi), 1))
  )

p2 <- ggplot(analysis_data, aes(x = bmi)) +
  geom_boxplot(fill = "steelblue", alpha = 0.7) +
  labs(title = "BMI Boxplot", x = "BMI (kg/m²)") +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())

library(patchwork)
p1 + p2 + plot_layout(widths = c(3, 1))

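The BMI distribution has a mean of 27.8 kg/m², but its standard deviation (34.5 kg/m²) is larger than the mean itself. This points to a small number of extreme, possibly implausible, values that stretch the right tail and inflate both the mean and the SD.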
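
When a few extreme values are present, the mean and SD can be dominated by them while the median and IQR stay stable. The sketch below illustrates this on simulated data; all values, including the implausible entries of 1500, are invented for the example and are not drawn from the dataset.

```r
set.seed(220)
# Simulated BMI-like values plus five implausible entries (all made up)
bmi_sim <- c(rnorm(995, mean = 27, sd = 5), rep(1500, 5))

summary_stats <- c(
  mean = mean(bmi_sim),
  sd = sd(bmi_sim),
  median = median(bmi_sim),
  iqr = IQR(bmi_sim)
)
round(summary_stats, 1)
```

In this simulation the SD exceeds the mean, while the median and IQR remain close to what the uncontaminated values alone would give.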

Blood Pressure Distribution

Show code

p3 <- ggplot(analysis_data, aes(x = systolic_bp)) +
  geom_histogram(bins = 50, fill = "coral", alpha = 0.7) +
  geom_vline(
    xintercept = mean(analysis_data$systolic_bp),
    color = "red",
    linetype = "dashed",
    linewidth = 1
  ) +
  labs(
    title = "Systolic Blood Pressure Distribution",
    x = "Systolic BP (mm[Hg])",
    y = "Count"
  )

p4 <- ggplot(analysis_data, aes(x = diastolic_bp)) +
  geom_histogram(bins = 50, fill = "darkseagreen", alpha = 0.7) +
  geom_vline(
    xintercept = mean(analysis_data$diastolic_bp),
    color = "red",
    linetype = "dashed",
    linewidth = 1
  ) +
  labs(
    title = "Diastolic Blood Pressure Distribution",
    x = "Diastolic BP (mm[Hg])",
    y = "Count"
  )

p3 / p4

Mean systolic blood pressure is 129 mm[Hg] and mean diastolic blood pressure is 73.5 mm[Hg].

Cardiovascular Disease Prevalence

Show code

prevalence <- analysis_data |>
  summarise(
    `CVD (excl. Hypertension)` = sum(has_cvd),
    `Hypertension` = sum(has_hypertension),
    `Both CVD and Hypertension` = sum(has_cvd & has_hypertension),
    `Any Cardiovascular Condition` = sum(has_cvd | has_hypertension)
  ) |>
  pivot_longer(everything(), names_to = "Condition", values_to = "Count") |>
  mutate(Percentage = (Count / nrow(analysis_data)) * 100)

kable(
  prevalence,
  digits = 2,
  caption = "Prevalence of cardiovascular conditions"
)

Prevalence of cardiovascular conditions
Condition	Count	Percentage
CVD (excl. Hypertension)	1403	21.12
Hypertension	1492	22.46
Both CVD and Hypertension	648	9.75
Any Cardiovascular Condition	2247	33.82

Bivariate Relationships

BMI by CVD Status

Show code

ggplot(analysis_data, aes(x = has_cvd, y = bmi, fill = has_cvd)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.3) +
  geom_jitter(width = 0.2, alpha = 0.1, size = 0.5) +
  scale_fill_manual(
    values = c("steelblue", "coral"),
    labels = c("No CVD", "CVD")
  ) +
  labs(
    title = "BMI Distribution by Cardiovascular Disease Status",
    x = "Cardiovascular Disease",
    y = "BMI (kg/m²)",
    fill = "CVD Status"
  ) +
  scale_x_discrete(labels = c("No CVD", "CVD")) +
  theme(legend.position = "none")

Show code

bmi_summary <- analysis_data |>
  group_by(has_cvd) |>
  summarise(
    N = n(),
    Mean = mean(bmi),
    SD = sd(bmi),
    Median = median(bmi),
    IQR = IQR(bmi)
  ) |>
  mutate(has_cvd = ifelse(has_cvd, "CVD", "No CVD"))

kable(bmi_summary, digits = 2, caption = "BMI summary statistics by CVD status")

BMI summary statistics by CVD status
has_cvd	N	Mean	SD	Median	IQR
No CVD	5241	27.38	38.39	27.7	7.1
CVD	1403	29.20	11.01	28.0	2.4

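Patients with CVD have a slightly higher median BMI (28.0 kg/m²) than those without CVD (27.7 kg/m²), and a higher mean (29.20 vs 27.38 kg/m²). The SD is much larger in the No CVD group (38.39 vs 11.01), which suggests that the most extreme values are concentrated there.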

Blood Pressure by CVD Status

Show code

p5 <- ggplot(analysis_data, aes(x = has_cvd, y = systolic_bp, fill = has_cvd)) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.3) +
  scale_fill_manual(values = c("steelblue", "coral")) +
  labs(
    title = "Systolic BP by CVD Status",
    x = "CVD Status",
    y = "Systolic BP (mm[Hg])"
  ) +
  scale_x_discrete(labels = c("No CVD", "CVD")) +
  theme(legend.position = "none")

p6 <- ggplot(
  analysis_data,
  aes(x = has_cvd, y = diastolic_bp, fill = has_cvd)
) +
  geom_boxplot(alpha = 0.7, outlier.alpha = 0.3) +
  scale_fill_manual(values = c("steelblue", "coral")) +
  labs(
    title = "Diastolic BP by CVD Status",
    x = "CVD Status",
    y = "Diastolic BP (mm[Hg])"
  ) +
  scale_x_discrete(labels = c("No CVD", "CVD")) +
  theme(legend.position = "none")

p5 + p6

Correlation between Risk Factors

Show code

ggplot(analysis_data, aes(x = bmi, y = systolic_bp, color = has_cvd)) +
  geom_point(alpha = 0.3, size = 1.5) +
  geom_smooth(method = "lm", se = TRUE) +
  scale_color_manual(
    values = c("steelblue", "coral"),
    labels = c("No CVD", "CVD")
  ) +
  labs(
    title = "Relationship between BMI and Systolic Blood Pressure",
    x = "BMI (kg/m²)",
    y = "Systolic BP (mm[Hg])",
    color = "CVD Status"
  ) +
  theme(legend.position = "top")

Hypothesis Testing

H1: BMI Difference by CVD Status

Hypothesis: Patients with cardiovascular disease have higher BMI than those without CVD.

Show code

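# Test normality assumption
# shapiro.test() accepts at most 5000 values, so larger groups are subsampled;
# setting a seed makes the subsample reproducible
set.seed(220)
shapiro_cvd <- shapiro.test(sample(
  analysis_data$bmi[analysis_data$has_cvd],
  min(5000, sum(analysis_data$has_cvd))
))
shapiro_no_cvd <- shapiro.test(sample(
  analysis_data$bmi[!analysis_data$has_cvd],
  min(5000, sum(!analysis_data$has_cvd))
))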

cat("Shapiro-Wilk test for BMI (CVD group):\n")

Shapiro-Wilk test for BMI (CVD group):

Show code

cat(
  "  W =",
  round(shapiro_cvd$statistic, 4),
  ", p-value =",
  format.pval(shapiro_cvd$p.value, digits = 3),
  "\n\n"
)

  W = 0.1247 , p-value = <2e-16

Show code

cat("Shapiro-Wilk test for BMI (No CVD group):\n")

Shapiro-Wilk test for BMI (No CVD group):

Show code

cat(
  "  W =",
  round(shapiro_no_cvd$statistic, 4),
  ", p-value =",
  format.pval(shapiro_no_cvd$p.value, digits = 3),
  "\n"
)

  W = 0.0677 , p-value = <2e-16

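Both W statistics are far below 1 (0.12 and 0.07), so BMI departs strongly from normality in both groups; this is more than the Shapiro-Wilk test being sensitive at large sample sizes. The Mann-Whitney U test is therefore the more appropriate comparison, and we report the t-test alongside it for reference.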
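
One way to tell a genuine departure from normality apart from large-sample sensitivity is to look at the W statistic itself and not only at the p-value. The sketch below contrasts a mild and a severe departure on simulated data; the distributions and the contaminating value of 2000 are assumptions chosen for illustration.

```r
set.seed(220)
n <- 5000  # shapiro.test() accepts at most 5000 values

# Mild departure: slightly heavy-tailed data (simulated)
mild <- rt(n, df = 10)
# Severe departure: normal data contaminated with ten extreme values (simulated)
severe <- c(rnorm(n - 10, mean = 27, sd = 5), rep(2000, 10))

w_mild <- unname(shapiro.test(mild)$statistic)
w_severe <- unname(shapiro.test(severe)$statistic)
c(mild = w_mild, severe = w_severe)
```

A W close to 1 with a small p-value mostly reflects sample size, whereas a W near 0, as reported for BMI here, reflects a distribution dominated by extreme values.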

Show code

# Parametric test (t-test)
t_test_bmi <- t.test(bmi ~ has_cvd, data = analysis_data)

# Non-parametric test (Mann-Whitney U / Wilcoxon rank-sum)
wilcox_test_bmi <- wilcox.test(bmi ~ has_cvd, data = analysis_data)

cat("Independent t-test:\n")

Independent t-test:

Show code

cat(
  "  t =",
  round(t_test_bmi$statistic, 3),
  ", df =",
  round(t_test_bmi$parameter, 1),
  ", p-value =",
  format.pval(t_test_bmi$p.value, digits = 3),
  "\n"
)

  t = -2.993 , df = 6619.3 , p-value = 0.00277

Show code

cat("  Mean difference:", round(diff(t_test_bmi$estimate), 2), "kg/m²\n")

  Mean difference: 1.81 kg/m²

Show code

cat(
  "  95% CI:",
  round(t_test_bmi$conf.int[1], 2),
  "to",
  round(t_test_bmi$conf.int[2], 2),
  "\n\n"
)

  95% CI: -3 to -0.63

Show code

cat("Mann-Whitney U test:\n")

Mann-Whitney U test:

Show code

cat(
  "  W =",
  wilcox_test_bmi$statistic,
  ", p-value =",
  format.pval(wilcox_test_bmi$p.value, digits = 3),
  "\n"
)

  W = 2784480 , p-value = <2e-16

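Conclusion: There is a statistically significant difference in BMI between patients with and without CVD (Welch t-test: p = 0.003; Mann-Whitney U: p < 0.001). Patients with CVD have a mean BMI that is 1.81 kg/m² higher than those without CVD (95% CI: 0.63 to 3.00, after flipping the sign of the printed interval, which is for No CVD minus CVD).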
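
A note on sign conventions may help when reading the output above. With the formula interface of `t.test()`, `conf.int` refers to the first group minus the second (here FALSE minus TRUE), whereas `diff()` applied to `estimate` gives the second minus the first. The following sketch, on simulated data with made-up means, shows one way to put both on the same scale.

```r
set.seed(220)
# Simulated data: the TRUE group has the higher mean by construction
sim <- data.frame(
  has_cvd = rep(c(FALSE, TRUE), each = 500),
  bmi = c(rnorm(500, mean = 27, sd = 4), rnorm(500, mean = 29, sd = 4))
)

tt <- t.test(bmi ~ has_cvd, data = sim)

# diff() on the two group means gives mean(TRUE) - mean(FALSE)
diff_true_minus_false <- unname(diff(tt$estimate))
# conf.int is for mean(FALSE) - mean(TRUE), so reverse and negate it
ci_true_minus_false <- as.numeric(-rev(tt$conf.int))

diff_true_minus_false
ci_true_minus_false
```

After the flip, the point estimate falls inside the interval, which makes the two numbers easier to report together.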

H2: Systolic Blood Pressure Difference by CVD Status

Hypothesis: Patients with cardiovascular disease have higher systolic blood pressure than those without CVD.

Show code

# Parametric test
t_test_sys <- t.test(systolic_bp ~ has_cvd, data = analysis_data)

# Non-parametric test
wilcox_test_sys <- wilcox.test(systolic_bp ~ has_cvd, data = analysis_data)

cat("Independent t-test:\n")

Independent t-test:

Show code

cat(
  "  t =",
  round(t_test_sys$statistic, 3),
  ", df =",
  round(t_test_sys$parameter, 1),
  ", p-value =",
  format.pval(t_test_sys$p.value, digits = 3),
  "\n"
)

  t = -28.088 , df = 1945.3 , p-value = <2e-16

Show code

cat("  Mean difference:", round(diff(t_test_sys$estimate), 2), "mm[Hg]\n")

  Mean difference: 16.96 mm[Hg]

Show code

cat(
  "  95% CI:",
  round(t_test_sys$conf.int[1], 2),
  "to",
  round(t_test_sys$conf.int[2], 2),
  "\n\n"
)

  95% CI: -18.14 to -15.77

Show code

cat("Mann-Whitney U test:\n")

Mann-Whitney U test:

Show code

cat(
  "  W =",
  wilcox_test_sys$statistic,
  ", p-value =",
  format.pval(wilcox_test_sys$p.value, digits = 3),
  "\n"
)

  W = 1820488 , p-value = <2e-16

Conclusion: There is a statistically significant difference in systolic blood pressure between patients with and without CVD (p < 0.001). Patients with CVD have a mean systolic BP that is 16.96 mm[Hg] higher.

H3: Diastolic Blood Pressure Difference by CVD Status

Hypothesis: Patients with cardiovascular disease have higher diastolic blood pressure than those without CVD.

Show code

# Parametric test
t_test_dia <- t.test(diastolic_bp ~ has_cvd, data = analysis_data)

# Non-parametric test
wilcox_test_dia <- wilcox.test(diastolic_bp ~ has_cvd, data = analysis_data)

cat("Independent t-test:\n")

Independent t-test:

Show code

cat(
  "  t =",
  round(t_test_dia$statistic, 3),
  ", df =",
  round(t_test_dia$parameter, 1),
  ", p-value =",
  format.pval(t_test_dia$p.value, digits = 3),
  "\n"
)

  t = 1.005 , df = 1931.4 , p-value = 0.315

Show code

cat("  Mean difference:", round(diff(t_test_dia$estimate), 2), "mm[Hg]\n")

  Mean difference: -0.44 mm[Hg]

Show code

cat(
  "  95% CI:",
  round(t_test_dia$conf.int[1], 2),
  "to",
  round(t_test_dia$conf.int[2], 2),
  "\n\n"
)

  95% CI: -0.42 to 1.31

Show code

cat("Mann-Whitney U test:\n")

Mann-Whitney U test:

Show code

cat(
  "  W =",
  wilcox_test_dia$statistic,
  ", p-value =",
  format.pval(wilcox_test_dia$p.value, digits = 3),
  "\n"
)

  W = 3444464 , p-value = 0.000272

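Conclusion: The evidence for a difference in diastolic blood pressure is mixed. The t-test finds no significant difference in means (p = 0.315; 95% CI for No CVD minus CVD: -0.42 to 1.31 mm[Hg]), and the mean diastolic BP is in fact 0.44 mm[Hg] lower in patients with CVD. The Mann-Whitney U test is significant (p < 0.001), suggesting that the two distributions differ in ways the means do not capture. The hypothesis that patients with CVD have higher diastolic blood pressure is not supported.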
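
The t-test and the Mann-Whitney U test reported above give different answers (p = 0.315 versus p = 0.000272) because they address different questions: the first compares means, the second compares the rank ordering of the two samples. The sketch below builds simulated groups with the same expected mean but different shapes; the mixture proportions and shifts are assumptions chosen only to make the contrast visible.

```r
set.seed(220)
n <- 5000
group_a <- rnorm(n, mean = 0, sd = 1)

# Group B: most values shifted slightly up, a minority shifted far down,
# chosen so the expected mean is still 0 (0.9 * 0.3 + 0.1 * -2.7 = 0)
is_low <- runif(n) < 0.1
group_b <- ifelse(
  is_low,
  rnorm(n, mean = -2.7, sd = 1),
  rnorm(n, mean = 0.3, sd = 1)
)

p_t <- t.test(group_a, group_b)$p.value
p_w <- wilcox.test(group_a, group_b)$p.value
c(t_test = p_t, mann_whitney = p_w)
```

In a setup like this, the Mann-Whitney test typically detects the shift in the bulk of the distribution, while the t-test often does not because the means nearly coincide.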

H4: Association between Hypertension and Other CVD

Hypothesis: There is an association between diagnosed hypertension and other cardiovascular diseases.

Show code

# Create contingency table
cont_table <- table(analysis_data$has_hypertension, analysis_data$has_cvd)
rownames(cont_table) <- c("No Hypertension", "Hypertension")
colnames(cont_table) <- c("No CVD", "CVD")

kable(cont_table, caption = "Contingency table: Hypertension vs CVD")

Contingency table: Hypertension vs CVD
	No CVD	CVD
No Hypertension	4397	755
Hypertension	844	648

Show code

# Add row and column totals
cont_table_with_totals <- addmargins(cont_table)
kable(cont_table_with_totals, caption = "Contingency table with totals")

Contingency table with totals
	No CVD	CVD	Sum
No Hypertension	4397	755	5152
Hypertension	844	648	1492
Sum	5241	1403	6644

Show code

# Chi-square test
chi_test <- chisq.test(cont_table)

cat("\nChi-square test of independence:\n")


Chi-square test of independence:

Show code

cat(
  "  X² =",
  round(chi_test$statistic, 2),
  ", df =",
  chi_test$parameter,
  ", p-value =",
  format.pval(chi_test$p.value, digits = 3),
  "\n"
)

  X² = 573.45 , df = 1 , p-value = <2e-16

Show code

# Calculate proportions
prop_cvd_with_htn <- cont_table[2, 2] / sum(cont_table[2, ])
prop_cvd_without_htn <- cont_table[1, 2] / sum(cont_table[1, ])

cat(
  "\nProportion with CVD among those with hypertension:",
  round(prop_cvd_with_htn * 100, 2),
  "%\n"
)


Proportion with CVD among those with hypertension: 43.43 %

Show code

cat(
  "Proportion with CVD among those without hypertension:",
  round(prop_cvd_without_htn * 100, 2),
  "%\n"
)

Proportion with CVD among those without hypertension: 14.65 %

Show code

# Calculate odds ratio
odds_ratio <- (cont_table[2, 2] * cont_table[1, 1]) /
  (cont_table[2, 1] * cont_table[1, 2])
cat("\nOdds ratio:", round(odds_ratio, 2), "\n")


Odds ratio: 4.47

Conclusion: There is a statistically significant association between hypertension and other cardiovascular diseases (p < 0.001). Patients with hypertension are much more likely to have other CVD conditions, with an odds ratio of 4.47.
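
The odds ratio can be reproduced directly from the four cell counts in the contingency table above. The sketch below also attaches an approximate 95% confidence interval using the Woolf (log) method; that interval is an addition for illustration and was not part of the original analysis.

```r
# Counts taken from the contingency table above
cont <- matrix(
  c(4397, 844, 755, 648),
  nrow = 2,
  dimnames = list(
    c("No Hypertension", "Hypertension"),
    c("No CVD", "CVD")
  )
)

odds_ratio <- (cont[2, 2] * cont[1, 1]) / (cont[2, 1] * cont[1, 2])

# Woolf (log) method: standard error of log(OR) from the four cell counts
se_log_or <- sqrt(sum(1 / cont))
ci <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

round(c(OR = odds_ratio, lower = ci[1], upper = ci[2]), 2)
```

An interval that lies entirely above 1 is consistent with the chi-square result.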

H5: Gender Differences in BMI-CVD Relationship

Hypothesis: The relationship between BMI and CVD differs by gender.

Show code

# Visualize by gender
ggplot(analysis_data, aes(x = has_cvd, y = bmi, fill = has_cvd)) +
  geom_boxplot(alpha = 0.7) +
  facet_wrap(
    ~gender,
    labeller = labeller(gender = c(F = "Female", M = "Male"))
  ) +
  scale_fill_manual(values = c("steelblue", "coral")) +
  labs(
    title = "BMI by CVD Status, Stratified by Gender",
    x = "CVD Status",
    y = "BMI (kg/m²)"
  ) +
  scale_x_discrete(labels = c("No CVD", "CVD")) +
  theme(legend.position = "none")

Show code

# Test for females
female_data <- analysis_data |> filter(gender == "F")
t_test_female <- t.test(bmi ~ has_cvd, data = female_data)

# Test for males
male_data <- analysis_data |> filter(gender == "M")
t_test_male <- t.test(bmi ~ has_cvd, data = male_data)

cat("Females:\n")

Females:

Show code

cat("  Mean BMI difference:", round(diff(t_test_female$estimate), 2), "kg/m²\n")

  Mean BMI difference: 2.93 kg/m²

Show code

cat(
  "  t =",
  round(t_test_female$statistic, 3),
  ", p-value =",
  format.pval(t_test_female$p.value, digits = 3),
  "\n\n"
)

  t = -4.051 , p-value = 5.6e-05

Show code

cat("Males:\n")

Males:

Show code

cat("  Mean BMI difference:", round(diff(t_test_male$estimate), 2), "kg/m²\n")

  Mean BMI difference: 0.83 kg/m²

Show code

cat(
  "  t =",
  round(t_test_male$statistic, 3),
  ", p-value =",
  format.pval(t_test_male$p.value, digits = 3),
  "\n"
)

  t = -0.746 , p-value = 0.456

Show code

# Summary table
gender_summary <- analysis_data |>
  group_by(gender, has_cvd) |>
  summarise(
    N = n(),
    Mean_BMI = mean(bmi),
    SD_BMI = sd(bmi),
    .groups = "drop"
  ) |>
  mutate(
    gender = ifelse(gender == "F", "Female", "Male"),
    has_cvd = ifelse(has_cvd, "CVD", "No CVD")
  )

kable(
  gender_summary,
  digits = 2,
  caption = "BMI summary by gender and CVD status"
)

BMI summary by gender and CVD status
gender	has_cvd	N	Mean_BMI	SD_BMI
Female	No CVD	2799	26.87	15.83
Female	CVD	560	29.81	15.61
Male	No CVD	2442	27.96	53.63
Male	CVD	843	28.79	6.29

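Conclusion: The BMI difference by CVD status is statistically significant in females (p = 5.6e-05) but not in males (p = 0.456). The mean BMI difference is 2.93 kg/m² in females and 0.83 kg/m² in males. These separate tests are suggestive of a gender difference, but they do not test it formally; that would require a model with an interaction term.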
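
Running separate t-tests per gender does not by itself test whether the BMI-CVD relationship differs between genders. One common approach is a linear model with a CVD-by-gender interaction term. The sketch below shows the idea on simulated data; the effect size of 3 kg/m² for females only is an assumption built into the simulation, and this model was not fitted in the original analysis.

```r
set.seed(220)
n <- 2000
# Simulated data in which BMI differs by CVD status for females only
sim <- data.frame(
  gender = rep(c("F", "M"), each = n / 2),
  has_cvd = rep(c(FALSE, TRUE), times = n / 2)
)
sim$bmi <- rnorm(n, mean = 27, sd = 5) +
  ifelse(sim$gender == "F" & sim$has_cvd, 3, 0)

# The has_cvd:gender term estimates how the CVD-related BMI difference
# changes from females (reference level) to males
fit <- lm(bmi ~ has_cvd * gender, data = sim)
interaction_row <- coef(summary(fit))["has_cvdTRUE:genderM", ]
round(interaction_row, 4)
```

A small p-value on the interaction row would be direct evidence that the BMI difference by CVD status is not the same in both genders.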

Discussion and Conclusions

Key Findings

This exploratory analysis of synthetic healthcare data revealed several important findings regarding the relationship between BMI, blood pressure, and cardiovascular disease:

BMI and CVD: Patients with cardiovascular disease have significantly higher BMI compared to those without CVD (mean difference: 1.81 kg/m²; t-test p = 0.003, Mann-Whitney p < 0.001). In the gender-stratified tests, the difference is significant for females but not for males.
Blood Pressure and CVD: Systolic blood pressure is significantly elevated in patients with CVD, while the results for diastolic blood pressure are mixed:
- Systolic BP difference: 16.96 mm[Hg] (p < 0.001)
- Diastolic BP difference: -0.44 mm[Hg] (t-test p = 0.315; Mann-Whitney p < 0.001)
Hypertension and CVD: There is a strong association between hypertension and other cardiovascular diseases (χ² = 573.45, p < 0.001). The odds ratio of 4.47 indicates that patients with hypertension have substantially higher odds of having other CVD conditions.
Gender Patterns: The BMI-CVD relationship is not consistent across genders. The mean BMI difference is 2.93 kg/m² in females (p < 0.001) but only 0.83 kg/m² in males (p = 0.456, not significant).

Clinical Implications

These findings support the well-established understanding that:

Elevated BMI and high systolic blood pressure are associated with cardiovascular disease
Hypertension is strongly associated with other cardiovascular conditions
Risk factor monitoring should be a priority for cardiovascular disease prevention

Limitations

Several limitations should be considered when interpreting these results:

Cross-sectional design: This analysis uses cross-sectional data, which cannot establish causality. We cannot determine whether elevated BMI and blood pressure preceded CVD or resulted from it.
Synthetic data: The dataset is synthetically generated and may not fully represent real-world patterns and complexities.
Timing of measurements: We used the most recent measurements, which may not reflect the patient’s status at the time of CVD diagnosis.
Confounding variables: Other important risk factors (e.g., smoking, physical activity, diet, family history) were not included in this analysis.
Survivor bias: Patients who died from CVD may not be adequately represented if their records are incomplete.

Future Directions

Future analyses could explore:

Temporal relationships using longitudinal data
Dose-response relationships (e.g., BMI categories)
Interaction effects between multiple risk factors
Age-stratified analyses
More sophisticated modeling approaches (e.g., survival or time-to-event analysis)

Conclusion

This exploratory analysis demonstrates clear associations between BMI, systolic blood pressure, hypertension, and cardiovascular disease in this synthetic healthcare dataset. Most tested hypotheses showed statistically significant results; the exceptions were diastolic blood pressure, where the t-test found no difference in means, and the BMI difference in males. Together, the results support the importance of monitoring and managing these modifiable risk factors in clinical practice. While these findings are based on synthetic data and cannot establish causality, they align with established medical knowledge about cardiovascular disease risk factors.

Report generated on: 2026-03-10