Inferential Statistics in Hypertension Cost Analysis

Abstract

This paper analyzes the inferential statistical methods employed in a study estimating the true costs of hypertension using the 2005 MarketScan CCAE database. Because more than 95% of hypertensive patients carried hypertension as a secondary diagnosis, straightforward cost comparisons risked both underestimation and overestimation. To address pre-existing group imbalances—such as differences in age and sex—researchers applied propensity score matching (PSM), followed by the Pearson chi-squared test for categorical variables, the Wilcoxon two-sample test for continuous variables, and regression analysis to estimate incremental hospitalization costs. The paper explains why each test was selected, what conclusions were drawn, and why the findings, anchored by a significance threshold of p < 0.01, are considered statistically appropriate.

Key Takeaways

Introduction and Study Objective: Study aims to analyze true hypertension hospitalization costs
Why Propensity Score Matching Was Needed: Group imbalances risked biased hypertension cost estimates
How Propensity Score Matching Works: PSM uses probability to balance covariates between groups
Supporting Tests: Chi-Square, Wilcoxon, and Regression: Chi-square and Wilcoxon tests compare categorical and continuous variables across groups; regression estimates incremental costs
Study Conclusions and Findings: Hypertension accounted for 13% of total hospitalization costs
Statistical Significance and Appropriateness of Methods: P < 0.01 threshold confirms findings are unlikely due to chance

What makes this paper effective

It moves logically from problem identification (group imbalance) to method justification (PSM), showing why each statistical tool was necessary rather than simply naming it.
Concrete numbers from the study—mean ages, percentage of male patients, dollar-cost figures—are used throughout to ground abstract statistical concepts in observable results.
The paper addresses both the appropriateness of the chosen methods and the strength of their conclusions, satisfying a two-part analytical requirement with unified supporting evidence.

Key academic technique demonstrated

The paper exemplifies methodological justification: for each statistical test, the author explains not only what the test does generically but also why the specific characteristics of this dataset made that test the correct choice. This move—linking dataset features to test selection—is a hallmark of strong quantitative research critique.

Structure breakdown

The paper opens by stating the study's objective and its data source, then identifies the core measurement problem posed by secondary-diagnosis prevalence. It devotes its largest section to explaining PSM conceptually and showing its effect via before-and-after comparison data. A separate section covers the supplementary tests (chi-square and Wilcoxon) and regression. The paper then reports the study's substantive findings before closing with a discussion of statistical significance and the p < 0.01 threshold.

Introduction and Study Objective

The objective of the study was to analyze the true costs of hypertension. The researchers did this by analyzing data from four patient groups using propensity score matching to control for possible bias in cost estimates. The regression model that followed estimated the costs of hypertension by controlling for sex, length of hospital stay, Charlson comorbidity index, region of residence, and urbanization of residence.

Researchers used the 2005 MarketScan CCAE database, which contained information about hospitalized patients belonging to more than 100 health insurance plans offered by approximately 40 employers, in order to estimate hypertension-associated hospitalization costs for patients with hypertension as a secondary diagnosis.

The core problem was that more than 95% of the hypertensive patients in the CCAE database had hypertension as a secondary rather than primary diagnosis. Hypertension-related costs would therefore be dramatically underestimated if based only on costs incurred by patients with a primary diagnosis of hypertension. Conversely, if cost estimates were based on total costs for all hypertensive patients without accounting for confounding factors such as age, sex, and comorbidities, the costs attributed to hypertension could be grossly overstated. For this reason, the researchers used propensity score matching.

Why Propensity Score Matching Was Needed

Because the researchers wanted to isolate the true costs of hypertension, and because screening the CCAE database revealed imbalanced characteristics between patients with hypertension and patients without hypertension, they employed propensity score matching (PSM). For example, the average age of patients with hypertension was 53 years, while the average age of patients without hypertension was 42 years. More strikingly, 48% of patients with hypertension were men, compared with only 30% of patients without hypertension. These differences created potential bias in cost estimates, since age and sex may affect the type of medical treatment received and accordingly skew medical costs.

PSM is often used in observational studies where randomization is difficult or impossible. In studies such as this one, researchers must rely on existing data from which they infer the impact (or cost, as in this case) of different types or levels of treatment based on differences among individuals receiving different care or carrying different diagnoses. When randomization is possible, the groups are expected to be comparable at baseline, and a simple regression model would be sufficient to test for differences. Here, however, the inherent differences between the two groups could interfere with the regression model and confound the results.

Direct statistical matching on the covariates themselves could serve as an alternative: cases are paired to minimize differences between matched cases, and poor matches are excluded. This approach becomes unwieldy and impractical, however, when the data are complex and many characteristics are involved, as was the case in this study.

How Propensity Score Matching Works

PSM uses a probability model to estimate the likelihood that an individual belongs to the group of interest (here, patients with hypertension) given that individual's observed characteristics. This estimated probability, the propensity score, condenses the observed covariates into a single composite variable. Groups matched on the score tend to be balanced on the covariates that went into it, although PSM cannot adjust for characteristics that were not measured. Using PSM, one can therefore match two or more groups on this single composite variable in order to balance the comparison across groups.
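The matching idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the patients are synthetic, the score model is a plain logistic regression fitted by gradient descent, and the greedy 1:1 nearest-neighbor rule with a caliper of 0.05 is one common matching scheme, not necessarily the one the study used. The function names are hypothetical.

```python
import math
import random

def propensity(w, x):
    """Logistic model: estimated probability of hypertension given covariates x."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit logistic regression by batch gradient descent (w[0] is the intercept)."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = propensity(w, xi) - yi
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def match_nearest(scores, treated, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the score, without replacement."""
    controls = [i for i, t in enumerate(treated) if t == 0]
    used, pairs = set(), []
    for i, t in enumerate(treated):
        if t != 1:
            continue
        candidates = [(abs(scores[i] - scores[c]), c)
                      for c in controls if c not in used]
        if not candidates:
            break
        dist, best = min(candidates)
        if dist <= caliper:  # discard poor matches
            used.add(best)
            pairs.append((i, best))
    return pairs

# Synthetic (hypothetical) patients: older and male patients are more
# likely to have hypertension, which creates the imbalance to be removed.
random.seed(0)
X, htn = [], []
for _ in range(400):
    age = random.uniform(20, 64)
    male = 1 if random.random() < 0.4 else 0
    p = 1.0 / (1.0 + math.exp(-(-1.5 + 0.06 * (age - 42) + 0.8 * male)))
    X.append([(age - 42) / 10, male])  # age centered and scaled for fitting
    htn.append(1 if random.random() < p else 0)

weights = fit_logistic(X, htn)
scores = [propensity(weights, x) for x in X]
pairs = match_nearest(scores, htn)

def mean_age(indices):
    """Mean age in years, undoing the centering and scaling above."""
    return sum(X[i][0] * 10 + 42 for i in indices) / len(indices)

all_treated = [i for i, t in enumerate(htn) if t == 1]
all_controls = [i for i, t in enumerate(htn) if t == 0]
print("age gap before:", mean_age(all_treated) - mean_age(all_controls))
print("age gap after: ",
      mean_age([t for t, _ in pairs]) - mean_age([c for _, c in pairs]))
```

Matching on the single score rather than on age and sex separately is the design choice that keeps the procedure practical as the number of covariates grows.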

As Table 1 demonstrates, after PSM was applied the proportion of males among patients with and without hypertension converged considerably—47.9% with hypertension versus 48.1% without hypertension—compared to the original difference of 48% versus 30%. This illustrates how PSM effectively reduced the pre-existing group imbalance and brought the comparison groups into much closer alignment on key demographic characteristics.
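One way to put the before-and-after proportions on a common scale is the standardized difference for a binary covariate. The sketch below is an illustration only: the source does not say the study reported this diagnostic, and the 0.1 cutoff is a widely used rule of thumb rather than a figure from the study. The proportions are the ones quoted above.

```python
import math

def smd_proportion(p_treated, p_control):
    """Standardized difference between two proportions (pooled variance)."""
    pooled_var = (p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2
    return (p_treated - p_control) / math.sqrt(pooled_var)

# Proportion of men with vs. without hypertension, as quoted in the text.
smd_before = smd_proportion(0.48, 0.30)    # before matching
smd_after = smd_proportion(0.479, 0.481)   # after matching

for label, value in [("before", smd_before), ("after", smd_after)]:
    # Rule of thumb (assumption): |d| below 0.1 suggests adequate balance.
    status = "balanced" if abs(value) < 0.1 else "imbalanced"
    print(f"{label} matching: d = {value:.3f} ({status})")
```

Under this rule of thumb the unmatched groups count as imbalanced on sex, while the matched groups differ by a negligible amount.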

Supporting Tests: Chi-Square, Wilcoxon, and Regression

The Pearson chi-squared test was used for categorical variables (such as gender), and the Wilcoxon two-sample test was used for continuous variables (such as duration of hospital stay). Regression was then used to estimate the incremental cost of hypertension in the matched populations.

The chi-square test is used to examine the possibility of a relationship between two variables (a test of independence). It is also used as a test of goodness of fit, that is, to establish whether an observed frequency distribution differs from a theoretical distribution. Comparing a categorical variable such as gender across patient groups corresponds to the first use. The Wilcoxon two-sample (rank-sum) test is a non-parametric test used to compare two independent samples in order to assess whether values in one group tend to be larger than in the other. It is appropriate here because continuous variables such as length of hospital stay may not follow a normal distribution, making a rank-based approach preferable to a test that assumes normality.
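Both tests can be sketched in a few lines of standard-library Python. The group sizes of 1,000 and the length-of-stay values are hypothetical; only the percentages of men come from the text. The chi-square version assumes a 2x2 table without continuity correction, and the rank-sum version uses the normal approximation without a tie correction, so it is a teaching sketch rather than a replacement for a statistics package.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (df = 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_test(a, b):
    """Wilcoxon rank-sum test, two-sided, normal approximation."""
    a, b = list(a), list(b)
    n1, n2 = len(a), len(b)
    w = sum(average_ranks(a + b)[:n1])       # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    var_w = n1 * n2 * (n1 + n2 + 1) / 12     # no tie correction (assumption)
    z = (w - mean_w) / math.sqrt(var_w)
    return z, math.erfc(abs(z) / math.sqrt(2))

# Rows: with / without hypertension; columns: men / women.
# Hypothetical groups of 1,000 using the percentages quoted in the text.
chi_before, p_before = chi_square_2x2([[480, 520], [300, 700]])
chi_after, p_after = chi_square_2x2([[479, 521], [481, 519]])
print(f"sex, before matching: chi2 = {chi_before:.2f}, p = {p_before:.3g}")
print(f"sex, after matching:  chi2 = {chi_after:.4f}, p = {p_after:.3g}")

# Hypothetical lengths of stay in days for two patient groups.
z_los, p_los = rank_sum_test([2, 3, 3, 4, 5, 7, 9, 12], [1, 2, 2, 3, 3, 4, 4, 5])
print(f"length of stay: z = {z_los:.2f}, p = {p_los:.3g}")
```

With these hypothetical group sizes, the sex imbalance before matching falls well below the p < 0.01 threshold, while the matched proportions show no detectable difference.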

Regression analysis describes how a dependent variable changes as it is affected, or not affected, by one or more independent variables. Here, regression was performed to estimate the incremental hospitalization cost associated with hypertension in the matched populations, that is, the cost difference between patients with hypertension as a secondary diagnosis and comparable patients without hypertension. Hypertension status served as the independent variable of interest and hospitalization cost served as the dependent variable, while the model controlled for sex, length of hospital stay, Charlson comorbidity index, region of residence, and urbanization of residence. The researchers tested whether a measurable difference in cost existed between the two groups.
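The role of the regression coefficient as an incremental cost can be shown with a small ordinary least squares sketch. The data are hypothetical and generated from a known formula (a base cost of 2,000, an extra 1,500 for hypertension, and 800 per day of stay) so that the recovered coefficient can be checked; the study's actual model included more covariates and may have used a different specification.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical matched patients: (hypertension indicator, length of stay in days).
patients = [(1, 3), (0, 3), (1, 5), (0, 4), (1, 2), (0, 2), (1, 6), (0, 6)]

# Design matrix with an intercept column; cost follows the known formula above.
X = [[1.0, float(h), float(los)] for h, los in patients]
y = [2000.0 + 1500.0 * h + 800.0 * los for h, los in patients]

beta = ols(X, y)
print(f"intercept:                  {beta[0]:.2f}")
print(f"incremental cost (htn):     {beta[1]:.2f}")
print(f"cost per day of stay:       {beta[2]:.2f}")
```

The coefficient on the hypertension indicator is the quantity of interest: the expected difference in cost between otherwise similar patients with and without hypertension, holding the other covariates fixed.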

Study Conclusions and Findings

Table 1 shows that PSM dramatically improved compatibility between patients with and without hypertension. Before matching, the mean age of patients with hypertension was 53.2…

Statistical Significance and Appropriateness of Methods

Researchers chose a p-value threshold of p < 0.01. This indicates, for instance, that for the entire patient population, the…

Key Concepts in This Paper

Propensity Score Matching, Hypertension Costs, Chi-Square Test, Wilcoxon Test, Regression Analysis, Observational Study, Comorbidity Index, Secondary Diagnosis, Cost Estimation, Statistical Significance