
SPSS Data Analysis American Heart

Last reviewed: February 4, 2010

SPSS Data Analysis

American Heart Association Prediction of Stroke Risks

Over a ten-year study, the American Heart Association collected data on age, blood pressure, and smoking status in order to estimate the risk of stroke within the sample population. In this study, risk is interpreted as the probability (times 100) that the patient will have a stroke over the next ten-year period. Smoking status is coded as a dummy variable: 1 indicates a smoker, and 0 indicates a nonsmoker.

Data Set

Risk    Age    Blood Pressure    Smoker

A. Using the data, develop an estimated regression equation that relates the risk of a stroke to the person's age, blood pressure, and whether the person is a smoker.

With three independent variables representing the individual's age, blood pressure, and whether or not they smoke, the analysis calls for a multiple linear regression. The dependent variable is the numeric risk level for each individual, modeled as a function of age, blood pressure, and smoking habits. With the regression run on the data set above, the constant equals -93.401; each independent variable also has its own coefficient, which must be used in the final regression equation. Thus, the general equation is as follows:

Y = a + b1*X1 + b2*X2 + b3*X3

With the constant and the coefficients of the independent variables substituted, it becomes:

Y = -93.401 + 0.98869x1 + 0.2994x2 + 6.5766x3

B. Use the regression analysis tool to obtain complete diagnostics.

Variables Entered/Removed(b)

Model    Variables Entered                                         Variables Removed    Method
         smoker patient, blood pressure, patient age (years)(a)                         Enter

a. All requested variables entered.
b. Dependent Variable: Risk of stroke (%)

Model Summary(b)

Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
         .935(a)    .873        .850                 5.75657

a. Predictors: (Constant), smoker patient, blood pressure, patient age (years)
b. Dependent Variable: Risk of stroke (%)

ANOVA(b)

Model         Sum of Squares    df    Mean Square    F         Sig.
Regression                                           36.823    .000(a)
Residual                              33.138
Total

a. Predictors: (Constant), smoker patient, blood pressure, patient age (years)
b. Dependent Variable: Risk of stroke (%)

Coefficients(a)

                        Unstandardized Coefficients    Standardized Coefficients
Model                   B          Std. Error          Beta                         t         Sig.
(Constant)              -91.759    15.223                                           -6.028    .000
patient age (years)     1.077      .166                .697                         6.488     .000
blood pressure          .252       .045                .553                         5.568     .000
smoker patient          8.740      3.001               .302                         2.912     .010

a. Dependent Variable: Risk of stroke (%)

Casewise Diagnostics(a)

Case Number    Std. Residual    Risk of stroke (%)    Predicted Value

a. Dependent Variable: Risk of stroke (%)

Residuals Statistics(a)

                                     Minimum      Maximum     Mean       Std. Deviation
Predicted Value                      4.4606       54.1511     26.9500    13.88058
Std. Predicted Value                 -1.620       1.960       .000       1.000
Standard Error of Predicted Value    1.903        3.532       2.538      .445
Adjusted Predicted Value             4.8474       54.2600     26.8973    13.98313
Residual                             -13.10645    8.55608     .00000     5.28260
Std. Residual                        -2.277       1.486       .000       .918
Stud. Residual                       -2.418       1.678       .004       1.016
Deleted Residual                     -14.78714    10.90265    .05268     6.48651
Stud. Deleted Residual               -2.940       1.790       -.025      1.107
Mahal. Distance                      1.127        6.203       2.850      1.340
Cook's Distance                      .000         .193        .057       .070
Centered Leverage Value              .059         .326        .150       .071

a. Dependent Variable: Risk of stroke (%)

Curve Fit

Case Processing Summary

Total Cases
Excluded Cases(a)
Forecasted Cases
Newly Created Cases

a. Cases with a missing value in any variable are excluded from the analysis.

Variable Processing Summary

Dependent variables: patient age (years), blood pressure, smoker patient
Independent variable: Risk of stroke (%)

Number of Positive Values
Number of Zeros
Number of Negative Values
Number of Missing Values: User-Missing, System-Missing

Model Description

Model Name: MOD_1
Dependent Variables: patient age (years), blood pressure, smoker patient
Equation: Linear
Independent Variable: Risk of stroke (%)
Constant: Included
Variable Whose Values Label Observations in Plots: Unspecified

Model Summary and Parameter Estimates

Dependent Variable: patient age (years)

            Model Summary                                Parameter Estimates
Equation    R Square    F         df1    df2    Sig.     Constant    b1
Linear      .423        13.186                  .002     58.104      .421

The independent variable is Risk of stroke (%).

C. Is smoking a significant factor in the risk of a stroke? Explain. Use α = 0.05.

With the regression analysis previously conducted, whether smoking is a significant factor in the risk of a stroke can be examined directly. To conduct this analysis, the following equation was used to examine only the smoking variable against the dependent variable, the predicted risk of stroke.

As the graph and equation show, smoking has a significant impact on the risk value. In the Coefficients output reported earlier, the smoker variable has a coefficient of 8.740 with a t value of 2.912 and Sig. = .010, which is below α = 0.05. Although the other independent variables, age and blood pressure, also play a part, smoking shows a significant increase in the predicted risk of a stroke among the individuals in the data set. Thus, it can be concluded that smoking is a significant factor in an increased predicted risk of stroke.
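The significance check can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure itself: the B, Std. Error, and Sig. values are copied from the Coefficients output, while the dictionary keys and function names are hypothetical. A reported Sig. of .000 only means the p-value rounds to zero at three decimals.

```python
# Values copied from the SPSS Coefficients output: (B, Std. Error, reported Sig.)
# The keys and function names are illustrative assumptions.
coefficients = {
    "age": (1.077, 0.166, 0.000),
    "blood_pressure": (0.252, 0.045, 0.000),
    "smoker": (8.740, 3.001, 0.010),
}
ALPHA = 0.05


def t_statistic(b, std_error):
    """t value for a coefficient: the estimate divided by its standard error."""
    return b / std_error


def is_significant(sig, alpha=ALPHA):
    """A predictor is significant when its reported p-value is below alpha."""
    return sig < alpha


for name, (b, se, sig) in coefficients.items():
    print(f"{name}: t = {t_statistic(b, se):.3f}, "
          f"Sig. = {sig:.3f}, significant = {is_significant(sig)}")
```

Because the B and Std. Error values are rounded in the output, a recomputed t value can differ slightly from the one SPSS prints.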

D. What is the probability of a stroke over the next ten years for Thompson, a 68-year-old smoker who has a blood pressure of 175?

Coefficients from SPSS Regression Analysis: age .697; blood pressure .553; smoking .302

Using the equation formulated earlier for the overall numeric risk value, Y = a + b1*X1 + b2*X2 + b3*X3, we can plug in the computed constant and coefficients along with the values for a new individual not included in the original data set. With the constant and coefficients included, the equation to be used with new variable sets is Y = -93.401 + .697x1 + .553x2 + .302x3. Here, we must first define the variables used in the regression analysis: x1 represents the age of each individual, x2 represents blood pressure, and x3 represents smoking habits. The dependent variable, the numeric risk value, is represented as Y. Thus, for a 68-year-old man who smokes and has a blood pressure of 175, the equation becomes:

Y = -93.401 + .697(age) + .553(blood pressure) + .302(smoking)

Y = -93.401 + .697(68) + .553(175) + .302(1)

Y = -93.401 + 47.396 + 96.775 + .302

Y = 51.072

Here, then, the risk level is 51.072, which can be rounded to 51. The individual in question therefore has a risk value of 51 for having a stroke within the next ten years, meaning that the probability is estimated at .51072. The man's blood pressure and smoking habits are the two independent variables that play the most significant role in producing such a high risk of stroke within the next ten years in comparison to the other individuals in the original data set.
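The substitution can be reproduced with a short Python sketch. The function and variable names are illustrative assumptions and are not part of the SPSS output. The first call uses the constant and coefficients exactly as substituted in the text. The second call, included for comparison only, uses the unstandardized B column of the Coefficients output (-91.759, 1.077, .252, 8.740), which yields a different value.

```python
def predicted_risk(constant, b_age, b_bp, b_smoker, age, blood_pressure, smoker):
    """Evaluate Y = a + b1*X1 + b2*X2 + b3*X3 for one individual."""
    return constant + b_age * age + b_bp * blood_pressure + b_smoker * smoker


# Thompson: 68 years old, blood pressure 175, smoker (dummy variable = 1)
thompson = {"age": 68, "blood_pressure": 175, "smoker": 1}

# Constant and coefficients as substituted in the text
risk_text = predicted_risk(-93.401, 0.697, 0.553, 0.302, **thompson)

# Constant and unstandardized B values from the SPSS Coefficients output
risk_table = predicted_risk(-91.759, 1.077, 0.252, 8.740, **thompson)

print(f"Using the figures in the text:    {risk_text:.3f}")
print(f"Using the unstandardized B values: {risk_table:.3f}")
```

Dividing either result by 100 gives the corresponding estimated probability, following the study's definition of risk as probability times 100.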

Question 2

A. Fuel Additives and Mileage

Data Table

Sample A

Sample B

17.3

18.7

18.4

17.8

19.1

21.3

16.7

18.2

22.1

18.5

18.7

17.5

19.8

20.7

20.2

Data Rank

Rank A    Rank B
8.5

Data Set

                        Sample A      Sample B
Sum
Mean                    17.9571429    5.14285714
Variance                0.6795238     2.2641071
Rank Sum
Rank Mean               4.9           11.3
Combined Sum
Combined Median Rank    8.1

Two fuel additives were tested to compare their effect on the gas mileage of cars. One sample included seven cars, and the other nine. Their miles per gallon can be used to determine whether there is a significant difference between the two additives. The two sets of data represent comparable observations with measurable central tendencies: both record the mileage of vehicles used to test the fuel additives in question. Additionally, the two samples are independent of each other, and the observations within each sample are also independent. Thus, the Mann-Whitney test is a viable option for comparing the data from the two independent fuel additives. The Mann-Whitney test shows how one sample population fares in comparison to another, where the variances are equal across both sample groups.

Thus, the following equation can be used to compute the Mann-Whitney test statistic.

U_A = n_a n_b + n_a(n_a + 1)/2 - T_A

n_a = 7 and n_b = 9 (the sample sizes, which also determine the critical values for U)

T_A = the sum of the ranks of Sample A

n_a n_b + n_a(n_a + 1)/2 = the maximum value of T_A

With these values, the following computations were made, including the value of U, P(1), and P(2), which can then be analyzed to show whether there is a significant difference between the two additives in how they affect the mileage of the vehicles they are used in.

Ranks

               fuel additives (per m)    N    Mean Rank    Sum of Ranks
gas mileage    additive 1                     4.86         34.00
               additive 2                     11.33
               Total

Test Statistics(b)

                                  gas mileage
Mann-Whitney U                    6.000
Wilcoxon W                        34.000
Z                                 -2.701
Asymp. Sig. (2-tailed)            .007
Exact Sig. [2*(1-tailed Sig.)]    .005(a)

a. Not corrected for ties.
b. Grouping Variable: fuel additives (per m)

P(1) = 0.004
P(2) = 0.008
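The reported U statistic can be cross-checked from the rank sums alone. The sketch below is an illustration in Python, assuming the standard form U_A = n_a n_b + n_a(n_a + 1)/2 - T_A and the large-sample normal approximation without a tie correction. The function names are hypothetical; the inputs (n_a = 7, n_b = 9, rank sum 34.00) come from the SPSS Ranks output.

```python
import math


def mann_whitney_u(n_a, n_b, rank_sum_a):
    """Return (U_A, U_B) computed from the rank sum of sample A."""
    u_a = n_a * n_b + n_a * (n_a + 1) / 2 - rank_sum_a
    u_b = n_a * n_b - u_a  # the two U values always sum to n_a * n_b
    return u_a, u_b


def normal_approx_z(u, n_a, n_b):
    """z score for U under the null hypothesis (no correction for ties)."""
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    return (u - mean_u) / sd_u


u_a, u_b = mann_whitney_u(7, 9, 34.0)
u = min(u_a, u_b)  # SPSS reports the smaller of the two U values
z = normal_approx_z(u, 7, 9)

print(f"U_A = {u_a}, U_B = {u_b}, reported U = {u}")
print(f"z (no tie correction) = {z:.3f}")
```

The smaller U matches the 6.000 in the Test Statistics output. The z value comes out close to, but not exactly, the reported -2.701, since SPSS adjusts its Z for tied ranks and this sketch does not.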

With both p-values (0.004 and 0.008) falling well below the conventional 0.05 level, and with the mean rank of Sample B (11.33) far above that of Sample A (4.86), it is clear that there is a significant difference between the two sample groups. Through the analysis of both the variances and the Mann-Whitney computations, it is clear that the vehicles in Sample B achieved more miles per gallon than the vehicles in Sample A. This significant difference can be interpreted to mean that the fuel additive used in Sample B is more effective at increasing mileage in its test vehicles.

B. Exercise and Calories Burnt

Data Table

Swimming    Tennis    Cycling

Data Rank

Rank A    Rank B    Rank C

Data Set

                            Swimming    Tennis    Cycling
Sum                         2040
Mean
Variance
Rank Sum
Rank Mean                   8.2         12.2      3.6
Combined Sum
Combined Median of Ranks

Three separate exercises were observed three times a week for forty minutes each session. The data show the number of calories burned by each activity under that schedule of forty-minute workouts three days a week. Using the Kruskal-Wallis test, the data can help determine whether there is a significant difference among the three activities in calories burned. The test requires one nominal variable, here the activity, and one measurement variable, here the calories burned. In the context of this analysis, the ranked data is the set being computed. The test also depends on the k samples being random and independent, each drawn from a larger population. Additionally, the populations behind the three samples are expected to have distributions of similar shape and similar variances. The equation for the analysis is as follows, with α = 0.05.

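SS_{bg}(R) = \sum_g n_g (\bar{R}_g - \bar{R})^2, where n_g is the size of group g, \bar{R}_g is the mean rank of group g, and \bar{R} is the combined mean rank

H = SS_{bg}(R) / [N(N + 1)/12], where N is the total number of observations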

Ranks

                   Activities    N    Mean Rank
calories burned    swimming           8.20
                   Tennis             12.20
                   cycling            3.60
                   Total

Test Statistics(a, b)

               calories burned
Chi-Square     9.260
Asymp. Sig.    .010

a. Kruskal Wallis Test
b. Grouping Variable: Activities

H = 9.26
df = 2 (k - 1, with three activities)
p = 0.0098
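The H statistic and its p-value can be reproduced from the mean ranks in the Ranks output. The Python sketch below computes the between-groups sum of squared rank deviations and divides it by N(N + 1)/12. It assumes five observations per activity, which the discussion of sample size suggests but the output does not show, and it applies no tie correction. The function names are hypothetical. For two degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math


def kruskal_wallis_h(group_sizes, mean_ranks):
    """H = SS_bg(R) / [N(N + 1)/12], computed from group sizes and mean ranks."""
    n_total = sum(group_sizes)
    combined_mean_rank = (n_total + 1) / 2
    ss_bg = sum(
        n * (mean_rank - combined_mean_rank) ** 2
        for n, mean_rank in zip(group_sizes, mean_ranks)
    )
    return ss_bg / (n_total * (n_total + 1) / 12)


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-x / 2)


# Mean ranks for swimming, tennis, cycling; five observations each (assumed)
h = kruskal_wallis_h([5, 5, 5], [8.2, 12.2, 3.6])
p = chi2_upper_tail_df2(h)

print(f"H = {h:.2f}, df = 2, p = {p:.4f}")
```

Under these assumptions, the result agrees with the Chi-Square value of 9.260 and the asymptotic significance of about .010 in the Test Statistics output.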

Within this data set, the sample sizes are at the limit of five observations needed for the distribution of H to be closely approximated by the chi-square distribution with df = k - 1. Thus, with the computed analysis, it is clear that one sample population differs significantly from the other two. It can be assumed that cycling is significantly different from the other two sample groups in how many calories it burns: it burns significantly fewer calories than the other sampled activities, swimming and tennis. Swimming and tennis are much closer, with less of a difference between them in the calories burned within the workout regime. Based on the analysis, however, it is clear that tennis burns more calories than swimming, the two activities with less of a difference between them.

Question 3

Quality of Inpatient Treatment

In this data set, forty patients make up the sample used to determine the relationship between the number of visitations and the perceived quality of care, based on the opinion of the patient. The patients were divided into visitor categories, in which 1=frequent, 2=occasional, and 3=rare. Treatment was then rated on a scale of 1=good, 2=fair, and 3=poor. A chi-square test was then performed on the data set to determine whether there was a significant association between the number of visits and the perceived quality of care among the surveyed patients.


Cite This Paper

PaperDue. (2010). SPSS Data Analysis American Heart. PaperDue. https://www.paperdue.com/essay/spss-data-analysis-american-heart-74504
