BIOSTATISTICS 2
NOTES ABOUT DATA:
• 2019 BRFSS SPSS Data File.sav – These data are from the Behavioral Risk Factor Surveillance System (BRFSS). The BRFSS collects “state data about U.S. residents regarding their health-related risk behaviors, chronic health conditions, and use of preventive services.” Be sure to read the Background section of the 2019 BRFSS Overview for more details so you get a little better idea of the BRFSS and how the data are used. Investigators all over the country use these data to conduct research about many different characteristics and how they affect health outcomes. The data file for this project is not the complete data set. There are over 250 variables in the complete data set. I narrowed it down to the few variables I want you to use for this project and simplified coding for the sake of your sanity and to best demonstrate your learning of concepts.
INSTRUCTIONS: (Please read each question thoroughly)
You are a statistician tasked with helping a researcher who is interested in determining what characteristics influence a person to report poor health. Using the BRFSS, the researcher finds that there are a few variables that can help her answer that question. She first asks if you can conduct some analyses to determine what characteristics predict someone reporting at least one day of poor physical health in the last 30 days (PHYSHLTH_YES_NO). In addition, for those who reported at least one day of poor physical health, she is also interested in determining what influences the reported number of days of poor physical health (PHYSHLTH_DAYS). Among other characteristics, the investigator is primarily interested in determining whether veteran status (variable name: VETERAN) and adverse childhood experiences (ACES) (variable name: ACES_Score) influence these two dependent variables. Because there is a range of confounding variables to consider, the researcher also collected data about sex, health insurance, marital status, education, home ownership, income, age, smoking, alcohol use, and exercise, among others. Your job is to help the researcher answer her research questions.
1. Using the graphing options in SPSS, choose two appropriate graphical display options to describe PHYSHLTH_DAYS. You should be able to describe whether this variable is normally distributed, and whether there are outliers in the data using the two display options you choose. Copy and paste your graphs/charts below and for each, provide an interpretation of the graph, and explain why you chose that option.
The two graphical options selected are a histogram with an overlaid normal curve and a dot plot.
The Histogram with Normality Curve
Figure 1.1 shows a histogram of PHYSHLTH_DAYS with an overlaid normal curve. The histogram was selected because it shows the central tendency, spread, and shape of the distribution and makes any outliers visible, so it gives an at-a-glance view of whether the variable is normally distributed. The distribution appears approximately normal: the histogram is single-peaked and roughly bell-shaped, following the overlaid normal curve, with observations spread symmetrically around the mean. No outliers are evident from the distribution.
Figure 1.1
The Dot Plot
Figure 1.2 presents a dot plot of PHYSHLTH_DAYS. Like the histogram, the dot plot shows the frequency distribution of the values in the dataset. However, the dot plot shows the frequency of each individual value rather than of a range of values, as the histogram does. The dots appear as solid bars because of the large number of observations at each value; longer bars represent higher frequencies. Because it focuses on individual values, the dot plot is a more effective way than the histogram of assessing whether outliers exist. Outliers are data points that are extremely high or extremely low compared with the rest of the data points. The dot plot shows that there are no outliers in the dataset.
Figure 1.2
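The visual impressions from the two graphs can be cross-checked numerically. The following is a minimal Python sketch, not part of the SPSS output, that computes skewness (close to 0 for a symmetric distribution) and flags outliers with the common 1.5 × IQR rule. The function name `describe_distribution` and the `days` values are hypothetical and are not BRFSS data.

```python
import statistics


def describe_distribution(values):
    """Summarize symmetry and IQR-rule outliers for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    # Population skewness: about 0 when the distribution is symmetric,
    # positive when there is a long right tail.
    skewness = sum((x - mean) ** 3 for x in values) / (n * sd ** 3)
    # Quartiles (Python's default "exclusive" method) and the 1.5 * IQR fences.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in values if x < low_fence or x > high_fence]
    return {
        "mean": mean,
        "median": statistics.median(values),
        "skewness": skewness,
        "fences": (low_fence, high_fence),
        "outliers": outliers,
    }


# Hypothetical day counts for illustration only (not BRFSS data).
days = [1, 2, 2, 3, 3, 3, 4, 5, 7, 10, 14, 30]
summary = describe_distribution(days)
print(summary)
```

A mean well above the median, a clearly positive skewness, or any values outside the fences would all argue against describing a variable as normally distributed and free of outliers.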
2. The variable PHYSHLTH_YES_NO is a categorical, binary, nominal variable (Either people report poor physical health (Yes=1), or they do not (No=0)). Based on this categorical variable, use the appropriate statistical test to determine if there is a difference in ACES_Score, and ALCOHOL between the groups who report poor physical health. You will be doing two hypothesis tests…one for ACES_Score, and one for ALCOHOL. For each test, conduct a formal hypothesis test to answer this question (choose the appropriate statistical test, explain why you chose it, write out your null and alternative hypotheses, run the test, and interpret the results). Include appropriate output from SPSS to show what you did.
To test whether there is a difference in ACES score between the two groups (YES and NO), the independent samples t-test will be used. The test compares the mean ACES scores of the two independent groups to determine whether the mean for the group that reports YES (poor physical health) differs significantly from the mean for the group that reports NO (no days of poor physical health). The independent samples t-test is appropriate because the data meet the following requirements: i) the dependent variable, ACES score, is a continuous ratio variable; ii) the independent variable, PHYSHLTH_YES_NO, is a categorical variable with only two categories (Yes and No); and iii) the groups are independent, so a participant cannot be in both groups. The null and alternative hypotheses for the independent samples t-test are:
H0: $\mu_{\text{YES}} - \mu_{\text{NO}} = 0$ (the difference between the mean ACES scores of the two groups is equal to 0)
H1: $\mu_{\text{YES}} - \mu_{\text{NO}} \neq 0$ (the difference between the mean ACES scores of the two groups is not equal to 0)
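To make the hypotheses concrete, the following is a minimal Python sketch of the statistic that an independent samples t-test evaluates. It computes both the pooled-variance form (equal variances assumed) and the Welch form (equal variances not assumed). It is an illustration only, not the SPSS procedure used in this report; the function name `two_sample_t` and the two score lists are hypothetical and are not BRFSS data. The p-value, which comes from the t distribution with the stated degrees of freedom, is left out.

```python
import math
import statistics


def two_sample_t(group_a, group_b):
    """Return pooled and Welch t statistics with their degrees of freedom."""
    n1, n2 = len(group_a), len(group_b)
    m1, m2 = statistics.fmean(group_a), statistics.fmean(group_b)
    v1, v2 = statistics.variance(group_a), statistics.variance(group_b)

    # Pooled-variance form: assumes the two groups share one variance.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Welch form: each group keeps its own variance, and the degrees of
    # freedom come from the Welch-Satterthwaite approximation.
    se_sq = v1 / n1 + v2 / n2
    t_welch = (m1 - m2) / math.sqrt(se_sq)
    df_welch = se_sq ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return {
        "mean_difference": m1 - m2,
        "t_pooled": t_pooled,
        "df_pooled": df_pooled,
        "t_welch": t_welch,
        "df_welch": df_welch,
    }


# Hypothetical ACES scores for illustration only (not BRFSS data).
aces_yes = [0, 1, 2, 3, 4, 5, 6]
aces_no = [0, 0, 1, 1, 2, 2, 1]
result = two_sample_t(aces_yes, aces_no)
print(result)
```

When the two groups are the same size, the pooled and Welch statistics coincide and only the degrees of freedom differ. When the group variances are unequal, the Welch form is generally the safer one to read.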
Before running the t-test, it is advisable to draw comparison box plots to get an idea of what to expect from the test. The box plots are presented below. If the two groups had similar centers and variances in ACES score, the boxes would be centered at about the same value and have similar lengths.
Figure 2.1
From the box plots in Figure 2.1, the spread of ACES scores is visibly greater for the YES category than for the NO category, which indicates that the variances of the two groups differ. This suggests that the two groups may differ in ACES score. The next step is to run the independent samples t-test to check whether the difference between the group means is statistically significant. Results of the t-test are presented in Tables 2.1 and 2.2 below: