HEALTH DATA ANALYSIS
Public Health Data Analysis: The HIV Project
Part One: Descriptive Epidemiology
The data set selected for analysis is the HIV data set. It describes 359 HIV/AIDS cases by, among other things, gender, ethnicity, city of residence, state, age, and sexual orientation.
Place: The highest occurrence of HIV/AIDS is reported in the City of Atlanta, which accounts for 47.16 percent of the total when the 359 cases are weighted by viral load. The second-highest share is reported in College Park at 8.93 percent, followed by Alpharetta at 7.27 percent. The lowest shares are reported in Hapeville (1.53 percent) and Johns Creek (1.88 percent), both of which fall below 2 percent. Fig 1 below presents the frequency table of viral load by city of residence. The same distribution is shown visually in the pie chart labeled Chart 1.
Fig 1: Viral Load by City of Residence
Current form: C:\\Users\\Susan\\Epi Info 7\\Projects\\HIV\\HIV.prj:Case
Record count: 359 (Deleted records excluded) Date: 19/06/2022 21:48
Frequency
Frequency variable: City
Weight variable: ViralLoad
Include missing: False
| City | Frequency | Percent | Cum. Percent | Wilson 95% LCL | Wilson 95% UCL |
| --- | --- | --- | --- | --- | --- |
| Alpharetta | 120542 | 7.27% | 7.27% | 7.23% | 7.31% |
| Atlanta | 781961.9 | 47.16% | 54.43% | 47.08% | 47.23% |
| Chattahoochee Hills | 62698.54 | 3.78% | 58.21% | 3.75% | 3.81% |
| College Park | 148125.2 | 8.93% | 67.14% | 8.89% | 8.98% |
| East Point | 69566.55 | 4.20% | 71.34% | 4.16% | 4.23% |
| Fairburn | 66910.42 | 4.04% | 75.37% | 4.01% | 4.07% |
| Hapeville | 25385.23 | 1.53% | 76.90% | 1.51% | 1.55% |
| Johns Creek | 31241.46 | 1.88% | 78.79% | 1.86% | 1.90% |
| Milton | 41332.94 | 2.49% | 81.28% | 2.47% | 2.52% |
| Mountain Park | 45780.39 | 2.76% | 84.04% | 2.74% | 2.79% |
| Palmetto | 38759.7 | 2.34% | 86.38% | 2.31% | 2.36% |
| Roswell | 48467.13 | 2.92% | 89.30% | 2.90% | 2.95% |
| Sandy Springs | 80543.07 | 4.86% | 94.16% | 4.82% | 4.89% |
| Union City | 96891.95 | 5.84% | 100.00% | 5.81% | 5.88% |
| TOTAL | 1658206 | 100.00% | 100.00% | | |
Chart 1: Viral Load by City of Residence
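The percentages and Wilson limits in the city table can be checked by hand. The sketch below is an illustration, not the Epi Info implementation: it assumes each city's weighted frequency is the numerator and the TOTAL row is the sample size, and it applies the standard Wilson score formula with z = 1.96. The function name `wilson_interval` is a hypothetical helper; the three frequencies are copied from the table.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for the proportion successes / n."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Weighted frequencies (sum of ViralLoad per city) copied from the table.
weighted = {"Alpharetta": 120542, "Atlanta": 781961.9, "Hapeville": 25385.23}
total = 1658206  # TOTAL row of the same table, used here as n (assumption)

for city, freq in weighted.items():
    lo, hi = wilson_interval(freq, total)
    print(f"{city}: {freq / total:.2%} (95% CI {lo:.2%} to {hi:.2%})")
```

Under these assumptions the limits come out very narrow because the summed viral-load weights (about 1.66 million) stand in for the sample size, even though the table is built from 359 records. The intervals are therefore best read as describing shares of total viral load rather than the precision of a case count.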
Person: Viral load is higher among females, who account for 51.62 percent of the viral-load-weighted frequency, compared with 48.38 percent for males. These findings are summarized in the frequency table in Figure 2 below. Figure 3 summarizes the person characteristics of the data set by ethnic grouping. African Americans report a higher viral load frequency than Asians and Alaskan Natives. This is despite the fact that whites form the largest percentage of the sample, as shown in the combined frequency table in Figure 4.
Figure 2: Frequency Table of Viral Load against Gender
Frequency
Frequency variable: Sex
Weight variable: ViralLoad
Include missing: False
| Sex | Frequency | Percent | Cum. Percent | Wilson 95% LCL | Wilson 95% UCL |
| --- | --- | --- | --- | --- | --- |
| F-Female | 999086.9 | 51.62% | 51.62% | 51.55% | 51.69% |
| M-Male | 936274.4 | 48.38% | 100.00% | 48.31% | 48.45% |
| TOTAL | 1935361 | 100.00% | 100.00% | | |
Figure 3: Frequency Table by Grouping
Figure 5: Comparing Viral Load by American Indian/Alaskan Natives by Antigen
The means table indicates that the mean viral load among American Indian/Alaskan Natives was 5,280, compared with a mean of 4,500 among cases who are not American Indian/Alaskan Native. Thus, judged by mean viral load, American Indian/Alaskan Natives in this data set report a higher load than the rest of the sample.
Part Two: Analytical Epidemiology
The hypothesis developed for this part of the assignment is:
Age significantly influences HIV viral load, with younger people reporting higher viral loads.
A linear regression will be used to test the above hypothesis. A linear regression models the relationship between variables and estimates the effect of one variable (the independent variable) on another (the dependent variable) (CDC, n.d.). The above hypothesis focuses on determining the degree to which age influences HIV viral load. A linear regression is preferred to a logistic regression because the outcome (dependent) variable, viral load, is a numerical, continuous variable (CDC, n.d.). A logistic regression is appropriate only where the outcome variable is binary, taking on two values such as yes or no (CDC, n.d.). The continuous nature of the outcome variable therefore makes a linear regression the most suitable advanced statistical test (CDC, n.d.). The complex sample means test could show which age categories have the highest viral loads, based on the mean viral load calculated for each age group. However, it would not show the strength of the relationship between the variables.
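The reasoning above can be sketched as a simple least-squares fit of viral load on age. This is a minimal illustration under stated assumptions, not the analysis run in Epi Info: the function `fit_line` is a hypothetical helper, and the age and viral load values are made up for the example and are not taken from the HIV data set.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical values for illustration only; NOT from the HIV data set.
ages = [20, 30, 40, 50, 60]
viral_loads = [6100, 5400, 5200, 4300, 4000]

slope, intercept, r_squared = fit_line(ages, viral_loads)
print(f"slope = {slope:.1f} per year of age")
print(f"intercept = {intercept:.1f}")
print(f"R^2 = {r_squared:.3f}")
```

A negative slope would be consistent with the hypothesis that younger people report higher viral loads, and R^2 gives a rough sense of how strong the linear relationship is. Whether the slope differs significantly from zero would still need the test statistic and p-value from the regression output.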