Investigate Distributions with Numbers
Part 1 (10 points)
1. Describe three different ways to measure the center of a data set. Give an example where one measure of the center is preferred over another.
The three ways to measure the center of a data set are the mean, the median, and the mode. First, the mean is the sum of all the values in the data set divided by the number of values. Second, the median is the middle value of a data set that has been arranged in ascending order; with an even number of values, it is the average of the two middle values. Lastly, the mode is the value that occurs most frequently in the data set.
The median is preferred over the mean when the data are skewed or contain outliers, because it is less affected by extreme values. For example, household income is usually summarized by the median, since a few very high incomes pull the mean upward.
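As a rough illustration of why the median can be the better measure, the sketch below computes all three measures for a small data set. The salary figures are hypothetical and chosen only so that one value is an obvious outlier.

```python
from statistics import mean, median, mode

# Hypothetical salaries (in thousands of dollars); 250 is an outlier.
salaries = [30, 32, 32, 35, 38, 40, 250]

center_mean = mean(salaries)      # pulled upward by the outlier
center_median = median(salaries)  # middle value of the sorted data
center_mode = mode(salaries)      # most frequent value

print("mean:", round(center_mean, 2))
print("median:", center_median)
print("mode:", center_mode)
```

In this made-up example the mean lands well above most of the salaries, while the median stays close to the typical value.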
2. Explain the quartiles of a distribution in terms of percentiles
The quartiles of a distribution are the first quartile, the second quartile (which is the median), and the third quartile. The first quartile is equivalent to the 25th percentile, the second quartile to the 50th percentile, and the third quartile to the 75th percentile.
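The link between quartiles and percentiles can be checked numerically. This is a minimal sketch using Python's standard `statistics.quantiles`; the data values are hypothetical, and the `"inclusive"` method is just one of several common quartile conventions.

```python
from statistics import quantiles

# Hypothetical data set, used only for illustration.
data = [2, 4, 6, 8, 10, 12, 14, 16, 18]

# n=4 returns the three quartile cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# n=100 returns the 99 percentile cut points; the k-th percentile is at index k - 1.
percentiles = quantiles(data, n=100, method="inclusive")

print(q1, percentiles[24])  # first quartile vs 25th percentile
print(q2, percentiles[49])  # second quartile vs 50th percentile
print(q3, percentiles[74])  # third quartile vs 75th percentile
```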
3. Describe the different components of a box plot. Use the items included in the five-number summary.
The components of a box plot include the following:
1. Minimum – This is the smallest number in the data set
2. First Quartile – When the data set is arranged in ascending order and split into four equal groups, the first quartile is the value at the lower one-fourth mark of the data
3. Median – When the data set is arranged in ascending order, the median is the value in the middle of the data set
4. Third Quartile – When the data set is arranged in ascending order and split into four equal groups, the third quartile is the value at the upper one-fourth mark of the data
5. Maximum – This is the largest number in the data set
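The five components listed above can be computed together. The following is a sketch only: `five_number_summary` is a hypothetical helper name, the scores are made up, and the quartiles depend on the convention chosen (here the `"inclusive"` method of `statistics.quantiles`), so other tools may report slightly different values.

```python
from statistics import median, quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median(ordered), q3, ordered[-1])

# Hypothetical data set.
scores = [7, 1, 5, 3, 9]
summary = five_number_summary(scores)
print(summary)
```

A box plot draws the box from Q1 to Q3, a line at the median, and whiskers out toward the minimum and maximum.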
4. Describe the IQR rule for identifying outliers. Then, create a mock data set with at least 12 data points and with at least two outliers. Justify the outliers by applying the IQR rule.
The IQR is calculated by subtracting the 1st Quartile from the 3rd Quartile
The rule for identifying outliers is as follows:
Lower fence: multiply the IQR by 1.5 and subtract the result from the 1st Quartile
Upper fence: multiply the IQR by 1.5 and add the result to the 3rd Quartile
Any numbers that lie outside these two fences are outliers
Consider the following data set
Minimum
1st Quartile
Median
3rd Quartile
Maximum
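IQR = 36.75 – 24.75 = 12
Lower fence = 24.75 – 12 × 1.5 = 6.75
Upper fence = 36.75 + 12 × 1.5 = 54.75
The outliers are 55 and 65, because both are greater than the upper fence of 54.75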
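The IQR rule can be sketched in code as follows. The original data table is not shown here, so the `mock` list is a hypothetical stand-in (not the author's data), chosen so that its quartiles match the 24.75 and 36.75 quoted above under the `"inclusive"` quartile convention. The helper name `iqr_outliers` is also an assumption.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (lower fence, upper fence, outliers) using the k * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return lower, upper, [v for v in values if v < lower or v > upper]

# Hypothetical 12-point data set with two large values.
mock = [20, 22, 24, 25, 27, 29, 31, 33, 36, 39, 55, 65]
lower, upper, outliers = iqr_outliers(mock)
print(lower, upper, outliers)
```

Any value below the lower fence or above the upper fence is flagged, so the rule catches unusually small values as well as unusually large ones.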
5. Write a short paragraph that defines standard deviation and explains its importance. Explain the difference between population standard deviation and sample standard deviation.
Standard deviation is a metric that indicates the dispersion of a data set around its mean. It is computed as the square root of the variance, which is the average squared difference between each data point and the mean. If the data points are far from the mean, the data set has a high standard deviation, and vice versa. The population standard deviation uses every member of the population and divides the sum of squared differences by N, whereas the sample standard deviation is computed from a sample and divides by n – 1.
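To make the population versus sample distinction concrete, here is a small sketch using Python's standard library, where `pstdev` divides by N and `stdev` divides by n – 1. The values are hypothetical.

```python
from statistics import pstdev, stdev

# Hypothetical measurements.
values = [4, 8, 6, 5, 7]

population_sd = pstdev(values)  # treats the values as the whole population (divide by N)
sample_sd = stdev(values)       # treats the values as a sample (divide by n - 1)

print(round(population_sd, 4))
print(round(sample_sd, 4))
```

For the same numbers the sample standard deviation is always a little larger, because the divisor is smaller.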
6. Find the sample standard deviation of the following data sets {10, 12, 16, 20, 22}. Show all steps of the calculation.
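Sample Standard Deviation = √[ Σ(x – µ)² / (N – 1) ], where µ is the mean of the data and N is the number of values
Step 1: Find the mean µ
µ = (10 + 12 + 16 + 20 + 22) / 5 = 80 / 5 = 16
Step 2: Find the square of the distance (x – µ)²
X      (X – µ)²
10     36
12     16
16     0
20     16
22     36
Sum    104
Step 3: Find Standard Deviation
SD = √[ Σ(x – µ)² / (N – 1) ] = √(104 / 4) = √26 ≈ 5.10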
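The three steps can be reproduced directly for the data set {10, 12, 16, 20, 22}. This is a minimal sketch that divides by n – 1 because the question asks for the sample standard deviation; the cross-check against `statistics.stdev` is included only as a sanity test.

```python
import math
from statistics import stdev

data = [10, 12, 16, 20, 22]

# Step 1: mean of the data.
mean = sum(data) / len(data)

# Step 2: squared distance of each value from the mean.
squared_distances = [(x - mean) ** 2 for x in data]

# Step 3: divide the total by n - 1 (sample formula) and take the square root.
sample_variance = sum(squared_distances) / (len(data) - 1)
sample_sd = math.sqrt(sample_variance)

print(mean, sum(squared_distances), sample_variance, round(sample_sd, 3))
print(math.isclose(sample_sd, stdev(data)))
```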
7. The prices of a gallon of gasoline at 12 New York City gas stations in August 2016 were:
Based on this data set of 12 gas stations:
a) Find the mean price of gasoline.
Mean price of gas = $34.14 / 13 = $2.63
b) Find the median price of gasoline
When arranged from smallest to largest, the data set becomes as follows:
The median price is the one in the middle, which is $2.49
c) Find the range of gasoline prices
The range is the difference between the highest and the lowest value. The highest value is $3.99 whereas the lowest value is $2.15
Therefore, the range is $1.84
d) Find the five-number summary for gasoline prices.
Minimum = $2.15
First Quartile = $2.19
Median = $2.49
Third Quartile = $2.84
Maximum = $3.99
8. What is the 68-95-99.7 rule for a normal distribution?
The 68-95-99.7 rule states that in a normal distribution with mean µ and standard deviation σ,
i. Roughly 68 percent of the observations lie within σ of the mean µ
ii. Roughly 95 percent of the observations lie within 2σ of the mean µ
iii. Roughly 99.7 percent of the observations lie within 3σ of the mean µ (Moore, 2010).
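The three percentages in the rule can be checked numerically. For a normal distribution, the probability of falling within k standard deviations of the mean equals erf(k / √2); the sketch below evaluates this for k = 1, 2, 3. The function name `within_k_sd` is a hypothetical helper.

```python
import math

def within_k_sd(k):
    """Probability that a normal observation lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within_k_sd(k) * 100, 2), "percent")
```

The results come out close to 68, 95, and 99.7 percent, which is why the rule is stated with the word "roughly".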
9. Find three different items that are normally distributed. Give references used
Three things that are approximately normally distributed are body temperature, the diameter of a tree, and shoe sizes (Weiers, 2010).
10. What is meant by the phrase standard normal distribution?
This is a normal distribution that has a mean of 0 and a standard deviation of 1
11. Explain what a z-score is and why it is important
A z-score is a numerical measurement of a value's position relative to the mean of a set of values, expressed in standard deviations: z = (X – µ) / σ. If the z-score is 0, the value is equal to the mean. If the z-score is positive, the value is higher than the mean, whereas if the z-score is negative, the value is lower than the mean. The z-score is important because it specifies the distance from the mean as the number of standard deviations between the value X and the mean µ (Gravetter and Wallnau, 2008). This puts values from different distributions on a common scale so that they can be compared.
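A z-score is simple to compute once the mean and standard deviation are known. The sketch below uses a hypothetical exam with mean 70 and standard deviation 10; the function name `z_score` and all of the numbers are assumptions for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam: mean 70, standard deviation 10.
above = z_score(85, 70, 10)    # positive: above the mean
at_mean = z_score(70, 70, 10)  # zero: equal to the mean
below = z_score(55, 70, 10)    # negative: below the mean

print(above, at_mean, below)
```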
12. How can one determine from a histogram if a distribution is approximately normal?
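A distribution is approximately normal if its histogram is roughly symmetric and bell-shaped, with a single peak near the center and tails that taper off evenly on both sides. If the histogram is skewed either to the left or to the right, the distribution is not approximately normal.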
Part 2
1. Use Excel to obtain the following. Place your results in your Word file.
a) Find the five-number summary for the following data below. Hint, use the Excel statistic function called QUARTILE
Minimum = 3
First Quartile = 26.5
Median = 33.5
Third Quartile = 37
Maximum = 57
b) Find the IQR and use it to determine if there are any outliers.
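IQR = 3rd Quartile – 1st Quartile = 37 – 26.5 = 10.5
Determining outliers
Lower fence = 1st Quartile – IQR × 1.5 = 26.5 – 10.5 × 1.5 = 10.75
Upper fence = 3rd Quartile + IQR × 1.5 = 37 + 10.5 × 1.5 = 52.75
The minimum, 3, lies below the lower fence and the maximum, 57, lies above the upper fence, so both are outliers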
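As a quick check, the IQR rule can be applied to the five-number summary reported in part (a). The underlying data list is not reproduced here, so this sketch tests only the reported minimum and maximum against the fences; other data values would need the same comparison.

```python
# Values taken from the five-number summary in part (a).
minimum, q1, q3, maximum = 3, 26.5, 37, 57

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Only the two extremes are checked, since the full data list is not available here.
flagged = [v for v in (minimum, maximum) if v < lower_fence or v > upper_fence]

print(iqr, lower_fence, upper_fence, flagged)
```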
2. Use Excel to determine the mean and sample standard deviation for the data given in Problem 1 of Part 2. Hint, use the Excel functions AVERAGE and STDEV.
Mean = 31
Standard Deviation = 10.66721
3. The supply manager of a university orders all supplies, including items for the athletics department. Before the football season, he must develop a separate inventory list for the football team. This list will include supplies for both the players and the department itself. Although the department budget is set in terms of inventory (based on historical data), the football team’s needs change based on the size of the team as well as its individual players. The historical data shows that the number of gallons of Gatorade consumed by a football team during a game follows a normal distribution with mean 20. The standard deviation is 3. To help with the decision of how much Gatorade to order for each game, the supply manager would like to know the following information.
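Questions about a normal distribution with mean 20 and standard deviation 3 can be explored with Python's `statistics.NormalDist`. The specific quantities below (a 24-gallon threshold and a 95 percent coverage level) are assumptions chosen for illustration, not the questions the supply manager actually asked.

```python
from statistics import NormalDist

# Gallons of Gatorade per game: normal with mean 20 and standard deviation 3.
gatorade = NormalDist(mu=20, sigma=3)

# Hypothetical question 1: probability that more than 24 gallons are consumed.
p_more_than_24 = 1 - gatorade.cdf(24)

# Hypothetical question 2: gallons needed to cover demand in 95 percent of games.
gallons_for_95 = gatorade.inv_cdf(0.95)

print(round(p_more_than_24, 4))
print(round(gallons_for_95, 2))
```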