- Length: 4 pages
- Sources: 3
- Subject: Education - Mathematics
- Type: Essay
- Paper: #50106093
- Related Topic: Descriptive Statistics, Multivariate Analysis

What I Learned About Statistics

The most important thing that I have learned about statistics is that there is no reason to be afraid. Prior to studying statistics and statistical methods, many students view the subject as extremely difficult, dense, and nearly impossible to understand. After learning about the various types of statistics, analyses, hypothesis testing, and so forth, it becomes quite clear that statistics is a logical discipline: it begins with basic assumptions and building blocks and then builds more advanced and practical methods of understanding the world upon them.

At the basic level, descriptive statistics serve as the foundation for the entire field (Black, 2011). Descriptive statistics summarize data, that is, observations of the world that have been given quantitative values (Tanner & Youssef-Morgan, 2013). The most commonly used sets of descriptive statistics are measures of central tendency and measures of dispersion (Tanner & Youssef-Morgan, 2013).

Measures of central tendency describe how sets of data have a propensity to "clump" or gather toward the middle of a distribution of observations or scores. The three major measures of central tendency are (Runyon, Coleman, & Pittenger, 2000): the mode, which is the most commonly occurring score in a distribution of scores or observations; the median, which is the single score that cuts the distribution in half (50% of the scores fall above it and 50% fall below it); and the mean, which is the arithmetic average of the scores in a distribution (Black, 2011).

Each of these measures of central tendency is useful depending on the type of data involved; however, the mean is by far the most widely applied. The median is more appropriate when data is ordinal, and the mode is more appropriate when data is discrete (Tanner & Youssef-Morgan, 2013). Nonetheless, all of these measures of central tendency are useful, and in the special case of the normal distribution they all take the same value (Tanner & Youssef-Morgan, 2013).
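The three measures just described can be illustrated with a short Python sketch using the standard library's `statistics` module; the scores here are made up purely for illustration:

```python
# Illustrative only: a small made-up data set showing the three
# measures of central tendency described above.
import statistics

scores = [2, 3, 3, 4, 5, 5, 5, 6, 7]

mode = statistics.mode(scores)      # most frequently occurring score
median = statistics.median(scores)  # middle score of the sorted data
mean = statistics.mean(scores)      # arithmetic average (sum / count)

print(mode, median, mean)  # mode and median are both 5 for these scores
```

For this particular data set the mode and median coincide while the mean is slightly lower, a small reminder that the three measures agree exactly only in special cases such as the normal distribution.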

The other important class of descriptive statistics describes how the observations in a distribution or data set are spread around a measure of central tendency and throughout the distribution. Here there are several potential measures, such as the range and its variations (the interquartile range and so forth), the variance, and the standard deviation. The range is typically calculated by subtracting the smallest observation from the largest observation in a distribution; it is the distance between the smallest and largest scores in the distribution (or between some other designated cut points; Tanner & Youssef-Morgan, 2013). The variance is the average of the squared differences between the mean and each score in a particular distribution, whereas the standard deviation is the square root of the variance (Tanner & Youssef-Morgan, 2013). The standard deviation is the most widely reported measure of dispersion and helps one visualize the shape of a distribution: larger standard deviations indicate distributions that are more spread out, whereas smaller standard deviations indicate distributions that are more tightly packed around the mean (Runyon, Coleman, & Pittenger, 2000).
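These measures of dispersion can likewise be computed directly; the sketch below uses the population variance (`pvariance`), which matches the definition above of averaging the squared differences, and a made-up set of scores:

```python
# Illustrative only: range, variance, and standard deviation for a
# small made-up data set, following the definitions above.
import statistics

scores = [4, 8, 6, 5, 3, 7, 9]

value_range = max(scores) - min(scores)  # largest minus smallest score
variance = statistics.pvariance(scores)  # average squared distance from the mean
std_dev = statistics.pstdev(scores)      # square root of the variance

print(value_range, variance, std_dev)
```

A data set that was more tightly packed around its mean would produce a smaller standard deviation, while more scattered scores would produce a larger one.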

All of these descriptive statistics are very important and represent the first step in understanding the use of statistics. Descriptive statistics allow one to understand how data or observations are shaped and to summarize their general characteristics; however, one cannot make inferences comparing different distributions of scores or variables using descriptive statistics alone (Tanner & Youssef-Morgan, 2013). In order to understand how different distributions of scores relate to and compare with each other, the researcher must use inferential statistics, which allow for such comparisons (Tabachnick & Fidell, 2012).

There are several different categories of inferential statistics, including bivariate and multivariate inferential statistics. Bivariate statistics examine the relationship between two variables (typically an independent and a dependent variable) and include such things as correlation coefficients to test for linear associations between variables, t-tests to test for differences between two groups on a dependent variable, and one-way ANOVA to test for differences among more than two groups on a single independent variable (Runyon, Coleman, & Pittenger, 2000).
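As one concrete bivariate example, a Pearson correlation coefficient can be computed directly from its definition. The two variables below are invented for illustration; because one is an exact linear function of the other, the coefficient comes out at its maximum value of 1:

```python
# Illustrative only: Pearson's r computed from its definition on a
# made-up pair of variables (a perfect linear association).
import statistics

x = [1, 2, 3, 4, 5]    # hypothetical predictor values
y = [2, 4, 6, 8, 10]   # hypothetical outcome values (exactly 2 * x)

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
r = cov / (sum((a - mean_x) ** 2 for a in x) ** 0.5
           * sum((b - mean_y) ** 2 for b in y) ** 0.5)
# r is 1.0 here because y is an exact linear function of x
```

Real data would of course produce a value somewhere between -1 and +1, with values near zero indicating little linear association.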

Multivariate statistics include such tests as multiple regression and factorial ANOVA, which allow researchers to examine relationships among more than two independent variables, dependent variables, or both. These analyses can be very complex, and they often represent real-world conditions much more accurately than bivariate inferential statistics do (Tabachnick & Fidell, 2012).
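A minimal sketch of multiple regression can make the idea concrete. The data below are constructed so that the outcome is exactly y = 1 + 2·x1 + 3·x2, and the coefficients are recovered by solving the normal equations (X'X)b = X'y; the tiny Gaussian-elimination solver is written out only to keep the example self-contained, not as a recommendation over a proper statistics library:

```python
# Illustrative only: fitting y = b0 + b1*x1 + b2*x2 on made-up data
# by solving the normal equations with plain Gaussian elimination.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [9, 8, 19, 18, 29, 28]  # equals 1 + 2*x1 + 3*x2 exactly

X = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept column
XtX = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(3)]
       for r in range(3)]
Xty = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(3)]
coeffs = solve(XtX, Xty)  # approximately [1.0, 2.0, 3.0]
```

With two predictors instead of one, the model can capture situations where the outcome depends on several influences at once, which is exactly why such analyses often map onto real-world conditions better than bivariate tests.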

The process of using inferential statistics depends on the notion of hypothesis testing. Typically a researcher develops a hypothesis based on their observation of real-world conditions. A hypothesis describes the relationship between two or more variables, and the type of inferential test the researcher uses depends on the number of variables and the number of subjects involved. The hypothesis that there is no difference between the observations or distributions of variables is known as the null hypothesis, and the hypothesis that the researcher is typically testing to confirm is known as the alternative hypothesis (Runyon, Coleman, & Pittenger, 2000; Tanner & Youssef-Morgan, 2013). Hypothesis testing can be nondirectional, when the researcher is not sure whether the alternative hypothesis can be specified as one set of means being greater or less than another, or it can be directional, when the researcher can specify that one particular set of observations will have a greater (or lesser) value than the other. During hypothesis testing the researcher attempts to find support for the alternative hypothesis and to reject the null hypothesis as being representative of the data (Black, 2011). Hypothesis testing requires the calculation of a measure of central tendency, a measure of dispersion, and some inferential statistic. Thus, the field of statistics proceeds like building a house: one must first lay the foundation (descriptive statistics) before one can build the rest of the structure (inferential statistics; Tanner & Youssef-Morgan, 2013).
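The null-versus-alternative logic can be sketched with an independent-samples t-test on two made-up groups of scores. Note one simplification: the two-tailed p-value below uses a normal approximation (`statistics.NormalDist`) rather than the exact t distribution, so as to stay within the standard library:

```python
# Illustrative only: a nondirectional (two-tailed) hypothesis test on
# made-up group scores. H0: no difference between the group means.
import statistics

group_a = [85, 90, 88, 92, 87, 91]   # hypothetical scores, condition A
group_b = [78, 82, 80, 79, 83, 81]   # hypothetical scores, condition B

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
n_a, n_b = len(group_a), len(group_b)

# Pooled-variance t statistic for the difference between two means
pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
t = (mean_a - mean_b) / (pooled * (1 / n_a + 1 / n_b)) ** 0.5

# Two-tailed p-value via a normal approximation to the t distribution
p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
reject_null = p < 0.05  # support for the alternative hypothesis
```

The test is two-tailed because it makes no directional claim about which group scores higher; a directional test would halve the p-value and place the entire rejection region in one tail.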

The selection of an appropriate statistical test depends on a number of issues. The first issue to be considered is the type of data being collected. If the data is nominal (it does not specify quantitative differences but merely identifies observations), one must use different inferential methods than for data that is interval (specifies quantitative differences; Tanner & Youssef-Morgan, 2013). The second consideration is the shape of the distribution. Distributions that are highly skewed require nonparametric tests, whereas distributions that are not skewed and approach normality can use parametric inferential statistics (Runyon, Coleman, & Pittenger, 2000). The researcher also chooses the type of statistic to use based on the question being asked. If one wants to know how two distributions are associated with one another, one would use some type of correlation coefficient, whereas if one wanted to know whether a particular teaching style for managers is more effective than an existing style, one would have to use an inferential test designed to compare mean differences. Finally, the number of subjects, the number of independent variables, and the number of dependent variables dictate the types of inferential statistics a researcher has at their disposal (Tabachnick & Fidell, 2012).
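The first two considerations above can be caricatured as a small decision helper. The function name and rules here are hypothetical simplifications for illustration, not a complete taxonomy of statistical tests:

```python
# Illustrative only: a toy test-selection helper mirroring the level of
# measurement and number of groups described above. The rules are a
# deliberately simplified, hypothetical sketch.

def suggest_test(level: str, n_groups: int) -> str:
    if level == "nominal":
        return "chi-square test"             # frequencies, no quantitative scale
    if level == "ordinal":
        return "nonparametric test"          # ranks rather than interval values
    # interval/ratio data that is roughly normal:
    if n_groups == 2:
        return "t-test"                      # two group means
    if n_groups > 2:
        return "one-way ANOVA"               # more than two group means
    return "correlation coefficient"         # one group, association question

print(suggest_test("interval", 3))  # -> one-way ANOVA
```

A real decision also weighs sample size, the number of dependent variables, and the shape of each distribution, as the paragraph above notes.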

Overall, the evaluation of statistical findings depends on several different things. First, the methodology used to collect the data is important in determining how well one can generalize the findings from a specific sample to a population. Typically one would want to use one of the many random sampling techniques, if possible, in order to generalize the findings of a particular study to a larger population; however, this often is not practical, and one has to use convenience samples, which limits the ability to generalize the findings of a particular research project beyond its participants. Second, if one wishes to infer some type of cause-and-effect relationship, one needs to use a type of experimental design in which the participants are randomly assigned to the different conditions in the experiment. Without this random assignment to conditions one cannot make causal inferences from the findings. The particular type of statistical analysis used also dictates the type of generalizations one can make. For instance, a researcher using a correlation coefficient would not be able to draw the same type of conclusions as one using a one-way ANOVA, and vice versa. Moreover, one needs to consider the distributions of the variables themselves when summarizing inferential statistical results: measurements with high levels of error or large standard deviations may produce spurious findings, and researchers need to be able to identify and understand these conditions. One can only summarize statistical findings based on the type of data, variables, collection method, group assignment method, and statistical test one uses (Tabachnick & Fidell, 2012; Tanner & Youssef-Morgan, 2013).
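Random sampling and random assignment, the two design safeguards just discussed, can both be sketched with the standard library's `random` module; the participant labels are invented for illustration:

```python
# Illustrative only: simple random sampling from a hypothetical
# population, followed by random assignment to two conditions.
import random

random.seed(0)  # fixed seed so the sketch is reproducible

population = [f"participant_{i}" for i in range(100)]
sample = random.sample(population, 20)  # simple random sample of 20

shuffled = sample[:]
random.shuffle(shuffled)                # random assignment to conditions
treatment, control = shuffled[:10], shuffled[10:]
```

Random sampling supports generalizing from the sample to the population, while random assignment to the treatment and control conditions is what licenses causal inference about the experimental manipulation.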

I have learned that the field of statistics is like any other discipline. One must first understand…