… structured analysis of an experimental study by Buller et al. (2004), which describes a randomized trial of two medications used to treat a blood-clotting disorder. The two medications were fondaparinux and enoxaparin, and the condition being treated was symptomatic deep venous thrombosis. We will analyze the 2004 study to highlight its use of selected statistical methods that are commonly taught in introductory statistics. In sequence, we cover measures of central tendency, measures of variation, and the standard normal distribution, followed by a review of the study's conclusions. The term sample may be used interchangeably with the term data set, to refer to a limited number of values or measurements taken from an overall population.
The primary measures of central tendency used in statistics are the mean, the median, and the mode. The mean, or average, can be computed for a sample or for an entire population. The median is found by sorting the data set from lowest to highest and then taking the middle value if the sample size is odd, or the average of the two middle values if the sample size is even. The mode is simply the most frequently occurring value in the data set, and tends to be most meaningful for categorical (nominal or ordinal) data. For continuous measurements, values are often spread too thinly for any single value to occur a meaningful number of times. An important consideration in choosing between the mean and the median is the presence of outliers. The median is largely unaffected by outlier values at the extreme ends of the data set, whereas such outliers can significantly distort the mean. For this reason, if outliers are observed in the sample data, the median may be used instead of the mean to represent the central tendency.
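As an illustration of how a single outlier pulls the mean but barely moves the median, the following minimal sketch uses Python's standard-library `statistics` module. The values, the outcome labels, and the helper name `center` are hypothetical and are not taken from Buller et al. (2004).

```python
import statistics

# Hypothetical measurements, used only for illustration (not study data).
values = [4, 5, 5, 6, 7]
with_outlier = values + [60]  # same data plus one extreme value

def center(data):
    """Return (mean, median) for a list of numbers."""
    return statistics.mean(data), statistics.median(data)

print(center(values))        # (5.4, 5)
print(center(with_outlier))  # (14.5, 5.5)

# The mode is most useful for categorical data, e.g. hypothetical outcome labels.
outcomes = ["no event", "no event", "recurrence", "no event", "bleeding"]
print(statistics.mode(outcomes))  # no event
```

Adding the single value 60 nearly triples the mean, while the median moves only from 5 to 5.5.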
In the 2004 study by Buller et al., the data were primarily categorical, so the median was of little value to the analysis. Likewise, the mode (the most commonly occurring value) was not of significant interest, because the results focused primarily on the proportion of patients in each drug group who experienced a given outcome, reported together with confidence intervals. For this purpose, the means computed within the statistical software would have been the most important quantities, since a proportion is simply the mean of a variable coded 1 when the outcome occurs and 0 when it does not. Means were therefore appropriate for establishing and validating the proportions of interest.
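To make the link between proportions and means concrete, here is a small sketch with two hypothetical treatment groups coded as 0/1 indicators. The group sizes, counts, and the `proportion` helper are illustrative assumptions, not the study's data or software.

```python
import statistics

# Hypothetical 0/1 outcome indicators for two treatment groups
# (1 = outcome occurred, 0 = it did not). Illustrative only, not study data.
group_a = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
group_b = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

def proportion(indicators):
    """Proportion with the outcome = mean of the 0/1 indicator values."""
    return statistics.mean(indicators)

print(proportion(group_a))  # 0.2
print(proportion(group_b))  # 0.1
# The same result by counting: events divided by group size.
print(sum(group_a) / len(group_a))  # 0.2
```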
The primary measures of variation, or dispersion, used in statistics are the range, the standard deviation, and the variance. The range is calculated as the largest value minus the smallest value in a data set. Because it depends only on the maximum and minimum, it gives no insight into how the values in between are spread. The standard deviation and variance are related measures of spread, which can be computed for a sample or for an entire population. The variance is the average of the squared distances between individual data points and the mean; the distances are squared so that negative distances (below the mean) and positive distances (above the mean) do not cancel each other out. The standard deviation is the square root of the variance, which expresses the spread in the original units of the data. It can be thought of informally as a typical distance between individual values and the mean, although strictly it is the square root of the average squared distance rather than the average distance itself. When a sample variance is used to estimate the variance of a population, the sum of squared distances should be divided by (n − 1) rather than by the sample size n. This adjustment, known as Bessel's correction, compensates for measuring distances from the sample mean rather than from the unknown population mean; dividing by n instead would systematically underestimate, or bias, the population variance.
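The following sketch computes the range and contrasts the population variance (dividing by n) with the sample variance (dividing by n − 1), again using Python's `statistics` module on a hypothetical sample that is not drawn from the study.

```python
import math
import statistics

# Hypothetical sample (illustrative only).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(sample) - min(sample)     # largest minus smallest: 7
pop_var = statistics.pvariance(sample)     # divides by n: 32 / 8 = 4
sample_var = statistics.variance(sample)   # divides by n - 1: 32 / 7 ≈ 4.571
sample_sd = statistics.stdev(sample)       # square root of sample_var ≈ 2.138

# The same n - 1 calculation written out by hand.
m = statistics.mean(sample)
squared_distances = [(x - m) ** 2 for x in sample]
manual_var = sum(squared_distances) / (len(sample) - 1)

assert math.isclose(manual_var, sample_var)
assert math.isclose(sample_sd, math.sqrt(sample_var))
print(data_range, pop_var, round(sample_var, 3), round(sample_sd, 3))
```

With only eight values the difference between dividing by 8 and by 7 is noticeable; as n grows, the two estimates converge.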
In the 2004 study by Buller et al., the variance and standard deviation were not of primary interest to the researchers in themselves; however, the confidence intervals for their calculated results were paramount. Computing a confidence interval (CI) in the usual way relies, first, on the estimate having an approximately normal sampling distribution (for large samples the central limit theorem generally ensures this, even when the individual observations are not normally distributed), and second, on having both the point estimate (such as a mean or proportion) and its standard error, which is derived from the standard deviation and the sample size. Therefore, the internal computations of the mean and standard deviation from the large sample were key to the results of this study. The range was of incidental interest to the researchers, and was implied by the bounds of the categories they defined for each of their various tests. As the researchers noted, "the large sample size allowed outcome assessment in patients with a broad range of body weights and renal function."1
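The exact interval method used by Buller et al. is not described here, so the sketch below shows only the generic normal-approximation (Wald) interval, estimate ± z × standard error, for a proportion and for a mean. The helper names and the example counts (40 events among 1,000 patients) are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def normal_ci(estimate, std_error, confidence=0.95):
    """Normal-approximation CI: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return estimate - z * std_error, estimate + z * std_error

def proportion_ci(events, n, confidence=0.95):
    """Wald interval for a proportion; only reasonable when n is large."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
    return normal_ci(p, se, confidence)

def mean_ci(values, confidence=0.95):
    """Large-sample interval for a mean, using the sample standard deviation."""
    se = stdev(values) / math.sqrt(len(values))  # standard error of the mean
    return normal_ci(mean(values), se, confidence)

# Hypothetical counts: 40 events among 1,000 patients (not study data).
low, high = proportion_ci(40, 1000)
print(round(low, 3), round(high, 3))  # 0.028 0.052
```

Both the point estimate and the standard deviation (through the standard error) enter the interval; for small samples a t-based interval would normally replace the z value used here.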
A standard normal distribution is a normal distribution with a mean of zero (0) and a standard deviation of one (1). The horizontal axis of its curve measures distance from the mean, which lies at the center of the curve, in units of standard deviations: positive values lie above the mean and negative values below it. The total area under the curve is 1, and the area between any two points on the axis gives the proportion of observations (equivalently, the probability) expected to fall in that range. If a sample is observed to be approximately normally distributed, each value x can be converted to a standard score z = (x − mean) / standard deviation. It then becomes possible to use standard normal tables or software to compute the probabilities of selected outcomes, or the proportions of values falling within given ranges.
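As a sketch of these ideas, the code below uses `statistics.NormalDist` to find areas under the standard normal curve and to convert a value to a z-score. The variable with mean 70 and standard deviation 10, and the `z_score` helper, are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # mean 0, standard deviation 1

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Area under the curve within 1 and 2 standard deviations of the mean.
within_1 = std_normal.cdf(1) - std_normal.cdf(-1)  # about 0.683
within_2 = std_normal.cdf(2) - std_normal.cdf(-2)  # about 0.954

# Hypothetical variable, assumed normal with mean 70 and standard deviation 10:
# what proportion of values fall below 85?
z = z_score(85, 70, 10)      # 1.5
p_below = std_normal.cdf(z)  # about 0.933

# Same answer without standardizing, using the non-standard normal directly.
assert abs(NormalDist(70, 10).cdf(85) - p_below) < 1e-12
print(round(within_1, 3), round(within_2, 3), round(p_below, 3))
```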