Statistical Analysis Reported in Two Journal Articles
Research endeavors, be they clinical, empirical, descriptive, historical, or case-study oriented, must at all times adhere to the rigors of effective or best-fit research practice. Without stringent controls placed on the area of investigation, no research endeavor will advance any body of knowledge. To this end, all research must be finely tuned and described as to intent or purpose, the phenomenon to be assessed and reported upon, and the relevance and efficacy of the conclusions drawn. The remainder of this report will focus on a particular component of a research endeavor that is crucial for the acceptance of findings and conclusions drawn, namely the statistical technique employed to analyze the measured data obtained. However, prior to the evaluative critique of the two articles chosen, I first want to present to the reader a brief scenario as to the importance of selecting the most appropriate statistical tool when analyzing measured data.
The primary purpose of the statistical process is to make order out of chaos (Ohlson, 1997). By properly applying selected statistical processes to assessment or measurement data, the researcher can determine whether or not the research variables (i.e., independent and dependent) under investigation show measurement differences, effects, or relationships. However, no matter which statistical process is chosen to analyze the measurement data, no process is acceptable or useful unless the researcher has developed a well-defined research question, a testable null hypothesis, and a valid and reliable measurement instrument. Most significantly, the formulation of the null hypothesis gives direction to the research investigator as to which statistical tool is most appropriate for the type of research conclusions that are to be drawn. Accordingly, the remainder of this report will evaluate the efficacy of two research articles dealing with the effects of alcohol use on the psychosocial development of teenagers with respect to each author's appropriate use of a research question, testable null hypothesis, and selected statistical tool for the purpose of assessing obtained measurement data (Kerlinger, 1964).
The first article I chose was entitled "Tobacco use among high school athletes and non-athletes: Results of the 1997 Youth Risk Behavior Survey" (Melnick et al., 2001; Adolescence). Although the authors adhered to research protocol in stating their research questions and null hypotheses, they failed to identify in the hypotheses the level of statistical significance that they wanted to achieve through their statistical analysis technique. Not stating the alpha or probability level for rejecting the null hypotheses implies that the authors are willing to accept whatever happens vis-à-vis the statistical analysis. This is not acceptable in scientific research for the primary reason that a researcher can, at the end, give reason to any result he or she chooses. In other words, when a researcher sets a confidence level for the acceptance and/or rejection of the null hypotheses, he or she must be adamant as to what will be accepted and what will not. Although the authors gave a rather lengthy description of the data collection method and instrument, they failed to reference the type of measurement data (ratio, ordinal, nominal, interval) the instrument produced. This is important, as certain statistical tools will function well with a certain type of measurement data and others will not. Further, with reference to independent and dependent variables, nothing in the article indicated which variable was the independent and which was the dependent. The closest the authors came to any variable identification was to state that the "participation variable" was the one that identified individuals who participated in sports and those who did not. Later in the study the authors referred to the "tobacco use variable" as one that would "help assess the overall relationship" between adolescent sports participation and tobacco use.
I stress very strongly that the reader keep this wording in mind when I discuss the particular statistical tool chosen to analyze the measurement data, namely that relationship situations require particular statistical methods. Additionally, the authors fail to cite, as required by research practice, the statistical method or methods used in the data analysis at the outset. Such a citation is to be carried out in the introductory portion of any research writing. Instead, the statistical tools employed to analyze their numerical assessment data are brought to the reader's attention throughout the manuscript, particularly in the results section "RESULTS: Descriptive Statistics." One such reference informed the reader that the primary statistical tools to be used were logistic regression and odds ratios, neither of which is a descriptive statistic. As far as I have been able to find out, and as I have learned, descriptive statistics are tools that permit the investigator to describe the collected data by way of population means, medians, ranges, correlations, central tendencies, standard deviations, and variances. All these simply allow the investigator to present a thumbnail sketch of what the collected data look like. Descriptive statistics, as such, are not to be used to draw definitive conclusions about the data collected or to make inferences about future assessment occurrences. On the other hand, the measures the authors purported to use, logistic regression and odds ratios, are related to the probability of an outcome that is a part of a MANOVA situation and, as such, are best classified as semi-parametric estimators of an occurrence - not parametric or descriptive estimators.
To have used the logistic regression analysis method effectively, the authors would have been required to present the missing covariate data when only auxiliary information is available - which they did not do. Further, had the authors truly wanted to use this method of data analysis, I believe they would have had to identify and explain several estimator variables. No variable was identified as an estimator variable - again failing to conform to the rigors of effective scientific research.
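To make concrete what is meant above by a descriptive "thumbnail sketch" of collected data, the following is a minimal Python sketch. The scores are hypothetical values invented purely for illustration; they are not data from the article under review, and the variable names are my own.

```python
import statistics

# Hypothetical scores, invented for illustration only (not data from the article).
scores = [2, 4, 4, 5, 7, 9, 11]

# Each entry describes the sample; none of them tests a hypothesis.
summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "range": max(scores) - min(scores),
    "variance": statistics.variance(scores),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(scores),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Every value produced here only summarizes what the sample looks like; nothing in the output supports an inference beyond the collected data.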
Before I continue with an evaluation of the statistics used in this study relative to the data collected I want to summarize for the reader what has taken place so far.
The authors delivered an adequate statement of the research question.
Although the authors stated their null hypotheses, they failed to identify within the hypotheses the level of confidence needed for rejection.
The authors did not include in the introduction segment of their paper a presentation as to the chosen statistical tool. References to the statistical tool selected were not presented to the reader until the results section.
The authors used the terms relationship and association to describe what they wanted to evaluate vis-à-vis the measurement data collected. Relationship implies correlation techniques and, as appropriately identified, association refers to an odds ratio analysis. However, the authors, I feel, failed to present a rationale for their choice of statistical analysis. Further, the authors' logistic regression analysis was completely misused in the present investigation.
The odds ratio statistical technique is most often reserved for situations wherein two raters are rating an occurrence, not, as used by the authors, for an event simply occurring. Because it is not known whether any tobacco use trait was continuous within the measured sample, the OR value is highly dependent on the threshold level of each rater's upper limit of what is a positive rating. However, since it is not known who the raters were, the issue is of little consequence. What is important to realize is that when a statistical method is chosen to analyze numerical data, the investigator is obligated to inform the reader why such a method was chosen. It is not, in my opinion, the reader's task to second-guess the research investigator.
Knowing that logistic regression analyses and/or models are not descriptive statistical tools but rather semi-parametric tools that require assumptions about nuisance distributions, I feel that the data reported have little summation value.
The results section of the present research investigation into smoking and non-smoking associations between high school athletes and non-athletes is, I feel, extremely misrepresentative of the data presented. What is surprising to me is that the authors chose to use a semi-parametric tool to determine the relationship between a set of variables given a certain population. Unfortunately, not one logistic regression coefficient was ever presented. The question I ask myself, therefore, is why use this particular statistical tool? With reference to odds ratio values, the authors once again failed to adhere to data reporting protocol. What the authors presented were percentage values for the two sample groups vis-à-vis the identified variables of smoking and non-smoking. Probability levels accompanied the given percentages for each. These probability levels have no real meaning for the rejection of the null hypotheses, as no confidence level for rejection was ever established. Further, odds ratios are to be presented in the form of "odds," such as an occurrence happening 2 to 1, 5 to 2, and so on, not as an occurrence happening 29% or 43% or 87% of the time.
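The difference between reporting a percentage and reporting odds can be illustrated with a short Python sketch. It assumes whole-number percentages and reuses the illustrative figures mentioned above (29%, 43%, 87%). The function names are my own, and the pairing of 43% against 29% as two groups is hypothetical, not a result taken from the article.

```python
from math import gcd

def odds_from_percent(percent):
    """Express a whole-number percentage as reduced odds (for, against)."""
    if not 0 < percent < 100:
        raise ValueError("percent must lie strictly between 0 and 100")
    against = 100 - percent
    divisor = gcd(percent, against)
    return percent // divisor, against // divisor

def odds_ratio(percent_a, percent_b):
    """Odds of the event in group A divided by the odds in group B."""
    for_a, against_a = odds_from_percent(percent_a)
    for_b, against_b = odds_from_percent(percent_b)
    return (for_a / against_a) / (for_b / against_b)

# Illustrative percentages only; the group pairing below is hypothetical.
for pct in (29, 43, 87):
    in_favour, against = odds_from_percent(pct)
    print(f"{pct}% corresponds to odds of {in_favour} to {against}")

print(f"odds ratio, 43% versus 29%: {odds_ratio(43, 29):.2f}")
```

A percentage such as 75% reduces to odds of 3 to 1, whereas the odds ratio compares the odds of two groups and is therefore a different quantity from either group's percentage.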