
Statistical Analysis Reported in Two Journal Articles

Last reviewed: July 31, 2004


Research endeavors, whether clinical, empirical, descriptive, historical, or case-study oriented, must at all times adhere to the rigors of sound research practice. Without stringent controls on the area of investigation, no research endeavor will advance any body of knowledge. To this end, all research must be carefully designed and clearly described with respect to its intent or purpose, the phenomenon to be assessed and reported, and the relevance and soundness of the conclusions drawn. The remainder of this report will focus on one component of a research endeavor that is crucial for the acceptance of its findings and conclusions, namely the statistical technique employed to analyze the measurement data obtained. However, before the evaluative critique of the two chosen articles, I first want to present to the reader a brief account of the importance of selecting the most appropriate statistical tool when analyzing measurement data.

The primary purpose of the statistical process is to make order out of chaos (Ohlson, 1997). By properly applying selected statistical processes to assessment or measurement data, the researcher can determine whether the research variables under investigation (i.e., independent and dependent) show measurable differences, effects, or relationships. However, whichever statistical process is chosen to analyze the measurement data, no process is acceptable or useful unless the researcher has developed a well-defined research question, a testable null hypothesis, and a valid and reliable measurement instrument. Most significantly, the formulation of the null hypothesis tells the investigator which statistical tool is most appropriate for the type of conclusions to be drawn. Accordingly, the remainder of this report will evaluate two research articles dealing with substance use (tobacco, alcohol, and other drugs) among teenagers, with respect to each set of authors' use of a research question, a testable null hypothesis, and a statistical tool for assessing the measurement data obtained (Kerlinger, 1964).

The first article I chose was entitled "Tobacco use among high school athletes and non-athletes: Results of the 1997 Youth Risk Behavior Survey" (Melnick et al., 2001, published in Adolescence). Although the authors followed research protocol in stating their research questions and null hypotheses, they failed to specify in the hypotheses the level of statistical significance their analysis had to achieve. Not stating the alpha (probability) level for rejecting the null hypotheses implies that the authors are willing to accept whatever the statistical analysis yields. This is not acceptable in scientific research, primarily because a researcher can then, after the fact, justify any result he or she chooses. In other words, when a researcher sets a significance level for rejecting the null hypotheses, he or she must be firm about what will be accepted and what will not. Although the authors gave a rather lengthy description of the data collection method and instrument, they failed to state the type of measurement data (nominal, ordinal, interval, or ratio) the instrument produced. This matters because certain statistical tools function well with a given type of measurement data and others do not. Further, the authors never stated which variable was independent and which was dependent. The closest they came to identifying a variable was to state that the "participation variable" distinguished individuals who participated in sports from those who did not. Later in the study the authors referred to the "tobacco use variable" as one that would "help assess the overall relationship" between adolescent sports participation and tobacco use.
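
To make the point about levels of measurement concrete, the sketch below pairs each level with a few analyses commonly taught for it. The pairings are a simplified, textbook-style assumption for illustration; they do not come from either article, and the helper name `candidate_tools` is hypothetical.

```python
# Simplified, illustrative pairing of measurement levels with commonly taught
# analyses (an assumption for this sketch, not taken from either article).
MEASUREMENT_LEVEL_TOOLS = {
    "nominal": ["frequencies and percentages", "chi-square test", "odds ratio"],
    "ordinal": ["median", "Spearman rank correlation", "Mann-Whitney U test"],
    "interval": ["mean", "Pearson correlation", "t test", "ANOVA"],
    "ratio": ["mean", "Pearson correlation", "t test", "ANOVA"],
}


def candidate_tools(level):
    """Return the analyses listed for a level of measurement (hypothetical helper)."""
    key = level.strip().lower()
    if key not in MEASUREMENT_LEVEL_TOOLS:
        raise ValueError(f"unknown level of measurement: {level!r}")
    return MEASUREMENT_LEVEL_TOOLS[key]


# A yes/no tobacco-use item produces nominal data.
print(candidate_tools("Nominal"))
```

Real choices also depend on the sampling design, the distribution of the data, and the hypothesis being tested; the table only illustrates why the level of measurement should be reported.
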
I strongly urge the reader to keep this wording in mind when I discuss the statistical tools chosen to analyze the measurement data, namely those suited to questions of relationship. Additionally, contrary to standard research practice, the authors never name the statistical method or methods used in the data analysis where they should, in the introductory portion of the paper. Instead, the statistical tools employed to analyze the numerical assessment data are revealed piecemeal throughout the manuscript, particularly in the results subsection "RESULTS: Descriptive Statistics." One such mention informed the reader that the primary statistical tools were logistic regression and odds ratios, neither of which is a descriptive statistic. As far as I have been able to determine, descriptive statistics are tools that permit the investigator to describe the collected data by way of means, medians, and other measures of central tendency; ranges, standard deviations, and variances; and correlations. All of these simply allow the investigator to present a thumbnail sketch of what the collected data look like. Descriptive statistics, as such, are not to be used to draw definitive conclusions about the data collected or to make inferences about future occurrences. By contrast, measures such as logistic regression and odds ratios, which the authors purported to use, estimate the probability (or odds) of an outcome from a fitted model; as such, they are best classified as inferential, model-based estimators of an occurrence, not descriptive ones.
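
As a minimal sketch of what purely descriptive statistics look like, the snippet below summarizes a small, entirely hypothetical sample using only Python's standard-library `statistics` module. Nothing in it tests a hypothesis or estimates the odds of an outcome; the helper name `describe` is illustrative.

```python
import statistics


def describe(values):
    """Summarize a sample with common descriptive statistics (no inference)."""
    data = sorted(values)
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": data[-1] - data[0],
        "stdev": statistics.stdev(data),        # sample standard deviation (n - 1)
        "variance": statistics.variance(data),  # sample variance (n - 1)
    }


# Hypothetical data: cigarettes smoked per day by eight respondents.
summary = describe([0, 0, 2, 3, 5, 5, 10, 15])
print(summary)
```

The output only describes these eight values; generalizing from them to a population would require an inferential procedure and a pre-stated hypothesis.
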
To have used logistic regression analysis effectively, the authors would have needed to specify which covariates entered the model and how missing covariate data were handled, which they did not do. Further, had the authors truly wanted to use this method of data analysis, I believe they would have had to identify and explain their predictor (estimator) variables. No variable was ever identified as a predictor, again failing to conform to the rigors of effective scientific research.

Before I continue my evaluation of the statistics used in this study relative to the data collected, I want to summarize for the reader what has taken place so far.

The authors delivered an adequate statement of the research question.

Although the authors stated their null hypotheses, they failed to specify within the hypotheses the significance level required for rejection.

The authors did not present the chosen statistical tool in the introductory segment of their paper; references to it did not appear until the results section.

The authors described what they wanted to evaluate in the measurement data in terms of relationship and association. Relationship implies correlation techniques, whereas association, as they appropriately identified, refers to an odds ratio analysis. However, the authors, I feel, failed to present a rationale for their choice of statistical analysis. Further, the authors' logistic regression analysis was misapplied in the present investigation.

The odds ratio (OR) is one of the statistics used to summarize agreement between two raters rating the same occurrence, a different situation from the authors' use of it for whether an event simply occurred. In the rater setting, if the underlying trait (here, tobacco use) is continuous, the OR value depends heavily on the threshold each rater uses for a positive rating. However, since no raters are identified in this study, the issue is of little consequence here. What is important to realize is that when a statistical method is chosen to analyze numerical data, the investigator is obligated to inform the reader why it was chosen. It is not, in my opinion, the reader's task to second-guess the investigator.

Knowing that logistic regression models are not descriptive statistical tools but inferential, model-based tools that rest on assumptions about how the data are distributed, I feel that the data as reported have little summary value.

The results section of this investigation into smoking and non-smoking among high school athletes and non-athletes is, I feel, extremely misrepresentative of the data presented. What surprises me is that the authors chose an inferential modeling tool to determine the relationship between a set of variables in a given population, yet not one logistic regression coefficient was ever presented. The question I ask myself, therefore, is why use this particular statistical tool at all? With reference to odds ratio values, the authors once again failed to adhere to data reporting protocol. What they presented were percentage values for the two sample groups with respect to the identified variables of smoking and non-smoking, each accompanied by a probability level. These probability levels have no real meaning for the rejection of the null hypotheses, because no significance level for rejection was ever established. Further, odds ratios should be presented as ratios of odds (for example, an OR of 2.0 means the odds of the outcome are twice as high in one group as in the other), not as an occurrence happening 29%, 43%, or 87% of the time.
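
The difference between a percentage and an odds ratio can be shown with a few lines of Python. The sketch below converts two proportions into odds and then into an odds ratio; the 43% and 29% figures are hypothetical group rates borrowed from the illustrative percentages above, not values reported by Melnick et al., and the function names are illustrative.

```python
def odds(p):
    """Convert a proportion p (0 < p < 1) into odds, p / (1 - p)."""
    if not 0 < p < 1:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1 - p)


def odds_ratio(p_group1, p_group2):
    """Odds of the outcome in group 1 divided by the odds in group 2."""
    return odds(p_group1) / odds(p_group2)


# Hypothetical rates: 43% of one group and 29% of another report the outcome.
p1, p2 = 0.43, 0.29
print(f"odds in group 1: {odds(p1):.2f}")
print(f"odds in group 2: {odds(p2):.2f}")
print(f"odds ratio:      {odds_ratio(p1, p2):.2f}")
```

Here the odds ratio comes out to roughly 1.85: the odds of the outcome are about 1.85 times as high in the first group, a statement that the two percentages alone do not convey.
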

Near the end of the results section I encountered a most distressing research error, namely a statement that the study also investigated "differences" in smoking and non-smoking for athletes and non-athletes. First, no such mention was made at the beginning of the study with respect to gender differences. Second, logistic regression models express associations in terms of odds rather than testing differences between group means. Had the authors wanted to determine whether differences occurred, they should have employed an inferential test of differences such as the t test or ANOVA. Again, this was not the case. Additionally, the authors stated in their concluding remarks that adolescent smoking is related to social insecurity and social isolation. These conditions were never mentioned in the introduction to the article, nor were they included in the research questions. Therefore, any information or conclusion regarding them is out of place and should not be tied to the results of the investigation.

Although I feel the topic of smoking among adolescents is an extremely important area of research, such research has to be conducted with acumen and style. Correcting the research conducted by Melnick et al. would have to begin at the very beginning: the hypotheses would have to be stated properly, all independent and dependent variables identified and defined, and, most importantly, the best statistical tool selected to analyze the collected numerical data. My suggestions for the most appropriate statistical tools would include correlation or chi-square methods to establish relationships, the t test to establish differences, and an (orthogonal) ANOVA to test for effects and differences. Although the odds ratio is acceptable in this particular situation, I feel that presenting the data as percentages rather than as odds is a misuse of the tool. Lastly, stating that a particular statistical tool will be used (logistic regression analysis) and not presenting any resulting coefficients is simply not acceptable. Also, as a side note, with a sample as large as the authors' (over 16,000), they might have been better served by a non-parametric statistical tool.

The second article I chose for critical analysis is entitled "Violence in the lives of pregnant teenage women: Associations with multiple substance abuse," authored by Martin et al. (1999) and published in the American Journal of Drug and Alcohol Abuse. The research questions posed by the authors concerned the proportion of pregnant women who are victims of violence; the proportion of pregnant women who use cigarettes, alcohol, and illicit drugs; and the types of violence associated with particular substances among pregnant women. The sample was not random, as the subjects comprised the entire multidisciplinary population identified by a state department of health. This is important to remember with respect both to the statistical tool chosen to analyze the data and to the conclusions drawn. I should note at this point that no testable null hypotheses were given anywhere in the article. The authors stated that they chose descriptive statistics to analyze their numerical data but failed to identify which specific measures were used, even though they mentioned their use of statistics in the right place, i.e., the introduction. Whenever a particular category of statistics is mentioned, the researcher is, I feel, obligated to identify which measure or measures within that category are being used. This was not done in this research endeavor.

In addition to stating that descriptive statistics would be used to describe types of substances, patterns of use, and sociodemographic characteristics, the authors reported that they would also use the odds ratio statistic, as did the authors of the first article. It was not until the results section that I learned that the descriptive statistic being used was simple percentages, again as in article one. Although I do not feel it is taboo to report percentages, they do not by themselves guarantee meaningfulness. Rather than merely reporting simple percentages, the authors would have been better off offering the reader a graph that would have conveyed their message more effectively. After two and one-half pages of percentages, the authors move on to presenting OR coefficients for the variables under investigation. As with the first article, because no testable null hypotheses were formulated and stated, and therefore no significance level was set in advance, the presentation of OR coefficients with their accompanying probability levels has no meaningful interpretation. One must remember at all times that setting a probability or confidence level gives the researcher a benchmark for accepting or rejecting the stated null hypotheses. Without these two necessary research components the reader has every right to ask, "Just how right is right in your study? Do you want to accept your conclusions as 99.99% right, 95% right, or 80% right?" I firmly feel it is not the reader's responsibility to establish the rejection or acceptance level, regardless of whether the authors mark which interactions were significant at the .001, .01, or .05 levels with little single or double asterisks.
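
The argument for fixing the significance level in advance can be sketched as a decision rule. In the Python below, `ALPHA` is a hypothetical value chosen before any data are examined, and `significance_stars` reproduces the common .05/.01/.001 asterisk convention (the three-asterisk tier is an assumption) to show that asterisks describe a p value but do not by themselves state a decision. Both function names are illustrative.

```python
# Hypothetical significance level, fixed before any data are analyzed.
ALPHA = 0.05


def decide(p_value, alpha=ALPHA):
    """Apply a pre-specified decision rule: reject H0 when p <= alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p value must lie in [0, 1]")
    return "reject H0" if p_value <= alpha else "fail to reject H0"


def significance_stars(p_value):
    """Label a p value with conventional asterisks (a description, not a decision)."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


p = 0.03  # hypothetical p value from some test
print(significance_stars(p), decide(p), decide(p, alpha=0.01))
```

The same p value of .03 leads to rejection under a pre-set alpha of .05 but not under .01, which is why the level has to be fixed before the results are seen rather than read off the asterisks afterward.
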

Unlike the researchers who conducted the first study I critiqued, the authors of article two presented their OR values as actual coefficients, the proper way to report OR values. However, the authors failed to discuss what the OR coefficients meant for the investigation, particularly in relation to the chi-square test that I found they had used. Unfortunately, the authors never introduced this statistic at the beginning of the reported study.

Although the study does not say so, the chi-square test and the OR complement each other, and proper research protocol requires stating why both methods are being employed. For example, if a study reports a highly significant chi-square value, say at the .01 level, for the variables being studied, then it stands to reason that the odds ratio for those variables would depart markedly from 1, the value indicating no association. When the two values do not agree in this way, all the researchers can conclude is that something may have gone wrong in the data analysis. In other words, I feel one tool acts as a statistical support system for the other.
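
A short sketch shows why the two statistics should agree: both are computed from the same 2 × 2 table of counts. The Python below uses the standard Pearson chi-square formula for a 2 × 2 table without continuity correction, obtains its p value on one degree of freedom through `math.erfc` (valid because a 1-df chi-square variable is the square of a standard normal variable), and computes the odds ratio from the cross-products. The counts and the helper name `two_by_two_summary` are hypothetical.

```python
import math


def two_by_two_summary(a, b, c, d):
    """
    Summarize a hypothetical 2 x 2 table of counts:

                     outcome yes   outcome no
        exposed           a             b
        not exposed       c             d

    Returns (Pearson chi-square without continuity correction, its p value on
    1 degree of freedom, odds ratio). Assumes no zero cells or margins.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square with 1 df: P(|Z| > sqrt(chi2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


# Hypothetical counts with a clear association ...
print(two_by_two_summary(30, 20, 15, 35))
# ... and with none at all: chi-square 0, p value 1, odds ratio 1.
print(two_by_two_summary(25, 25, 25, 25))
```

In the first table a chi-square of about 9.09 (p below .01) accompanies an odds ratio of 3.5; in the second, neither statistic shows any association. Because both are computed from the same counts, they should tell a consistent story, which is the sense in which one supports the other.
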

Other errors, unfortunately, occurred, I feel, in this investigation with respect to the use of the chi-square test. The chi-square method is normally used to compare observed counts with expected counts. When analyzing data with a chi-square method, researchers are obligated to identify which type of chi-square test is being used, whether Pearson's chi-square (with or without Yates's correction), the chi-square goodness-of-fit test, the likelihood-ratio chi-square, the linear-by-linear association chi-square, or the Mantel-Haenszel chi-square. At no point in this article did the authors identify the type of chi-square test used. As such, the data can be interpreted under differing assumptions with respect to statistical significance. Other significant areas I found missing with respect to the use of the chi-square technique include the following:

Cite This Paper
PaperDue. (2004). Statistical Analysis Reported in Two Journal Articles. PaperDue. https://www.paperdue.com/essay/statistical-analysis-reported-in-two-journal-175595
