
Test review and critique methodology

Last reviewed: September 18, 2010


I. GENERAL INFORMATION

With the help of school counselors, parents, and teachers, more than 1.2 million high school seniors took the Educational Testing Service's SAT in 2008 (Zemelman et al., 2008). ETS makes and administers the aptitude test for the College Board, a nationwide consortium of colleges and universities. For all practical purposes, however, the two function as a single SAT organization; call it the ETS/College Board alliance. Besides the SAT, an additional one million seniors took its Iowa City cousin, the ACT college admissions test, produced by ACT Inc. In either case, students, parents, and counselors have been subjected to the relentless message that the tests, although of course not infallible, are good predictors of one's prospects for college success (Blumer, 2004). That has been the take-home message of the admissions testing industry's sales force for decades, and little has changed in recent years. For example, a sampling of ETS and College Board statements in 2008 about the relative merits of the SAT I (the verbal and math reasoning test) includes the following:

The ETS: "High school grades have great value, but they are subject to variability from place to place. Standardized admissions tests offer colleges and universities a fair and impartial way to compare students from different school situations. Literally thousands of studies have found that the combination of grades plus test scores is a more effective predictor of the students' readiness than either one alone." (Clementson & Wenger, 2008)

The College Board: "No one can accurately predict with 100% certainty what your grades will be in college. . . . However, colleges use SAT I scores to help estimate how well students are likely to do at its school." (College Entrance Examination Board and Educational Testing Service, 1999)

The College Board: "Many colleges require the SAT I because it is a standard way of measuring a student's ability to do college-level work." (Goetz & LeCompte, 2001)

Donald M. Stewart, former president, The College Board: "SAT scores provide a vital piece of information about a student's ability to perform college-level work." (Goetz & LeCompte, 2001)

• Title of the test

SAT Reasoning Test (formerly Scholastic Aptitude Test and Scholastic Assessment Test)

• Author(s)

Educational Testing Service

• Publisher and date(s) of publication

Publisher: College Board

2005

II. TEST DESCRIPTION

Taken together, those statements (the official ones carefully worded, Stewart's decidedly less so) leave counselors, parents, and high-schoolers with the highly misleading impression that the SAT is an adequate predictor, or even a good predictor, of college success. There is also the clear implication that the test has great utility to colleges in screening applicants. Moreover, the ETS/College Board alliance has compounded those misleading statements about the SAT's predictive validity in its public reporting on the question of test bias. For instance, in an attempt to counter public rhetoric that the SAT is biased against women and minorities, the alliance has gone to great lengths to show that the SAT actually overpredicts freshman grades for blacks and somewhat underpredicts the academic performance of whites. Its studies have also shown that freshman grades for women are slightly underpredicted by the SAT. However, among a public unfamiliar with the technical argot of the testers, the ETS/College Board test-bias studies have led people to assume naively that the test is a good predictor of academic success. After all, the reasoning goes, if it is not biased against minorities, it must be valid (Callahan et al., 2005).

However, the test-bias issue, which is subject to a great deal of public controversy and misunderstanding, is a red herring. The attention paid to test bias permits test makers and users to avoid the far more compelling issue of the predictive validity of admissions tests (Zemelman et al., 2008). The fact that the tests are not biased against particular ethnic groups, in the sense that they do not significantly underpredict their academic performance, says absolutely nothing about how well the tests predict success for all individuals. As it turns out, scores of independent studies over the years by well-respected researchers in highly regarded journals show that the prevailing view of merit has been erected on a rather shaky foundation of scientific evidence about the real usefulness of admissions tests for predicting accomplishment in college or even graduate school. It is worthwhile looking at some of this evidence in detail, because the public's generally favorable views about the validity and utility of the tests have surely sustained the privileged position that college and university entrance testing continues to occupy in the American meritocracy (Council of Chief State School Officers, 2005).

Validity Evidence

By far, the bulk of the evidence about the power of college admissions tests to predict academic success comes from examinations of the SAT, particularly what is known as the SAT I exam of verbal and math "reasoning" (Brown, 2002). Fewer studies have investigated the ACT, and a smattering of occasional studies have looked at various other sorts of admissions tests, as well as tests intended to help institutions place college students at the proper academic level. Mostly, researchers have studied the relationship between SAT scores and the one outcome the SAT is actually designed to predict: freshman-year grades. In other words, counselors, parents, students, and colleges cannot make any inferences about one's chances of success beyond the freshman year based on SAT scores. Even by that restricted criterion, the SAT falls well short of the ETS/College Board's implicit claims. To get a flavor of what researchers have concluded about the validity of admissions tests, consider a handful of studies from a variety of academic settings (Clementson & Wenger, 2008).

Public colleges and universities. Consider the California State University System, consisting of several campuses in cities throughout the state. At CSU, freshmen are admitted according to an eligibility index of high school grades and test scores. Students earning a high school grade point average of at least a B, or 3.0, are not required to submit standardized test scores. Estimates show, however, that as many as 90% of seniors submit test scores nevertheless.

III. TECHNICAL EVALUATION

Sheila Cowen and Sandra Fiori set out to evaluate the relative utility of test scores and high school grades at CSU's Hayward campus in a study of the academic performance of 762 regularly progressing students and another 210 "slower progressors," considered to be at greater risk for academic failure (Appalachia Educational Lab, 2004). Overall, for both males and females and for all ethnic groups, high school grades were the most powerful predictor of the academic performance of Cal State freshmen. In statistical terms, high school grades accounted for about 18% of the variance in freshman grades. SAT scores did help improve upon that prediction, but the gain was small: just five percentage points (from 18% to 23%). For the slower progressing students, the SAT's added value was virtually zero (Goslin, 2007). The researchers found no evidence of test bias in the SAT; that is, the test did not significantly over- or underpredict freshman grades for any ethnic group. Absence of bias, the researchers concluded, is no reason not to scrutinize the SAT's usefulness to Cal State. Indeed, the authors say, their study "leads to the conclusion that educators need to intensify the search for better predictors of college performance" (Goetz & LeCompte, 2001).

Another recent study looked at the predictive power of the other popular college entrance exam, the ACT, at Chicago State University. This time, however, researcher Sandra Paszczyk was interested in determining whether the ACT could help the university assess a student's chance of success in broader terms than just freshman GPA, looking instead at grade point average at graduation from the university.

At both extremes of very high and very low test scores, the ACT had modest powers of prediction. However, for the majority of the university's graduates, who had scored in the middle range of the test as high school seniors, the test could explain merely 3.6% of the differences in final grade point averages (Clementson & Wenger, 2008). That means, of course, that factors besides the ACT, from grades to motivation to work habits and so on, accounted for virtually all the variance in grades. Indeed, in one case the ACT proved to be counterpredictive. In the Chicago State study, the fall 1992 graduating class had the highest average ACT score (17.9) among the 428 students studied between 1990 and 1993. Among the graduates, that class also produced the poorest academic performance during their years at the university (Herman & Golan, 1990).

Highly selective institutions. Even among the elite American colleges and universities, which admit a relatively small fraction of the high school seniors who apply each year, standardized admissions tests have not lived up to the implied validity claims of their proponents. Consider a study at the University of Pennsylvania. Jonathan Baron and M. Frank Norman looked at the outcomes for some 3,800 students admitted to the university, who majored in fields ranging from engineering, business, and nursing to the arts and sciences. Specifically, the researchers wanted to determine which predictors of academic performance actually gave Penn the most additional predictive value, the most bang for the buck. The factors included class rank in high school, SAT II achievement scores on various academic subjects, and SAT I scores on general verbal and quantitative reasoning (the SAT most high school seniors take).

Among the predictors, the SAT I reasoning test was by far the weakest, able to explain just 4% of the variation in academic performance of students at Penn (Goetz & LeCompte, 2001). The SAT II subject tests were somewhat better, accounting for 6.8% of the variation in grade point averages. Rank in high school was the clear winner, however, able to explain 9.3% of the variation in cumulative GPAs, a predictive punch more than twice that of the SAT I (Clementson & Wenger, 2008). Now, the usual drill at many institutions, particularly highly selective ones, is to combine SATs and grades into a predictive index, in accordance with the ETS/College Board advice that test scores add significantly to the predictive power of grades alone. In Penn's case, that turned out to be a highly debatable proposition. When Baron and Norman added SATs to class rank, the proportion of variance explained rose by just 0.02, or two percentage points. When combined, class rank and SATs could still account for only 11.3% of the Penn students' grade differences. The subject tests, however, were a bit stronger than the SAT reasoning tests. Combined with class rank, the achievement tests boosted the explanatory power to 13.6%. Even then, almost 90% of the differences in academic performance remained unexplained (Appalachia Educational Lab, 2004).
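The Penn comparison comes down to simple arithmetic on percentages of variance explained. The short Python sketch below restates it using only the figures quoted above; the dictionary labels and the helper name `summarize` are illustrative assumptions, not part of Baron and Norman's analysis.

```python
# Percent of variance in Penn GPAs explained, as quoted in the text above.
# The labels and the helper name are illustrative assumptions.
VARIANCE_EXPLAINED = {
    "class rank": 9.3,
    "class rank + SAT I": 11.3,
    "class rank + SAT II": 13.6,
}


def summarize(model_pct, baseline_pct):
    """Return (gain over baseline, percent left unexplained), rounded to 0.1."""
    gain = round(model_pct - baseline_pct, 1)
    unexplained = round(100.0 - model_pct, 1)
    return gain, unexplained


baseline = VARIANCE_EXPLAINED["class rank"]
for label, pct in VARIANCE_EXPLAINED.items():
    gain, unexplained = summarize(pct, baseline)
    print(f"{label}: explains {pct}%, gain {gain} points, unexplained {unexplained}%")
```

Read this way, adding the SAT I to class rank buys two percentage points of explained variance, and even the strongest combination quoted leaves 86.4% of the variation in grades unaccounted for.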

Among the ETS/College Board defenses against such poor results for the SAT is the so-called "restriction of range" objection, which says that the range of test scores in the applicant pool will be much wider than the range among admitted candidates. Because the range of test scores for the admitted pool is limited, the observed relationship between test scores and academic performance will be depressed below the "true" correlation, according to the argument (Council of Chief State School Officers, 2005). At highly selective institutions such as Penn, which admit students with relatively high test scores, the restriction-of-range problem would even more severely understate the true power of the tests. Baron and Norman therefore investigated precisely that possible technical objection to their findings. Contrary to SAT defenders' supposition, however, the researchers tell us, "it was concluded that restriction of range does not seem to explain the nonsignificant weight of the SAT."

In another broad investigation of more than 10,000 students at eleven choosy private and public institutions, a high-schooler's freshman performance as predicted by the SAT proved to be of only modest predictive value. Fredrick E. Vars and William C. Bowen, reporting their results in 1998, found that a full 100-point gain in combined math and verbal SATs, holding race, gender, and field of study constant, was associated with about one-tenth of a grade point gain in an elite college student's grade point average (Krejcie & Morgan, 2000).
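To see what the restriction-of-range objection claims, it can help to simulate it. The Python sketch below is a hypothetical illustration only: the "true" correlation of 0.5, the 20% admission rate, the sample size, and all function names are assumptions chosen for the demonstration, and none of it reproduces Baron and Norman's data or method. It generates synthetic test scores and grades, "admits" only the top scorers, and compares the correlation in the full pool with the correlation among those admitted.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(true_r=0.5, n=20000, admit_fraction=0.2, seed=0):
    """Return (r in full applicant pool, r among admitted top scorers).

    All parameter values are hypothetical; scores and grades are synthetic
    standard-normal variables constructed to correlate at roughly true_r.
    """
    rng = random.Random(seed)
    scores = [rng.gauss(0, 1) for _ in range(n)]
    noise_sd = math.sqrt(1 - true_r ** 2)
    grades = [true_r * s + noise_sd * rng.gauss(0, 1) for s in scores]

    # Admit only applicants at or above the score cutoff.
    cutoff = sorted(scores)[int(n * (1 - admit_fraction))]
    admitted = [(s, g) for s, g in zip(scores, grades) if s >= cutoff]
    admitted_scores = [s for s, _ in admitted]
    admitted_grades = [g for _, g in admitted]

    return pearson_r(scores, grades), pearson_r(admitted_scores, admitted_grades)


full_r, restricted_r = simulate()
print(f"full pool r = {full_r:.2f}, admitted-only r = {restricted_r:.2f}")
```

In a simulation like this the admitted-only correlation comes out lower than the full-pool correlation, which is the mechanism the objection relies on. Whether that mechanism accounts for the weak results at a real institution is an empirical question, and Baron and Norman report that at Penn it did not.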

IV. SUMMARY EVALUATION AND CRITIQUE

Even ETS's own studies tell a similar story, but a school counselor or parent might not know it from the College Board/ETS public statements on the SAT's predictive power. To help illustrate this, it is worth noting that all the statistical relationships between test scores and academic performance cited above are expressed in terms of what is known as the coefficient of determination, or r-squared statistic, which estimates the proportion of variation in one variable (academic performance) that can be attributed to a predictor variable (SAT scores) (Council of Chief State School Officers, 2005). The r-squared is a considerably more useful and intuitively sensible indicator of the predictive value of standardized tests than the simple correlation between the variables, or r value. (One calculates the r-squared by squaring the simple correlation between the two variables, then multiplying by 100 to express it as a percentage.) Yet that seemingly arcane technical distinction between r and r-squared can convey significantly different impressions about the predictive punch of test scores. The College Board and ETS know this (Zemelman et al., 2008). But parents or school counselors would be hard-pressed to find any r-squared values for the SAT reported in College Board/ETS public literature on the test. Rather, the alliance chooses to report the SAT's predictive validity in terms of the simple r, which has great potential to mislead the public into believing the test is considerably more powerful than it really is.

For example, the College Board's 1997-1998 Counselor's Handbook for the SAT Program reports an ETS study that calculated the simple correlation, or r, between test scores and freshman grades at 0.42 for the bulk of SAT I scores around the median. That figure appears to be strong evidence for the predictive value of the SAT. On that basis, some students, parents, or counselors might well conclude that the SAT correlates 42% with college grades (Research Association, 1990). In fact, squaring the correlations reported in the College Board handbook shows that SAT scores accounted for just 17.6% of the variation in freshman grades in the ETS study of more than 600 colleges and universities, leaving more than 80% of the variance unexplained. What is more, any number of factors falling in that unaccounted-for variance could by themselves have greater predictive punch than the test score. Indeed, it is almost always the case in studies of the SAT's effectiveness that high school grades are more powerful than any test score (Clementson & Wenger, 2008). As indicated in the College Board counselor's handbook, high school grades are a significantly better indicator of college performance. One's high school performance, in fact, could explain almost a quarter of the differences in grades among freshmen.

Adding SATs to high school grades across those 600-plus institutions improved prediction of college performance, but barely. In terms of simple r, the supposedly tried-and-true formula of SATs combined with high school grades nudged up the correlation over high school record alone by just 0.07 (from 0.48 to 0.55). That means the variance in academic performance accounted for by the combination of test scores and high school grades, at 30%, was seven percentage points greater than for high school grades by themselves. (Squaring 0.55 gives 0.30; that times 100 equals 30%. Squaring 0.48 gives 0.23; that times 100 equals 23%. The difference is 7.)
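The conversion from r to r-squared is easy to check. The Python sketch below applies it to the three correlations quoted from the counselor's handbook (0.42, 0.48, and 0.55); the function and variable names are illustrative assumptions.

```python
def variance_explained_pct(r):
    """Convert a simple correlation r into percent of variance explained (r-squared x 100)."""
    return (r ** 2) * 100


# Correlations with freshman grades as quoted in the text above.
r_sat_only = 0.42        # SAT I scores alone
r_grades_only = 0.48     # high school record alone
r_combined = 0.55        # SAT I scores plus high school record

sat_pct = variance_explained_pct(r_sat_only)
grades_pct = variance_explained_pct(r_grades_only)
combined_pct = variance_explained_pct(r_combined)

print(f"SAT alone:          {sat_pct:.2f}%")
print(f"Grades alone:       {grades_pct:.2f}%")
print(f"Grades plus SAT:    {combined_pct:.2f}%")
print(f"Gain from the SAT:  {combined_pct - grades_pct:.2f} percentage points")
```

Without intermediate rounding, the figures are 17.64%, 23.04%, and 30.25%, so the gain from adding the SAT is about 7.2 percentage points, consistent with the rounded value of 7 given above.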

V. MULTICULTURAL APPLICATION

Numerous researchers and educators argue that sole reliance on standardized test scores for identification will exclude a large body of gifted students, including those who are culturally and ethnically different from the conventional gifted population. These gifted students may include those who are not native English speakers, those who are from low-income families, or those who live in geographically remote locations (Clementson & Wenger, 2008; Krejcie & Morgan, 2000). Indeed, empirical evidence shows that children of color are under-represented in gifted programs, partly because of insufficient identification measures (Council of Chief State School Officers, 2005). To address the under-identification and under-serving of many gifted students, researchers advise the use of multiple measures and diverse types of instruments as part of the classification process (Appalachia Educational Lab, 2004). On this view, standardized intelligence or achievement tests cannot adequately measure multidimensional human intelligence because the tests are essentially unidimensional and ethnocentric, and so do not serve non-mainstream ethnic groups well. These researchers also argue that teacher nominations and grades are not good predictors of students' academic potential, because they are determined primarily either by students' performance in class, class attendance, and motivation, or by conformity to teachers' demands and expectations in the classroom. A number of researchers advocate the use of other, nontraditional measures for identification, such as student portfolios (Goetz & LeCompte, 2001; Callahan et al., 2005; Blumer, 2004; Shepard & Dougherty, 2006), checklists, or inspection forms (Wright, 2009), to accommodate the diversity of cultural and environmental backgrounds in the identification process. Portfolios, a kind of context-based measure comprising writing samples, journals, artwork, extraordinary projects, recordings of oral communication, and the like, are recommended as an alternative means of assessing students' academic potential (Goetz & LeCompte, 2001).

Cite This Paper
PaperDue. (2010). Test review and critique methodology. PaperDue. https://www.paperdue.com/essay/sat-general-information-under-the-8440
