Specifically, the researchers wanted to determine which predictors of academic performance actually gave Penn the most additional predictive value, the most bang for the buck. The factors included class rank in high school, SAT II achievement scores in various academic subjects, and SAT I scores on general verbal and quantitative reasoning, the SAT that most high school seniors take.
Among the predictors, the SAT I reasoning test was by far the weakest, able to explain just 4% of the variation in academic performance of students at Penn (Goetz & LeCompte, 2001). The SAT II subject tests were somewhat better, accounting for 6.8% of the variation in grade point averages. Rank in high school was the clear winner, however, able to explain 9.3% of the variation in cumulative GPAs, a predictive punch more than twice that of the SAT (Clementson & Wenger, 2008). Now, the usual drill at many institutions, particularly highly selective ones, is to combine SATs and grades into a predictive index, in accordance with the ETS/College Board advice that test scores add significantly to the predictive power of grades alone. In Penn's case, that turned out to be a highly debatable proposition. When Baron and Norman added SATs to class rank, the proportion of variance explained rose by just 0.02, or two percentage points. Combined, class rank and SATs could still account for only 11.3% of the Penn students' grade differences. The subject tests, however, were a bit stronger than the SAT reasoning tests. Combined with class rank, the achievement tests boosted the explanatory power to 13.6%. Even then, more than 86% of the differences in academic performance remained unexplained (Appalachia Educational Lab, 2004).
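The incremental arithmetic behind the Penn figures above can be laid out in a few lines. The following is only a sketch: the percentages are the ones quoted in the text, while the dictionary labels and the helper names `gain_over_rank` and `unexplained` are illustrative choices, not anything taken from the study itself.

```python
# Percentages of GPA variance explained, as quoted in the text above.
# The labels are illustrative, not the researchers' own.
explained = {
    "SAT I alone": 4.0,
    "SAT II subject tests alone": 6.8,
    "class rank alone": 9.3,
    "class rank + SAT I": 11.3,
    "class rank + SAT II": 13.6,
}

base = explained["class rank alone"]


def gain_over_rank(label):
    """Percentage points of variance added on top of class rank alone."""
    return round(explained[label] - base, 1)


def unexplained(label):
    """Percentage of variance the predictor set leaves unaccounted for."""
    return round(100.0 - explained[label], 1)


for label in ("class rank + SAT I", "class rank + SAT II"):
    print(f"{label}: +{gain_over_rank(label)} points over class rank, "
          f"{unexplained(label)}% unexplained")
```

Read this way, the question for an admissions office is less how much each test explains on its own than how many points it adds once class rank is already in hand.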
Among the ETS/College Board defenses against such poor results for the SAT is the so-called "restriction of range" objection, which says that the test-score profile of the applicant pool will be much wider than that of the admitted pool. Because the range of test scores for the admitted pool is limited, the observed relationship between test scores and academic performance will be depressed below the "true" correlation, according to the argument (Council of Chief State School Officers, 2005). At highly selective institutions such as Penn, which admit students with relatively high test scores, restriction of range would understate the true power of the tests even more severely. Baron and Norman therefore investigated precisely that technical objection to their findings. Contrary to SAT defenders' supposition, however, the researchers tell us, "it was concluded that restriction of range does not seem to explain the nonsignificant weight of the SAT."
In another broad investigation of more than 10,000 students at eleven selective private and public institutions, the SAT proved to be only a modest predictor of a high-schooler's freshman performance. Fredrick E. Vars and William C. Bowen, reporting their results in 1998, found that a full 100-point gain in combined math and verbal SATs, holding race, gender, and field of study constant, was associated with a gain of only about one-tenth of a grade point in an elite college student's grade point average (Krejcie & Morgan, 2000).
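The restriction-of-range objection discussed above can be made concrete with a standard textbook adjustment, often called the Thorndike Case II correction for direct range restriction. The sketch below is offered only as an illustration of the argument. The text does not say which method the Penn researchers used, and the function name, the observed correlation of 0.20, and the assumed spread ratio of 1.5 are all hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Thorndike Case II correction for direct range restriction.

    r_restricted: correlation observed in the admitted (restricted) group.
    sd_ratio: standard deviation of test scores among all applicants divided
              by the standard deviation among admitted students (a value
              above 1 means the admitted group is narrower).
    """
    if not -1.0 <= r_restricted <= 1.0:
        raise ValueError("r_restricted must lie in [-1, 1]")
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    numerator = r_restricted * sd_ratio
    denominator = math.sqrt(
        1.0 - r_restricted ** 2 + (r_restricted ** 2) * (sd_ratio ** 2)
    )
    return numerator / denominator


# Hypothetical inputs, chosen only for illustration.
observed_r = 0.20        # the square root of a 4% r-squared
assumed_sd_ratio = 1.5   # assumed: applicant scores 1.5 times as spread out
corrected_r = correct_for_range_restriction(observed_r, assumed_sd_ratio)

print(f"observed r = {observed_r:.2f}")
print(f"corrected r = {corrected_r:.3f}")
print(f"corrected r-squared = {100 * corrected_r ** 2:.1f}%")
```

When the spread ratio is 1 the correction leaves the correlation unchanged, and larger ratios raise it. Whether any plausible ratio raises it enough to rescue the test is the empirical question the researchers answered in the negative.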
IV: SUMMARY EVALUATION AND CRITIQUE
Even the ETS's own studies tell a similar story, but a school counselor or parent might not know it from the College Board/ETS public statements on the SAT's predictive power. To illustrate, note that all the statistical relationships between test scores and academic performance cited above are expressed in terms of what's known as the coefficient of determination, the r-squared statistic, which estimates the proportion of variation in one variable (academic performance) that can be attributed to a predictor variable (SAT scores) (Council of Chief State School Officers, 2005). The r-squared is a considerably more useful and intuitively sensible indicator of the predictive value of standardized tests than the simple correlation between the variables, or the r value. (One calculates the r-squared by squaring the simple correlation between the two variables, then multiplying by 100 to express it as a percentage.) Yet that seemingly arcane technical distinction between the r and the r-squared can convey significantly different impressions about the predictive punch of test scores. The College Board and ETS know this (Zemelman et al., 2008). But parents or school counselors would be hard-pressed to find any r-squareds for the SAT reported in College Board/ETS public literature on the test. Rather, the alliance chooses to report the SAT's predictive validity in terms of the simple r, which has great potential to mislead the public into believing the test is considerably more powerful than it really is.
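A small sketch may help show why the r-squared, rather than the r, is the figure that speaks to "variation explained." With a single predictor, the square of the correlation equals the share of variance in the outcome that a least-squares line accounts for. The eight score and GPA pairs below are invented purely for illustration, and the function names are hypothetical.

```python
import math

# Hypothetical (test score, GPA) pairs, invented purely for illustration.
scores = [1000, 1100, 1150, 1200, 1250, 1300, 1350, 1400]
gpas = [2.9, 2.6, 3.4, 3.0, 3.1, 3.6, 2.8, 3.5]


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Simple (Pearson) correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_explained_by_line(x, y):
    """Share of the variance in y accounted for by the least-squares line on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    ss_residual = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_total = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_residual / ss_total


r = pearson_r(scores, gpas)
share = variance_explained_by_line(scores, gpas)

print(f"r = {r:.2f}")
print(f"r squared, as a percentage = {100 * r ** 2:.1f}%")
print(f"variance explained by the fitted line = {100 * share:.1f}%")
```

The last two printed figures agree, and both are smaller than the r itself. That gap is the difference in impression the paragraph above describes.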
For example, the College Board's 1997-1998 Counselor's Handbook for the SAT Program reports an ETS study that calculated the simple correlation, or r, between test scores and freshman grades at 0.42, for the bulk of SAT I scores around the median. That figure appears to be strong evidence for the predictive value of the SAT. On that basis, some students, parents, or counselors might well conclude that the SAT correlates 42% with college grades (Research Association, 1990). In fact, squaring the correlation reported in the College Board handbook shows that SAT scores accounted for just 17.6% of the variation in freshman grades in the ETS study of more than 600 colleges and universities, leaving more than 80% of the variance unexplained. What is more, any number of factors falling in that unaccounted-for variance could by itself have greater predictive punch than the test score. Indeed, studies of the SAT's effectiveness almost always find that high school grades are more powerful than any test score (Clementson & Wenger, 2008). As indicated in the College Board counselor's handbook, high school grades are a significantly better indicator of college performance. One's high school performance, in fact, could explain almost a quarter of the differences in grades among freshmen.
Adding SATs to high school grades in those 600-plus studies improved prediction of college performance, but barely. In terms of simple r, the supposedly tried-and-true formula of SATs combined with high school grades nudged up the correlation over high school record alone by just 0.07 (from 0.48 to 0.55). That means the variance in academic performance accounted for by the combination of test scores and high school grades, at 30%, was seven percentage points greater than for high school grades by themselves. (Squaring 0.55 gives about 0.30; that times 100 equals 30%. Squaring 0.48 gives about 0.23; that times 100 equals 23%. The difference is 7.)
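The squaring arithmetic described above can be checked directly. The sketch below uses only the three correlations quoted in the text (0.42, 0.48, and 0.55). The function and variable names are illustrative.

```python
def percent_variance_explained(r):
    """Square a correlation and express the result as a percentage."""
    return 100.0 * r ** 2


# Correlations quoted in the text above.
r_sat = 0.42         # SAT scores alone
r_hs_grades = 0.48   # high school record alone
r_combined = 0.55    # high school record plus SAT scores

rows = [
    ("SAT alone", r_sat),
    ("high school grades alone", r_hs_grades),
    ("high school grades + SAT", r_combined),
]
for label, r in rows:
    print(f"{label}: r = {r:.2f}, "
          f"variance explained = {percent_variance_explained(r):.1f}%")

gain_in_r = r_combined - r_hs_grades
gain_in_variance = (percent_variance_explained(r_combined)
                    - percent_variance_explained(r_hs_grades))

print(f"gain in r from adding the SAT: {gain_in_r:.2f}")
print(f"gain in variance explained: {gain_in_variance:.1f} percentage points")
```

Carried to one more decimal place, the gain is about 7.2 percentage points, which the text rounds to 7.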
V: MULTICULTURAL APPLICATION
Numerous researchers and educators argue that sole reliance on standardized test scores for identification will exclude a large body of gifted students, including those who are culturally and ethnically different from the conventional gifted population. These gifted students may include those who are not native English speakers, those who are from low-income families, or those who live in geographically remote locations (Clementson & Wenger, 2008; Krejcie & Morgan, 2000). Indeed, empirical evidence shows that children of color are under-represented in gifted programs partly because of inadequate identification procedures and/or measures (Council of Chief State School Officers, 2005). To address the under-identification and under-serving of many gifted students, researchers advise the use of multiple measures and diverse types of instruments as part of the identification process (Appalachia Educational Lab, 2004). On this view, standardized intelligence tests and achievement tests are not adequate for measuring multidimensional human intelligence because the tests are essentially uni-dimensional and ethnocentric, which works against non-mainstream ethnic groups. These researchers also argue that teacher nominations and grades are not good predictors of students' academic potential because they are determined primarily either by students' performance in class, class attendance, and motivation, or by conformity to teachers' demands and expectations in the classroom. A number of researchers advocate the use of other, nontraditional measures for identification, such as student portfolios (Goetz & LeCompte, 2001; Callahan et al., 2005; Blumer, 2004; Shepard & Dougherty, 2006), checklists, or inspection forms (Wright, 2009), to accommodate the diversity of cultural and environmental backgrounds in the identification process.
Portfolios, a kind of context-based measure comprising writing samples, journals, artwork, extraordinary projects, recordings of oral communication, etc., are recommended as an alternative means of assessing students' academic potential (Goetz & LeCompte, 2001).
Proponents of standardized entrance tests have argued that even such marginal improvements in the predictive value of the high school record are better than nothing, and hence that the SAT and its ilk remain beneficial to colleges and universities. However, that claim has been shattered by the work of James Crouse and Dale Trusheim (Zemelman et al., 2008), who have shown in a number of ways that the SAT adds virtually no value for colleges in forecasting student performance in college. Consider, for instance, two prime objectives for undergraduate admissions offices: they want students who maintain good or passing grades and who obtain a bachelor's degree. Admissions officers want to maximize their "correct" admissions; that is, the numbers of students who perform at or above some…