Norm- Versus Criterion-Referenced Tests

Norm- and criterion-referenced tests differ in what a score is compared against: a norm-referenced test compares each score to the scores of a reference group, while a criterion-referenced test compares each score to a fixed performance standard. Norm-referenced tests are quite common. For example, student reading performance in primary schools may be compared to the mean score for all children of the same age. The norm comparison group would likely consist of all students within a school district, state, or nation who took the same test at the same age. Students who scored below or above the mean of the norm reference group would be ranked as low or high achievers, respectively. Imagine, however, that someone wishing to qualify for a motor vehicle license were required only to achieve a score close to the mean score for all drivers. A norm-referenced driver test would likely be a poor choice for public safety, especially if there are many bad drivers on the road.

By comparison, public safety would be better served if all licensed drivers were required to understand 90% of road signs, parallel park, and navigate a complex and busy intersection without any problems. These requirements are standards of performance, and driver's tests are therefore typically criterion-referenced. On a criterion-referenced test it does not matter whether the majority of the population performs worse or better than the reference standard, because the standard is not tied to population performance statistics. This is probably the most important difference between the two types of test: the performance of the norm reference group may change over time, which changes what a given score means on a norm-referenced test, whereas the reference standard on a criterion-referenced test does not change, regardless of changes in the sampled population.
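
The contrast can be made concrete with a small Python sketch. Everything in it is an illustrative assumption rather than data from any real licensing test: the score lists, the 90-point cutoff (standing in for the 90% road sign standard), and the function names are hypothetical. The same candidate score is judged against two different norm groups and against one fixed standard.

```python
from statistics import mean, pstdev

def norm_referenced_z(score, norm_scores):
    """Distance of a score from the norm group mean, in standard deviation units."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)

def criterion_referenced_pass(score, cutoff):
    """Pass or fail against a fixed standard; the population plays no role."""
    return score >= cutoff

CUTOFF = 90                               # hypothetical fixed standard
candidate = 70                            # hypothetical candidate score
weak_drivers = [50, 55, 60, 65, 70]       # hypothetical norm group of poor drivers
strong_drivers = [80, 85, 90, 95, 100]    # hypothetical norm group of good drivers

z_weak = norm_referenced_z(candidate, weak_drivers)      # above the mean of this group
z_strong = norm_referenced_z(candidate, strong_drivers)  # below the mean of this group
passed = criterion_referenced_pass(candidate, CUTOFF)    # same answer for either group

print(f"z vs weak group:   {z_weak:+.2f}")
print(f"z vs strong group: {z_strong:+.2f}")
print(f"meets the standard: {passed}")
```

Under these assumed numbers, the candidate looks like a high achiever against the weak group and a low achiever against the strong group, while the criterion-referenced result is the same in both cases.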


Although selection of the norming group depends largely on who is conducting the testing, every norming group should be described in enough detail to support the testing being done and to allow other researchers to judge whether the group suits their own needs. The group should also be large enough to provide adequate statistical power for comparisons. At a minimum, norming groups are usually described by the demographic variables of age, gender, ethnicity, education, and income.

When children are administered the Wechsler Intelligence Scale for Children (WISC), their scores are compared to the mean test scores of children of the same age (School Psychologist Files, n.d.). The means were obtained by having thousands of children take the test, so intelligence as measured by the WISC is expressed relative to norming groups stratified by age. When Yang and colleagues (2013) administered the WISC-IV to Taiwanese school children with attention deficit hyperactivity disorder (ADHD), they compared the scores to norming groups from China. Chinese children stratified by age would have been an adequate norming group for this study; however, the authors expressed some concern about the validity of comparing WISC scores obtained by Taiwanese children with those obtained by children living in mainland China, because of cultural differences. This example illustrates how complex the qualities of a norming group can be and how important it is to select an appropriate norming group for a specific comparison.
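
A minimal sketch of what comparison against age-stratified norms involves is given below. It is not the WISC scoring procedure: the norming sample, the raw scores, and the function names are all hypothetical, and a real norming sample would contain thousands of children per age band. The point is only that the same raw score is interpreted differently depending on the age band it is compared to.

```python
from statistics import mean, stdev

# Hypothetical norming sample of (age in years, raw score) pairs.
norming_sample = [(8, 20), (8, 24), (8, 28), (10, 30), (10, 34), (10, 38)]

def build_age_norms(sample):
    """Group raw scores by age and return {age: (mean, standard deviation)}."""
    by_age = {}
    for age, score in sample:
        by_age.setdefault(age, []).append(score)
    return {age: (mean(scores), stdev(scores)) for age, scores in by_age.items()}

def z_for_child(age, raw_score, norms):
    """Standing of one child relative to the norming group of the same age."""
    group_mean, group_sd = norms[age]
    return (raw_score - group_mean) / group_sd

norms = build_age_norms(norming_sample)

# The same raw score of 30 is compared to two different age bands.
z_age_8 = z_for_child(8, 30, norms)
z_age_10 = z_for_child(10, 30, norms)

print(f"raw 30 at age 8:  z = {z_age_8:+.2f}")
print(f"raw 30 at age 10: z = {z_age_10:+.2f}")
```

With these assumed values, a raw score of 30 falls above the mean for the 8-year-old band and below the mean for the 10-year-old band. Replacing the norming sample with one drawn from a different population would shift every result in the same way, which is why the choice of norming group matters for the comparison.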

Question 3: Scales

There are four basic scales that…

