Limitations of Norms in Psychological Testing
Norm-referenced tests offer a number of benefits over tests that are not norm-referenced. Psychological tests make it possible to gather valuable information about individual functioning across many different domains. Most norm-referenced tests are relatively quick to administer, so a psychologist can obtain a sample of behavior with a small investment of time and resources. A primary advantage of psychological testing is that it yields rich and detailed information that would otherwise be unavailable to the psychologist. However, norm-referenced tests are far from perfect, and their quality, reliability, and validity vary substantially in important ways.
A number of assumptions underlie the construction of norms. The characteristic being measured must allow individuals to be ordered from low to high along an asymmetrical continuum, on a scale that is at least ordinal (Angoff, 1984). In addition, the relation among the scores must be transitive. In mathematics, a relation is transitive if, when it "applies between two successive members of a sequence, it must also apply between any two members taken in order," such that, for example, "if A is larger than B, and B is larger than C, then A is larger than C" ("Transitive," n.d.; Angoff, 1984). The operational definition of the characteristic being measured must be reasonably clear and valid, so that it yields similar orderings of individuals on that characteristic (Angoff, 1984). Scores across the whole range must all measure that same characteristic (Angoff, 1984). There must be a good match among the group(s), the target characteristics, and the test's design and purpose (Angoff, 1984). Norms are meaningful and useful only to the extent that they have been carefully defined. The norms population must be appropriate both to the subject being tested and to the test itself; the challenge is to define appropriateness without conflating it with difficulty (Angoff, 1984). In other words, a test or a subject can be difficult for many test takers and still be appropriate for that population of test takers (Angoff, 1984).
Normative data should be developed for each distinct norms population with which it is meaningful to compare individuals or groups (Angoff, 1984). The test items themselves must be pilot tested, with item data drawn from samples of the population for which the test is being developed, that is, the groups for which norms will be provided (Angoff, 1984). Populations that serve as the basis for a set of norms should be homogeneous (Angoff, 1984). This means that all the individuals are clearly members of the group and are logical and/or actual "competitors" in the same arena (Angoff, 1984).
Overview of Norms in Psychological Testing
A variety of norms exist, including national norms, local norms, age and grade equivalents, item norms, school mean norms, user-selected norms, special study norms, and norms that yield direct meaning. This discussion centers on norms used for psychological tests; the following section gives an overview of how such norms are developed.
Standardization samples are generated for psychological tests so that the scores of future test takers can be referenced to a normal distribution and compared. Standardization relies on a large sample of test takers who are representative of the larger population for which the test is being developed. This standardization sample is referred to as the norm group or norming group. The raw scores of the norm group are converted into percentile ranks, which can be mapped onto a normal distribution that is then used to rank the relative standing of individuals who take the test in the future.
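The conversion from raw scores to percentile ranks, and from percentile ranks to positions on a normal curve, can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular test publisher: it assumes the common mid-point convention for percentile ranks (scores below count fully, tied scores count half) and a normalizing "area" transformation through the inverse normal CDF. The function names and the ten-score norm sample are hypothetical; real norm groups are far larger.

```python
from statistics import NormalDist


def percentile_rank(norm_scores: list[float], raw: float) -> float:
    """Percentile rank of `raw` within the norm group.

    Mid-point convention (assumed here): norm-group scores below `raw`
    count fully, scores tied with `raw` count half.
    """
    below = sum(1 for s in norm_scores if s < raw)
    tied = sum(1 for s in norm_scores if s == raw)
    return 100.0 * (below + 0.5 * tied) / len(norm_scores)


def normalized_z(norm_scores: list[float], raw: float) -> float:
    """Map a percentile rank onto the standard normal curve (area transformation)."""
    p = percentile_rank(norm_scores, raw) / 100.0
    # inv_cdf is undefined at exactly 0 and 1, so clamp extreme ranks.
    p = min(max(p, 0.001), 0.999)
    return NormalDist().inv_cdf(p)


# Hypothetical raw scores for a tiny, illustrative norm group.
norm_sample = [12, 15, 15, 18, 20, 21, 21, 21, 24, 27]

for raw in (15, 21):
    pr = percentile_rank(norm_sample, raw)
    z = normalized_z(norm_sample, raw)
    print(f"raw={raw}: percentile rank={pr:.1f}, normalized z={z:.2f}")
```

A future test taker's raw score is located within the norm group's distribution rather than judged against an absolute standard, which is what makes the score norm-referenced.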
Norms function as frames of reference for interpreting test scores; they are not performance standards or clinical ideals. The size of norm groups varies widely, from a few hundred to a hundred thousand people. As with other types of samples, the more individuals included in the norm group, the more closely its score distribution approximates the distribution in the population it represents. Moreover, normative data illustrate how major population subgroups differ on the dimensions measured and the extent to which test variables are associated with population classifications.
An inherent complication is that several different authoritative sources are involved in determining the criteria for norms, which undermines efforts to ensure that norm-referenced tests conform to consistently high standards. The process of establishing norms, both for indexing performance over time and for comparing the performance of individuals within groups, follows a specified course of action that may be modified periodically, as the following examples illustrate.
Frankeburg, et al. (1992) conducted a major revision and re-standardization of the popular Denver Developmental Screening Test. Test users had raised concerns about specific items and features over several years, and the test was revised after 23 years in use (Frankeburg, et al., 1992). Regression analysis, test-retest reliability, and inter-rater reliability were used to evaluate the test items (Frankeburg, et al., 1992). The resulting Denver II featured an 86% increase in language items, two new articulation items, a new age scale, a new category of item interpretation to accommodate milder developmental delays, a behavior rating scale, and new training materials (Frankeburg, et al., 1992).
A good example of the importance of updating normative data through periodic review is the research conducted by Vakil, et al. (2010) on the Rey Auditory Verbal Learning Test (AVLT). The Rey AVLT enables the derivation of several verbal memory measures, and the "simultaneous comparison of performance on several measures allows for a more comprehensive characterization of verbal memory than with a single measure" (Vakil, et al., 2010, p. 663). The test is differentially sensitive to the effects of age, brain trauma, gender, and psychiatric condition. New normative data were established for the Rey AVLT as a supplement to the existing norms. The norms were based on individual trials from two cohorts (943 children aged 8 to 17 years and 528 adults aged 21 to 91 years). Composite scores changed most in the very young and very old age groups, a pattern attributed to frontal lobe maturation and deterioration, respectively.
Reconciling Limitations and Appropriate Use of Norms
The following studies illustrate problems that arise in the field when standards for ensuring the quality and consistency of norm-referencing processes are lax. Sociodemographic factors can profoundly influence the accuracy of neuropsychological tests, as Ferrett, et al. (2014) demonstrated. In their research with Afrikaans-speaking and English-speaking adolescents from the Cape Town region of South Africa, Ferrett, et al. (2014) used ANCOVAs to show that, of the sociodemographic factors examined, quality of education and age had the greatest impact on test performance. Three tests endorsed by the World Health Organization (WHO) were used: the Grooved Pegboard Test, the Children's Color Trails Test (CCTT), and the WHO/UCLA version of the Auditory Verbal Learning Test (AVLT). The authors concluded that "Comparisons between diagnostic interpretations made using foreign normative data vs. those using current local data demonstrates that it is imperative to use appropriately stratified normative data to guard against misinterpreting performance" (Ferrett, et al., 2014, p. 1).
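Why stratification matters can be shown with a small sketch: the same raw score is converted to a z-score against a single "foreign" norm and against a local norm stratified by age band and quality of education, the two factors Ferrett, et al. (2014) found most influential. All means, standard deviations, stratum labels, and the cutoff below are hypothetical values chosen for illustration, not data from that study, and the sketch assumes that higher raw scores indicate better performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Norm:
    """Mean and standard deviation of raw scores in one normative group."""
    mean: float
    sd: float


# Hypothetical values for illustration only; not taken from Ferrett et al. (2014).
FOREIGN_NORM = Norm(mean=50.0, sd=8.0)
LOCAL_NORMS = {
    # (age band, quality of education) -> norm for that stratum
    ("13-15", "lower"): Norm(mean=41.0, sd=9.0),
    ("13-15", "higher"): Norm(mean=48.0, sd=8.0),
    ("16-17", "lower"): Norm(mean=44.0, sd=9.0),
    ("16-17", "higher"): Norm(mean=51.0, sd=7.5),
}

CUTOFF_Z = -1.5  # hypothetical threshold for "below expected range"


def interpret(raw: float, norm: Norm) -> tuple[float, str]:
    """Return the z-score of `raw` against `norm` and a coarse label.

    Assumes higher raw scores mean better performance.
    """
    z = (raw - norm.mean) / norm.sd
    label = "below expected range" if z <= CUTOFF_Z else "within expected range"
    return z, label


raw = 36.0
comparisons = [
    ("foreign norm", FOREIGN_NORM),
    ("local norm, age 13-15, lower-quality education", LOCAL_NORMS[("13-15", "lower")]),
]
for name, norm in comparisons:
    z, label = interpret(raw, norm)
    print(f"{name}: z = {z:.2f} -> {label}")
```

With these made-up numbers, a score of 36 falls below the expected range against the foreign norm but within it against the matching local stratum, which is the kind of misinterpretation the authors warn against.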
Test developers do not always make a sufficient effort to ensure that their tests have adequate psychometric properties (Kirk and Vigeland, 2014). For example, Kirk and Vigeland (2014) reviewed the psychometric properties of six norm-referenced assessments intended to measure children's phonological error patterns. The researchers evaluated the normative sample, reliability, and validity of each assessment against current recommendations and criteria in the literature (Kirk and Vigeland, 2014). They found that normative sample sizes were inadequate, evidence of construct validity was poor, and insufficient information was provided about diagnostic accuracy (Kirk and Vigeland, 2014).
Spaulding, et al. (2012) conducted research to determine whether norm-referenced tests sanctioned by U.S. State Education Departments could be used to identify the severity of language impairment in children. The researchers evaluated the consistency of state criteria with the test manuals, the intentions of the test developers, and the characteristics of the tests (Spaulding, et al., 2012). Manuals for 45 norm-referenced tests of children's language were reviewed (Spaulding, et al., 2012). Only eight states were found to publish guidelines specifying the use of norm-referenced tests to determine language impairment severity (Spaulding, et al., 2012). Not only did the states' cutoff criteria for determining severity vary widely, but they also did not align with the severity cutoffs detailed in the test manuals (Spaulding, et al., 2012).…
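The practical effect of misaligned cutoffs can be sketched by classifying the same standard scores under two severity schemes. Both schemes below are hypothetical, as is the assumed standard-score metric (mean 100, SD 15); neither is taken from Spaulding, et al. (2012), a state guideline, or a test manual. The point is only that identical scores can receive different severity labels depending on which criteria are applied.

```python
# Standard-score metric assumed here: mean 100, SD 15.
MEAN, SD = 100.0, 15.0

# Two hypothetical severity schemes, each a list of (z cutoff, label) checked
# from most to least severe. Neither comes from a real guideline or manual.
STATE_GUIDELINE = [(-2.0, "severe"), (-1.5, "moderate"), (-1.0, "mild")]
TEST_MANUAL = [(-2.33, "severe"), (-1.67, "moderate"), (-1.33, "mild")]


def classify(standard_score: float, scheme: list[tuple[float, str]]) -> str:
    """Label a standard score using the first cutoff it falls at or below."""
    z = (standard_score - MEAN) / SD
    for cutoff, label in scheme:
        if z <= cutoff:
            return label
    return "no impairment"


for score in (76, 84):
    print(score, classify(score, STATE_GUIDELINE), classify(score, TEST_MANUAL))
```

Under these illustrative cutoffs, a child scoring 84 would be labeled mildly impaired by one scheme and unimpaired by the other.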
References
Angoff, W.H. (1984). Scales, norms, and equivalent scores. Princeton, NJ: Educational Testing Service. Retrieved from https://www.ets.org/Media/Research/pdf/Angoff.Scales.Norms.Equiv.Scores.pdf
Ferrett, H.L., Thomas, K.G., Tapert, S.F., Carey, P.D., Conradie, S., Cuzen, N.L., Stein, D.J., and Fein, G. (2014, June). The cross-cultural utility of foreign- and locally-derived normative data for three WHO-endorsed neuropsychological tests for South African adolescents. Metabolic Brain Disease, 29(2), 395-408. DOI: 10.1007/s11011-014-9495-6. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/24526566
Frankeburg, W.K., Dodds, J.A., Shapiro, H., and Bresnick, B. (1992, January). The Denver II: A major revision and re-standardization of the Denver Developmental Screening Test. Pediatrics, 89(1), 91-97. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/1370185
Kirk, C., and Vigeland, K.C. (2014, October). A psychometric review of norm-referenced tests used to assess phonological error patterns. Language, Speech, and Hearing Services in Schools, 45(4), 365-377. DOI: 10.1044/2014_LSHSS-13-0053. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/25091265
Spaulding, T.J., Swartwout Szulga, M., and Figueroa, C. (2012, April). Using norm-referenced tests to determine severity of language impairment in children: Disconnect between U.S. policy makers and test developers. Language, Speech, and Hearing Services in Schools, 43(2), 176-190. DOI: 10.1044/0161-1461(2011/10-0103). Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/22269585
Transitive. (n.d.). Google. Retrieved from https://www.google.com/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=transitive
Vakil, E., Greenstein, Y., and Blachstein, H. (2010). Normative data for composite scores for children and adults derived from the Rey Auditory Verbal Learning Test. The Clinical Neuropsychologist, 24(4), 662-677. Retrieved from http://faculty.biu.ac.il/~vakil/papers/clinical_neuropsychologist_2010.pdf