Test Blueprint Validity and Reliability

Last reviewed: July 11, 2016

Administering the tests developed for the nursing-based curriculum entails providing reliable test items. Reliability is important because it helps counteract human error on the part of both the student taking the test and the person grading it. "Reliability is the quality of a test which produces scores that are not affected much by chance. Students sometimes randomly miss a question they really knew the answer to or sometimes get an answer correct just by guessing" (KU, 2016). Increasing the reliability of the test items keeps the quality of the test consistent and limits the effect of unavoidable human error on the scores.

There are different ways to construct a test, and each calls for its own methods of checking that the outcomes are reliable. Essay questions and multiple-choice questions can contain qualitative or quantitative data, and these are measured differently. Multiple-choice questions are measured quantitatively; some ways to measure the reliability of such data are the Kuder-Richardson Formula 20, the alpha coefficient, and the split-half method.

The Kuder-Richardson Formula 20 (KR-20) allows the test creator to check for "internal consistency of measurements with dichotomous choices" (Real Statistics, 2016). It is equivalent to applying the split-half methodology to "all combinations of questions" (Real Statistics, 2016), and it applies when every question is scored as either wrong or right: a wrong answer is given a value of 0 and a right answer a value of 1. The resulting coefficient normally ranges from 0 to 1, and a higher value indicates greater reliability; a value in excess of .90 is a strong indicator of a homogeneous test.
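As a rough illustration of the calculation, the Python sketch below computes the Kuder-Richardson Formula 20 from a table of 0/1 item scores using the standard formula, (k / (k - 1)) * (1 - sum(p * q) / variance of total scores). The function name kr20 and the sample responses are hypothetical and are not taken from the test blueprint. The sketch also assumes the population variance (dividing by the number of students), which is one common convention.

```python
from statistics import pvariance


def kr20(responses):
    """Return KR-20 for a list of rows, one row of 0/1 item scores per student."""
    n_students = len(responses)
    n_items = len(responses[0])
    # p is the proportion of students who answered each item correctly.
    p = [sum(row[i] for row in responses) / n_students for i in range(n_items)]
    # p * (1 - p) is the variance of a single right/wrong item.
    item_variance_sum = sum(pi * (1 - pi) for pi in p)
    # Variance of each student's total score (assumption: population variance).
    total_variance = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical scores: 5 students, 4 items, 1 = right and 0 = wrong.
sample_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(sample_responses), 3))
```

With real data, each row would hold one student's scored answers, and the result would be read against the .90 guideline mentioned above.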

The alpha coefficient is also known as Cronbach's alpha. It is a measure of how closely related a set of items is as a group (internal consistency). A 'high' value for alpha does not by itself show that the measure is unidimensional. If evidence for a unidimensional scale is wanted, further analyses are needed, and exploratory factor analysis is a common method of checking dimensionality. Although Cronbach's alpha is not actually a statistical test, it is considered a coefficient of consistency/reliability.
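A minimal Python sketch of Cronbach's alpha follows, using the usual formula, (k / (k - 1)) * (1 - sum of item variances / variance of total scores). The function name cronbach_alpha and the 1-to-5 ratings are made up for illustration. The population variance is assumed throughout; because alpha depends only on a ratio of variances, using the sample variance consistently would give the same value.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Return Cronbach's alpha for a list of rows, one row of item scores per student."""
    n_items = len(responses[0])
    # Variance of each item across students (columns of the table).
    item_variances = [
        pvariance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of each student's total score.
    total_variance = pvariance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 4 students, 3 items, each scored from 1 to 5.
sample_ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(round(cronbach_alpha(sample_ratings), 3))
```

When every item is scored 0 or 1, this calculation gives the same value as the Kuder-Richardson Formula 20.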

Split-half reliability is another measure of consistency and is similar to the Kuder-Richardson Formula 20, except that the test items are split into two halves and the scores on the two halves are compared with one another. If the results are consistent across the halves, it is highly likely that both halves are measuring the same thing. This method does not measure validity; it measures only consistency/reliability. However, the reliability of a test sets the limit, or ceiling, on its validity.
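The comparison of the two halves can be sketched in Python as a correlation between half-test scores. This sketch assumes an odd/even split of the items, which is only one of several ways to divide a test, and it uses the Pearson correlation to compare the halves. The function names and the sample responses are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Return the Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(ss_x * ss_y)


def split_half(responses):
    """Correlate each student's score on odd-numbered items with the even-numbered ones."""
    odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson(odd_half, even_half)


# Hypothetical scores: 5 students, 4 items, 1 = right and 0 = wrong.
sample_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half(sample_responses), 3))
```

A correlation close to 1 suggests that the two halves rank students consistently; a different way of splitting the items can give a different value.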

These methods for testing reliability allow the multiple-choice portion of an exam to be analyzed properly and give confidence in the consistency of the test items, which forms the basis of a high-quality test. Essay questions call for a different method of analysis because their data is qualitative. Qualitative data is generally assessed using an analytic rubric.

Analytic rubrics are often seen in paper assignments, where students receive a table with categories listed both vertically and horizontally. The horizontal categories form the scale, usually from 1 to 5, with 1 being the worst and 5 being exemplary, and a total score appears at the end. The vertical categories list what is being graded, such as presentation, grammar, and so forth. These categories relate to the instructions of the assignment and how well the student follows them. The rubric is a reliable measure because it provides a detailed explanation of why a student receives a given score, which avoids misunderstandings. "It gives students a clearer picture of why they got the score they got. It is also good for the teacher, because it gives her the ability to justify a score on paper, without having to explain everything in a later conversation" (Gonzalez, 2014).

Just as there are ways to test reliability, there are ways to test validity. Content validity helps the test creator address the "match between test questions and the content or subject area they are intended to assess. This concept of match is sometimes referred to as alignment, while the content or subject area of the test may be referred to as a performance domain" (College Board, 2016). Those who judge content validity tend to be experts, as is the case with the SAT Subject Tests. Each test is reviewed by committees composed of experts, who check that it covers content matching the pertinent subject matter within its academic area. To establish the content validity of a test, a curricular validity study and a face validity study may be conducted.

Face validity pertains to the degree to which the questions on an exam, or the exam itself, appear to measure a specific and widely accepted construct as viewed by examinees, laypersons, test users, clients, and the public. Plainly speaking, the test is checked to see whether it looks like a reasonable test for the subject matter or purpose it covers. Curricular validity is used to check whether the content of the test matches the objectives of a given curriculum as that curriculum is formally described.

Criterion-related validity "looks at the relationship between a test score and an outcome. For example, SAT™ scores are used to determine whether a student will be successful in college. First-year grade point average becomes the criterion for success" (College Board, 2016). Examining the relationship between the criterion and the test scores demonstrates how valid a test like the SAT is for predicting a person's success in college. For placement tests, the criterion is grades. Research can also serve as a criterion for tests of expert knowledge in a given subject; the criterion varies with the purpose of the test.
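One common way to examine the relationship between test scores and a criterion is a correlation coefficient. The Python sketch below assumes the Pearson correlation is an acceptable summary of that relationship; the test scores and first-year grade point averages are invented purely for illustration and do not come from any real study.

```python
from math import sqrt


def pearson(x, y):
    """Return the Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(ss_x * ss_y)


# Invented data: one test score and one first-year GPA per student.
test_scores = [1000, 1100, 1200, 1300, 1400]
first_year_gpa = [2.4, 2.6, 3.1, 3.0, 3.7]

validity_coefficient = pearson(test_scores, first_year_gpa)
print(round(validity_coefficient, 2))
```

The closer the coefficient is to 1, the more closely the test scores track the criterion in the sample being examined.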

Some factors may threaten both the validity and the reliability of my test items based on my test blueprint. One factor is the number of students taking the test versus the number of students who attend the class. For example, if there are 25 students in the class but only 20 take the test, that may skew validity and reliability. Another is the degree of difficulty of each test item. Some items may be more complex, harder to understand, or take longer to solve than others.
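Both factors can be put into numbers with a short Python sketch. The participation rate uses the 20-of-25 example above. The difficulty measure assumes the commonly used item difficulty index (the proportion of students who answer an item correctly), which this paper does not itself specify. The function names and the sample responses are hypothetical.

```python
def participation_rate(students_tested, students_enrolled):
    """Return the share of the class that actually took the test."""
    return students_tested / students_enrolled


def item_difficulty(responses):
    """Return, for each item, the proportion of students who answered it correctly."""
    n_students = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_students for i in range(n_items)]


# The example from the text: 20 of 25 students take the test.
print(participation_rate(20, 25))

# Hypothetical scores: 5 students, 4 items, 1 = right and 0 = wrong.
sample_responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(item_difficulty(sample_responses))
```

A lower proportion marks a harder item, so a wide spread across items is a sign that difficulty varies considerably within the test.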

Cite This Paper

PaperDue. (2016). Test Blueprint Validity and Reliability. PaperDue. https://www.paperdue.com/essay/test-blueprint-validity-and-reliability-2161519