Testing and Assessment
Describe the basic characteristics of a standardized test and norms.
Standardized tests are administered and scored in a consistent manner. When a standardized test is given, conditions are held constant and the test is conducted in a predetermined way that is considered the standard for that test. The variables held constant include the test questions, the conditions for administering the test, the scoring procedures, and the interpretation of the answers. Norms are generated by relating a single test score to the test scores of other students, or to the scores of the same students at different times. Norms are generally established on a nationwide basis for standardized tests rather than for classroom tests made by teachers.
A test measuring basic math skills in adults is normally distributed, with mean = 20 and standard deviation = 7.
Calculate the z score and T score for each of the following test scores:
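The formula for computing z-scores is: z = (x - μ) / σ, where x is the raw score, μ is the mean (20), and σ is the standard deviation (7).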
Raw Scores    Z Scores    Interpretation
11            -1.29       Score is less than the mean by more than 1 standard deviation
18            -0.29       Score is less than the mean by about 1/4 standard deviation
24             0.57       Score is more than the mean by about 1/2 standard deviation
32             1.71       Score is more than the mean by more than 1 standard deviation
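The z-score calculation can be checked with a short Python sketch. It assumes the usual definition z = (x - mean) / standard deviation, with the mean of 20 and standard deviation of 7 given in the problem; the function name z_score is illustrative only.

```python
def z_score(raw, mean, sd):
    """Return how many standard deviations `raw` lies above or below `mean`."""
    return (raw - mean) / sd


MEAN, SD = 20, 7               # values given in the problem statement
raw_scores = [11, 18, 24, 32]  # the four scores to convert

# Round to two decimals, as in the table.
z_scores = [round(z_score(x, MEAN, SD), 2) for x in raw_scores]

for x, z in zip(raw_scores, z_scores):
    print(f"raw = {x:2d}   z = {z:+.2f}")
```

A negative z score marks a raw score below the mean and a positive z score marks one above it.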
This means the formula for computing standard scores is:
Or this more complex formula: t = [ x - μ ] / [ s / sqrt ( n ) ]
Raw Scores    Z Scores    T Scores
11            -1.29       -1.46
18            -0.29       -0.46
24             0.57        0.39
32             1.71        1.54
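The T Scores values listed above are consistent with (x - 21.25) / 7, where 21.25 is the mean of the four raw scores. That derivation is inferred from the numbers and is not stated in the text. In many testing contexts the name "T score" refers instead to a linear rescaling of z onto a scale with mean 50 and standard deviation 10, T = 50 + 10z. The Python sketch below computes both so they can be compared; the variable and function names are illustrative.

```python
from statistics import mean

POP_MEAN, POP_SD = 20, 7       # values given in the problem statement
raw_scores = [11, 18, 24, 32]

# Assumption: the listed values use the mean of the four scores (21.25)
# in place of the population mean, with the standard deviation kept at 7.
sample_mean = mean(raw_scores)
table_values = [round((x - sample_mean) / POP_SD, 2) for x in raw_scores]


def t_score_rescaled(raw, mu, sigma):
    """T score under the common testing convention T = 50 + 10z."""
    return 50 + 10 * (raw - mu) / sigma


rescaled = [round(t_score_rescaled(x, POP_MEAN, POP_SD), 1) for x in raw_scores]

for x, a, b in zip(raw_scores, table_values, rescaled):
    print(f"raw = {x:2d}   listed value = {a:+.2f}   50 + 10z = {b:.1f}")
```

Under the 50 + 10z convention a score equal to the mean receives a T score of 50, and every 10 points represents one standard deviation.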
What is the usefulness of transforming raw scores to z scores or T scores? What do these numbers tell us?
Converting individual raw scores into a standardized form provides a more meaningful description of the individual scores that make up the distribution. Z scores are a conversion of individual raw scores into a standardized form that relies on the population mean and standard deviation. T scores, also known as standardized scores, are a conversion of individual raw scores into a standard form, and the transformation is made without knowledge of the population's mean and standard deviation. Because the population parameters are not known, the statistician must estimate them with the best available guess, which is the corresponding sample statistics.
How would we go about determining percentile ranks for a score obtained on a standardized test?
To determine percentile ranks for scores obtained on a standardized test, the data must first be ordered from lowest to highest. The rank of each data point is represented by "i" in the formula for percentile ranks, and the number of observations (here, the number of test-takers) is represented by "n". The percentile is then calculated with this formula:
P = 100(i - 0.5) / n
Where:
i = rank
n = total number of observations
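The ranking procedure and the formula above can be sketched in Python as follows. The function name percentile_ranks is illustrative, and the sketch assumes there are no tied scores, since the text does not say how ties should be ranked.

```python
def percentile_ranks(scores):
    """Pair each score with its percentile rank, P = 100 * (i - 0.5) / n.

    Scores are sorted from lowest to highest; i is the 1-based rank and
    n is the number of observations. Ties are not given special treatment.
    """
    n = len(scores)
    ordered = sorted(scores)
    return [(score, 100 * (i - 0.5) / n) for i, score in enumerate(ordered, start=1)]


# The four raw scores from the earlier example, in arbitrary order.
for score, p in percentile_ranks([24, 11, 32, 18]):
    print(f"score = {score:2d}   percentile rank = {p:.1f}")
```

With only four observations the ranks are spaced 25 points apart (12.5, 37.5, 62.5, 87.5), which shows why a large norm group is needed for percentile ranks to be informative.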
What information would we need to know?
To calculate the percentile ranks for scores obtained on a standardized test, we need to know the standardized score of each test-taker and the total number of test-takers. We also need to be able to rank the standardized scores in order.
What would the percentile rank tell us?
A percentile on a test is the percentage of scores that are less than a particular given score. The most useful aspect of percentiles is that they convert raw data, which is often difficult to understand or interpret, into a simpler form that is generally meaningful to an uninitiated viewer.
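When a test is normally distributed, as the basic math skills test described earlier is stated to be, the percentage of scores below a given score can also be estimated from the normal curve rather than from a ranked list. The Python sketch below assumes the mean of 20 and standard deviation of 7 from that example and uses the standard library's NormalDist class; the helper name percent_below is illustrative.

```python
from statistics import NormalDist

# Assumption: scores follow the normal distribution from the earlier
# example (mean = 20, standard deviation = 7).
test_dist = NormalDist(mu=20, sigma=7)


def percent_below(raw):
    """Estimated percentage of test-takers scoring below `raw`."""
    return 100 * test_dist.cdf(raw)


for raw in [11, 18, 20, 24, 32]:
    print(f"raw = {raw:2d}   percent scoring lower = {percent_below(raw):.1f}")
```

A score equal to the mean falls at the 50th percentile, and scores further above the mean fall at progressively higher percentiles.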
Describe the basic concepts and types of reliability and validity that apply to tests.
Types of Reliability.
In research, reliability means repeatability or consistency. As long as what is being measured does not change, a reliable measurement will produce the same result over and over again. A dependable measure has both validity and reliability. Reliability can only be estimated, and the estimates fall into four categories, each of which looks at reliability in a different way.
1) Inter-rater or inter-observer reliability is used to determine the consistency of several raters over time when they observe the same thing and score what they observe.
2) Test-retest reliability is used to determine the consistency of a measure from one time to another, again by scoring or rating the same phenomenon.
3) Parallel-forms reliability examines the degree of consistency between two tests that cover the same content and are constructed in the same way.
4) Internal consistency reliability examines whether the items within a single test are consistent with one another.
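Internal consistency is often summarized with Cronbach's alpha. That statistic is not named in the text above and is offered here only as one common choice. The Python sketch below assumes the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items; the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of item scores (one row per test-taker)."""
    k = len(item_scores[0])                  # number of items on the test
    columns = list(zip(*item_scores))        # regroup the scores by item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical data: four test-takers, three items each.
consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
mixed = [[1, 2, 1], [2, 1, 3], [3, 4, 2], [4, 3, 4]]

print(f"alpha, perfectly consistent items: {cronbach_alpha(consistent):.2f}")
print(f"alpha, less consistent items:      {cronbach_alpha(mixed):.2f}")
```

Values closer to 1 indicate that the items rise and fall together across test-takers, which is what internal consistency is meant to capture.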
Types of Validity.
Validity is the extent to which an instrument measures what it is intended to measure. Validity must be sufficiently high for the test results to be accurately interpreted and applied. Validity is determined in several ways, each of which shows the relationship between the test and what it claims to measure.
1) Content validity is demonstrated when the test items represent all the possible items that the test could cover. That is to say, the individual questions on the test are selected from a large pool of test questions that cover a wide range of topics.
2) Criterion-related validity is demonstrated when a test is effective at predicting the indicators of a construct (which is an abstract concept). Two types of criterion-related validity exist: a) concurrent validity and b) predictive validity.
a) Concurrent validity is demonstrated when the criterion or indicators are observed at the same time as the test scores. In other words, concurrent validity indicates an accurate measure, with respect to the criterion, of the current state of behavior, knowledge, or skills shown by an individual at the time.
b) Predictive validity is demonstrated when the criterion measures or indicators are shown by the individual at some time after the test. This means that a test is thought to predict some achievement, skill, or behavior that will occur later, and the validity of this predictive test is demonstrated, or not, depending on the later appearance of the behavior or attainment of the skill or achievement.
3) Construct validity is demonstrated when a relationship is found between the test scores and the prediction of some abstract, theoretical trait, which is referred to as a construct.
Include explanations for why a test may be high or low in each of these areas.
Validity is often difficult to demonstrate because it may require the passage of time, such as when predictive tests are used. Also, it is not always easy to obtain agreement among the judges who are responsible for establishing and ascertaining the criterion used to determine validity. Reliability may be difficult to demonstrate because it often depends on repeated measures or on ratings by more than one individual over time. These factors are difficult to manage and are subject to variability that may not be anticipated.
How might a researcher developing a test increase the reliability and validity of that test? Provide at least three suggestions for increasing reliability and three suggestions for increasing validity.
Reliability
1. Conduct inter-rater reliability checks by having two experienced or trained raters score the test data or answers according to the pre-established protocol. Compare the scores of the two raters, calculate the kappa coefficient, and look for a kappa of 0.81 or higher. While a kappa in the range of 0.61 to 0.80 indicates substantial agreement, a higher kappa is preferable.
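The kappa check described above can be sketched in Python. The sketch assumes Cohen's kappa for two raters, defined as (observed agreement - chance agreement) / (1 - chance agreement); the text does not say which kappa variant is intended, and the ratings shown are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who scored the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten answers; the raters disagree on the last one.
rater_a = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "pass"]
rater_b = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "fail"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this example the raters agree on 9 of 10 answers, yet kappa is about 0.80 because half of that agreement would be expected by chance. This is why kappa is reported in place of simple percent agreement.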