Reliability and Validity
Reliability
Reliability refers to the capacity of an instrument to capture the most accurate and "truest" score of an individual. A reliable test enables us to distinguish one individual from another with confidence that differences in scores arise primarily from real individual differences and only to a lesser extent from the imperfections of the instrument. Indicators that a test is reliable include test-retest reliability, which is underpinned by internal consistency among the components of the test. The test-retest criterion is generally considered a manifestation of the consistency of measurement of individual performance over time, such that the score a person gets on a test today will be the same, or nearly the same, as the score the person gets on the same test in, say, three, six, or twelve months.
There are a number of substantive issues with the test-retest criterion, including chance covariation, memory, and population subsets. Chance covariation is a possible factor because the observed covariance between any two variables is almost never exactly zero, so some of the agreement between two testing sessions may arise by chance. Indeed, when a number of items on a test are correlated, the reliability of the test is greater, since it is better to have more measures of a construct than fewer. When the time period between testing sessions is too short, the test-taker is likely to remember test items and responses. This means that the scores will be correlated because of recall rather than because they reflect a true score.
Reliability is not the same for all subsets of a population. For example, while IQ scores are considered to be relatively consistent in the general, heterogeneous adult population, the IQ scores of a homogeneous population, such as college students, will be less reliable since there are fewer individual differences and the range of IQ scores is likely to be restricted. Poor reliability can also reduce statistical power; that is, it increases the chance of committing a Type II error.
Reliability is an important attribute of a test since it conveys the consistency of the instrument.
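The ideas in this section can be made concrete with a small simulation. The sketch below is illustrative only: the data are simulated, and the function names (pearson, cronbach_alpha, simulate_sessions), the mean of 100, the standard deviations, and the cutoff of 115 are assumptions chosen for the example, not values from any study. It estimates test-retest reliability as the correlation between two sessions, computes Cronbach's alpha as one common index of internal consistency, and shows how restricting the range of scores lowers the test-retest correlation.

```python
import random
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of per-item score lists
    (same respondents, in the same order, for every item)."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    sum_item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


def simulate_sessions(n_people=5000, true_sd=15.0, error_sd=5.0, seed=0):
    """Hypothetical model: observed score = stable true score + fresh,
    independent measurement error at each testing session."""
    rng = random.Random(seed)
    true = [rng.gauss(100, true_sd) for _ in range(n_people)]
    time1 = [t + rng.gauss(0, error_sd) for t in true]
    time2 = [t + rng.gauss(0, error_sd) for t in true]
    return true, time1, time2


# Test-retest reliability in the full, heterogeneous sample.
true, time1, time2 = simulate_sessions()
r_full = pearson(time1, time2)

# Range restriction: keep only people with a high true score (assumed cutoff),
# a stand-in for a homogeneous subgroup such as college students.
kept = [(a, b) for t, a, b in zip(true, time1, time2) if t > 115]
r_restricted = pearson([a for a, _ in kept], [b for _, b in kept])

# Internal consistency: five hypothetical items, each = shared trait + noise.
rng = random.Random(1)
trait = [rng.gauss(0, 1) for _ in range(2000)]
items = [[t + rng.gauss(0, 1) for t in trait] for _ in range(5)]
alpha = cronbach_alpha(items)

print(f"test-retest r, full sample:       {r_full:.2f}")
print(f"test-retest r, restricted sample: {r_restricted:.2f}")
print(f"Cronbach's alpha, 5 items:        {alpha:.2f}")
```

Under these assumptions the full-sample correlation should be close to 0.9 (true-score variance of 225 out of a total variance of 250), while the restricted subgroup's correlation is noticeably lower even though the instrument and its measurement error are unchanged. The restriction here selects on the simulated true score for simplicity; in practice selection happens on observed scores.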
Part 2. Validity
Schwenk, G. (2009). Evaluating social influence relations: An item-response-modeling approach. Metodoloski zvezki, 6(1), 27-50.
The questionnaire items proposed in this measurement instrument focus on three distinct dimensions of social influence: authority, coercion, and persuasion. The instrument is intended to be used in social networks, and its applicability is examined with network autocorrelation models. Of the three dimensions of social influence, persuasion had the highest level of validity. Authority was understood as the perception of the rational and accepted authority of a contact person, while coercion focused on the contact person's use of coercive means in everyday interactions. The authority dimension did not improve predictive modeling, while the coercion dimension yielded predictions that were new but did not fit well with the model. However, the author of the instrument noted that context and users accounted for much of the validity. For example, when the instrument was employed in a closed-network case study of university professors, the persuasion dynamic was a good fit with the prevailing culture, while neither authority nor coercion was viewed as useful within the university context. The network autocorrelation model showed that the persuasion measures had a substantive predictive gain. For this instrument, content validity appeared to be pegged to the context and the subjects.
The construct of persuasion was decomposed into two individual concepts, referred to as information dependence and perceived similarity. Perceived similarity represented the perceived helplessness of an individual contact person with respect to their own coping skills for a problem. Information dependence is specific to the situation and so is intended to be measured in an individualized manner. Since information dependence is strongly situation specific, the author developed an item response theory (IRT) scale for use with perceived similarity, which was theorized not to be specific to situations.
The items on the instrument were selected in accordance with the outcomes of a quantitative item analysis. The purpose of the item analysis was to create item sets that had easily understandable semantics and demonstrated an acceptable fit. Agreement with each item was rated on a 5-point scale.
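The summary above does not say which statistics the item analysis relied on. As a rough illustration of what a quantitative item screen on 5-point ratings can look like, the sketch below computes a corrected item-total correlation for each item, a common classical screening statistic. This is a swapped-in technique for illustration, not the procedure used in Schwenk (2009), which took an item-response-modeling approach. The ratings and the function names (pearson, corrected_item_total) are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def corrected_item_total(responses):
    """For each item, correlate its ratings with the sum of the OTHER items.

    `responses` holds one list per respondent, with one 1-5 rating per item.
    """
    n_items = len(responses[0])
    correlations = []
    for j in range(n_items):
        item = [person[j] for person in responses]
        rest = [sum(person) - person[j] for person in responses]
        correlations.append(pearson(item, rest))
    return correlations


# Hypothetical ratings: six respondents, four items on a 5-point scale.
# Items 1-3 were made up to move together; item 4 was made up not to.
responses = [
    [5, 4, 5, 2],
    [4, 4, 4, 5],
    [2, 1, 2, 3],
    [1, 2, 1, 4],
    [3, 3, 3, 1],
    [4, 5, 4, 3],
]

item_total = corrected_item_total(responses)
for number, r in enumerate(item_total, start=1):
    print(f"item {number}: corrected item-total r = {r:.2f}")
```

An item whose ratings do not track the rest of the set, like the fourth item here, shows a low or negative correlation and would be a candidate for removal; a cutoff of about 0.3 is a frequently used rule of thumb, not a fixed standard.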