Reliability and Validity Measurements Research Paper

Excerpt from Research Paper:

Reliability of Test

Joppe (2000, p. 1) defines reliability as the extent to which results are consistent over time and are an accurate representation of the population under study. If the results of a study can be reproduced using a similar methodology, then the instrument used in the research is said to be reliable.

It is worth noting that reliability involves an element of replicability as well as repeatability of the observations or results. Kirk and Miller (1986, pp. 41-42) identified three types of reliability in quantitative research. These relate to the extent to which a given measure remains the same when it is repeated, the stability of the measure over time, and the similarity of measurements within a given time period. The work of Charles (1995) focuses on the consistency with which a given test item is answered.

The test-retest method is one way of testing reliability. The attribute of an instrument that it tests is called stability: a stable measure produces similar results on repeated administrations, and a high level of stability indicates a high level of instrument reliability, meaning that the results are repeatable. However, Joppe (2000) pointed out a problem with the test-retest method that can make the test unreliable to some degree: administering the same test more than once may sensitize respondents to the subject matter and thereby influence their responses.

Reliability, therefore, refers to the consistency of a given measure. In psychology, for example, if a test is designed to measure a trait such as introversion, then each time it is administered to the same subject the results should be approximately the same. Unfortunately, it is not possible to calculate reliability exactly, but there are several ways of estimating it (Cherry, n.d.).

Types of reliability

There are several types of reliability (PTI, 2006; Cherry, n.d.). They are as follows:

Test-Retest reliability

In this type of reliability test, the test is administered twice, at two distinct points in time (Cherry, n.d.). This approach assumes that the construct or quality being measured does not change between the two administrations. It is therefore generally used for traits that are stable over time, such as intelligence.
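As a minimal sketch, test-retest reliability is commonly estimated as the Pearson correlation between the scores from the two administrations. The scores below are hypothetical, and the function name `pearson_r` is illustrative rather than taken from the source.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists of length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five subjects, tested at two points in time
time_1 = [98, 105, 112, 120, 131]
time_2 = [101, 103, 115, 118, 129]

test_retest_r = pearson_r(time_1, time_2)
print(f"test-retest r = {test_retest_r:.2f}")
```

With these made-up scores r is about 0.98: the subjects kept roughly the same relative standing across the two occasions, which is the stability the test-retest method looks for.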

Inter-rater Reliability

This form of reliability is assessed by having two independent judges score the test. The scores they assign are then compared to determine how consistent the raters' estimates are. One technique for testing inter-rater reliability is to have each judge score the items on a 1-10 scale and then calculate the correlation between the two sets of scores; the strength of that correlation indicates the degree of inter-rater reliability.
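The correlation step can be sketched in the same way for two raters. This hypothetical example, with made-up 1-10 ratings and an illustrative `exact_agreement` helper, also shows a limitation worth knowing: correlation measures whether the judges rank the items consistently, not whether they give identical scores.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def exact_agreement(x, y):
    """Proportion of items on which both raters gave exactly the same score."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical 1-10 ratings of six essays by two independent judges.
# Judge B is consistently one point more lenient than judge A.
judge_a = [7, 4, 9, 6, 3, 8]
judge_b = [8, 5, 10, 7, 4, 9]

inter_rater_r = pearson_r(judge_a, judge_b)
agreement = exact_agreement(judge_a, judge_b)
print(f"inter-rater r = {inter_rater_r:.2f}, exact agreement = {agreement:.0%}")
```

Here the correlation is essentially 1.0 while exact agreement is 0%, because judge B's scores are simply shifted upward. When absolute agreement matters, indices such as the intraclass correlation coefficient or Cohen's kappa are often reported instead.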

Parallel-Forms Reliability

This form of reliability is determined by comparing different tests that were created using the same content. This is achieved by creating a large pool of test items that measure the same quality and then randomly dividing these items into two separate tests. The two tests are then administered to the same subjects and their scores are compared.
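A minimal sketch of the parallel-forms procedure, assuming dichotomously scored (0/1) items and a hypothetical eight-item pool: the pool is shuffled with a fixed seed, split into two forms, each subject is scored on both forms, and the two sets of totals are correlated. The helper names and the response data are illustrative.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_into_forms(item_pool, seed=0):
    """Randomly divide an item pool into two equal-sized forms.

    With an odd-sized pool, the one leftover item is left out of both forms."""
    items = list(item_pool)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    half = len(items) // 2
    return items[:half], items[half:2 * half]


def form_totals(responses, form):
    """Each subject's total score on one form (dicts keep insertion order)."""
    return [sum(answers[item] for item in form) for answers in responses.values()]


ITEM_POOL = range(1, 9)  # hypothetical items numbered 1..8
# Hypothetical 0/1 responses: each subject answers every item up to a cutoff correctly
responses = {
    f"s{k}": {item: int(item <= cutoff) for item in ITEM_POOL}
    for k, cutoff in enumerate([8, 6, 4, 2, 0], start=1)
}

form_a, form_b = split_into_forms(ITEM_POOL)
parallel_r = pearson_r(form_totals(responses, form_a), form_totals(responses, form_b))
print(f"form A {sorted(form_a)}, form B {sorted(form_b)}, r = {parallel_r:.2f}")
```

Seeding the shuffle keeps the split reproducible; a different seed gives a different, equally random, pair of forms.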

Internal Consistency Reliability

This type of reliability is used to judge the consistency of results across items within the same test. It involves comparing test items that are intended to measure the same construct in order to determine the test's internal consistency.
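The source does not name a particular statistic, but one widely used index of internal consistency is Cronbach's alpha. For k items it is alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), so it rises when items vary together across subjects. A minimal sketch with hypothetical 1-5 ratings from five subjects on four items meant to tap the same construct:

```python
def sample_variance(values):
    """Sample variance with an n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of k item scores per subject."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two subjects")
    item_variances = [sample_variance([row[i] for row in rows]) for i in range(k)]
    total_variance = sample_variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 1-5 ratings from five subjects on four items meant to
# measure the same construct (e.g. introversion)
ratings = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]

alpha = cronbach_alpha(ratings)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Because these items rise and fall together across subjects, alpha comes out at roughly 0.97; items that did not measure the same construct would pull it down.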

II. Validity (Test Validity)

Joppe (2000) explained validity as whether the research truly measures what it was intended to measure and how truthful its results are. Wainer and Braun (1998), on the other hand, referred to validity as "construct validity."

Validity in the context of psychology has been discussed extensively by Tebes (2000). Cook and Campbell (1979) identified four major types of validity: internal validity, statistical conclusion validity, external validity and construct validity. Internal validity concerns inferences about whether the relationship between two variables is causal. Statistical conclusion validity concerns inferences about whether two variables covary. External validity concerns generalization to other persons, settings and times. Construct validity concerns generalization from the particular operations used in a study to the theoretical constructs of cause and effect they are meant to represent.

Forms of validity

Face validity

Cherry (n.d.) defined face validity as a very simple form of validity that involves determining whether a test appears to measure what it is meant to measure. In this form of validity, researchers take the validity of the test at 'face value' by examining whether the test appears to measure the intended variable. For example, if a researcher is interested in measuring happiness, the test would be said to possess face validity if it appears to measure the level of happiness. The disadvantage of face validity is that it is not accurate, since it considers only the superficial features of a test. Researchers therefore need to investigate the matter further.

Content validity

Content validity refers to the degree to which the items in an instrument reflect the content universe to which the instrument will be generalized (Straub et al., 2004). Generally, establishing content validity entails evaluating a new instrument to ensure that it includes all of the items considered essential and excludes items that are undesirable for a given construct domain, as pointed out by Lewis (1995). When a test possesses content validity, its items represent the entire range of possible items that the test should cover. A disadvantage of content validity is that it is prone to bias, since it depends on the opinions of the judges who rate the items.
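Content validity rests on expert judgment, but that judgment can be summarized numerically. One widely used approach, not mentioned in the source, is Lawshe's content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists who rate an item "essential" and N is the panel size. A minimal sketch with a hypothetical ten-judge panel; the function name and counts are illustrative:

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's content validity ratio: (n_e - N/2) / (N/2), from -1 to +1."""
    if n_panelists <= 0 or not 0 <= n_essential <= n_panelists:
        raise ValueError("need n_panelists > 0 and 0 <= n_essential <= n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel: how many of 10 judges rated each draft item "essential"
N_JUDGES = 10
essential_counts = {"item_1": 9, "item_2": 5, "item_3": 2}

for item, n_e in essential_counts.items():
    print(f"{item}: CVR = {content_validity_ratio(n_e, N_JUDGES):+.1f}")
```

Positive values mean that more than half of the panel considered the item essential. The ratio makes the judges' opinions explicit, but it does not remove the dependence on those opinions.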

Criterion validity

A test is said to possess criterion-related validity if it has been demonstrated to be effective in predicting the criterion or the indicators of a given construct (Cherry, n.d.). Miller et al. (2003) pointed out that criterion-related validity is assessed whenever one needs to determine the relationship between scores on a test and a specific criterion. For example, scores on an admissions test may be related to criteria such as grade point average. There are two types of criterion validity:

1. Concurrent validity, which occurs when the criterion measures are obtained at the same time as the test scores. This indicates the extent to which the test scores accurately estimate the current state of a situation or individual with respect to the criterion. For example, a test of depression may be described as having concurrent validity if it succeeds in measuring the current level of depression experienced by a subject.

2. Predictive validity, which occurs when the criterion measures are obtained at some time after the test has been completed. Aptitude tests are an example.

The weakness of predictive validity is that it never tests all of the available data: individuals who were not selected (for example, applicants who were not admitted) cannot, by definition, go on to produce scores on the criterion.
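Criterion-related validity is usually summarized as a validity coefficient, the correlation between test scores and the criterion. The hypothetical sketch below correlates admission-test scores with later GPA and also illustrates the weakness of predictive validity: if only applicants at or above an assumed cutoff of 65 are admitted, the correlation computed on them alone is noticeably lower than for the full applicant pool. All data and the cutoff are made up, and the GPAs of rejected applicants appear only for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical admission-test scores and later GPA for ten applicants.
# In practice the GPA of rejected applicants would never be observed;
# it is included here only to show the full-range correlation.
test_scores = [40, 45, 50, 55, 60, 65, 70, 75, 80, 85]
later_gpa = [2.0, 2.6, 2.3, 2.9, 2.7, 3.1, 2.9, 3.3, 3.0, 3.4]
CUTOFF = 65  # hypothetical admission cutoff

r_all = pearson_r(test_scores, later_gpa)

admitted = [(s, g) for s, g in zip(test_scores, later_gpa) if s >= CUTOFF]
r_admitted = pearson_r([s for s, _ in admitted], [g for _, g in admitted])

print(f"validity coefficient, all applicants: {r_all:.2f}")
print(f"validity coefficient, admitted only:  {r_admitted:.2f}")
```

With these numbers the coefficient drops from about 0.88 to about 0.53, even though nothing about the test has changed; the drop comes only from studying a narrower range of scores.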

Construct validity

Construct validity is demonstrated when test scores are associated with the predictions made about a given theoretical trait. Intelligence tests are an example. Construct validity is therefore the extent to which an instrument measures the trait or theoretical construct that it is meant to measure (Miller et al., 2003).

III. What must a psychologist do before they use a test to assure that the test has adequate levels of reliability and validity for the client who is being tested?

To ensure that a test has adequate levels of reliability and validity for the client being tested, a psychologist must do a number of things.

The actions needed to ascertain reliability and validity are dictated by the racial, cultural and educational background of the client. Certain tests, such as IQ tests, are culturally sensitive and should never be administered to members of cultural minorities such as the black community. The complexity of the questions should also be guided by the client's academic level. Groth-Marnat (2003) pointed out that, prior to conducting any test, the first step is to establish the competence of the subject, defined in this case as the subject's ability to cooperate meaningfully with the psychologist. It is also necessary to reassure the person being assessed of their rights…

Cite This Research Paper:

"Reliability And Validity Measurements" (2011, June 17) Retrieved August 17, 2017, from

"Reliability And Validity Measurements" 17 June 2011. Web.17 August. 2017. <>

"Reliability And Validity Measurements", 17 June 2011, Accessed.17 August. 2017,