… Telephone Customer Service Representatives
Employment Assessment - Telephone
1. How do you interpret the reliability results for the clerical test and work sample? Are they favorable enough for the company to consider using them for keeps in selecting job applicants?
A primary objective of evaluating two new methods of assessing candidates for positions as telephone customer service representatives at the Phonemin Company is to improve the caliber of the employment pool from which new hires are selected. Customer service representatives are critical to the company's telephone ordering system. Moreover, the company will be adding roughly 40 employees to the call center to meet the anticipated growth in phone order sales. It is therefore apparent that effective means of assessing candidates for these positions are needed. The reliability figures for the current candidate assessment system, which comprises a clerical work test and two work samples, are as follows:
For the clerical test, internal consistency is indicated by alpha coefficients of 0.85 and 0.86, which are reasonably high. The test-retest reliability is even higher, at a very solid 0.92.
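For readers unfamiliar with the alpha coefficient, the following is a minimal sketch of how Cronbach's alpha is commonly computed from item-level scores. The function name and the small score matrix are hypothetical illustrations, not data from the Phonemin study, and the sketch assumes every respondent answered every item.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Compute Cronbach's alpha.

    item_scores: list of respondents, each a list of scores on k items.
    Assumes at least two items and two respondents, with no missing data.
    """
    k = len(item_scores[0])
    # Transpose so that each entry holds all respondents' scores on one item.
    columns = list(zip(*item_scores))
    sum_item_variances = sum(variance(col) for col in columns)
    # Variance of each respondent's total score across all items.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical scores: five respondents, three items (illustration only).
example_scores = [
    [3, 4, 3],
    [4, 5, 5],
    [2, 2, 3],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(example_scores)
print(f"alpha = {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together, which is why coefficients in the mid-0.80s are read as reasonably high.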
The work sample (T) reliability scores show inter-rater agreement of 88% and 79%, both of which are adequate for inter-rater reliability, although higher scores would be preferable. The work sample (C) inter-rater agreement scores are 81% and 77%. Again, these percentages are adequate but lower than is typically desirable for inter-rater reliability.
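As an illustration of what an inter-rater agreement percentage represents, here is a minimal sketch that counts how often two raters assign the same rating. It assumes agreement means an exact match on each candidate, which the study may or may not have used, and the function name and ratings are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Return the percentage of cases on which two raters gave identical ratings."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("Both raters must rate the same non-empty set of cases.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical ratings of eight work samples on a 1-5 scale (illustration only).
rater_1 = [4, 3, 5, 2, 4, 3, 1, 5]
rater_2 = [4, 3, 4, 2, 4, 3, 2, 5]
agreement = percent_agreement(rater_1, rater_2)
print(f"agreement = {agreement:.0f}%")
```

Simple percent agreement does not correct for agreement that would occur by chance, which is one reason higher values are generally preferred.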
Overall, the company can be comfortable using the work samples and the clerical work test as measures to assist with the selection of candidates for positions as telephone order sales customer service representatives. The company should, however, consider undertaking measures to improve the inter-rater reliability scores for the assessments. Typical ways of improving inter-rater agreement are to increase specificity in the rating protocol and to provide additional training and periodic recalibration of raters. These steps matter because a known weakness of inter-rater reliability is drift: over time, raters move away from the standards set by the rating protocol, and the distance between the raters' scores increases (Shoukri, 2010).
2. How do you interpret the validity results for the clerical test and work sample? Are they favorable enough for the company to consider using them for keeps in selecting new job applicants?
Validity is a pivotal metric for any assessment. Psychometrists take precautions to ensure that the validity of the instruments they use is sufficiently robust to enable confidence in the measures. Validity figures are obtained through scientific measures designed to assess the particular type of validity under review. Scientists regularly utilize the following types of validity: content validity (including face validity and curricular validity), criterion-related validity (including predictive validity and concurrent validity), construct validity (including convergent validity and discriminant validity), and consequential validity ("Validity evidence," 2014). Content validity examines the match between the content (often a subject area) that is being assessed and the actual test questions ("Validity evidence," 2014). Curricular validity is an expression of the extent to which a test matches specific objectives of a curriculum that is part of the training or education processes ("Validity evidence," 2014). Criterion-related validity examines the relationship between test scores and some relevant outcome, which in the assessment of candidates for employment is actual job performance ("Validity evidence," 2014). Construct validity is the degree to which a testing instrument or particular measure serves to assess the underlying theoretical construct ("Validity evidence," 2014). Consequential validity, a construct that is familiar to human resources personnel and social scientists, is a term used to refer to the social consequences of using a certain test for a particular purpose ("Validity evidence," 2014).
Criterion-related validity is the type most commonly of interest in performance and other psychometric tests used to assess the suitability of candidates for employment ("Validity evidence," 2014). The relevant figures are as follows. For the clerical test, correlations with work sample (T) and work sample (C) are low and not significant. The criterion-related validity measures of the clerical test show significant correlations with error rate (negative) and speed (positive), while its correlation with complaints about telephone sales order customer service representatives is non-significant. Work sample (T) is highly correlated with work sample (C). The criterion-related validity figures for work sample (T) are not significant for error rate (negative) or speed (positive); however, its correlations with complaints about telephone sales order customer service representatives are high and significant. Work sample (C) shows the same pattern. It is highly correlated with work sample (T), its correlations with error rate (negative) and speed (positive) are not significant, and its correlations with complaints about telephone sales order customer service representatives are high and significant.
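The correlations discussed above are typically Pearson coefficients between test scores and a criterion measure. The following is a minimal sketch of that calculation together with the t statistic commonly used to judge significance. The function names and data are hypothetical, and the study's actual analysis method is not stated in the source.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom.

    Undefined when r is exactly 1 or -1.
    """
    return r * sqrt((n - 2) / (1 - r * r))


# Hypothetical clerical test scores and error rates (illustration only).
test_scores = [10, 12, 14, 16, 18, 20]
error_rates = [9, 8, 6, 7, 4, 3]
r = pearson_r(test_scores, error_rates)
t = t_statistic(r, len(test_scores))
print(f"r = {r:.2f}, t = {t:.2f}")
```

A negative r with error rate means higher test scores go with fewer errors. Whether the result is significant depends on comparing t against a critical value for the sample size, which is why a sizable correlation can still be non-significant in a small sample.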
The results for the work samples and the clerical test indicate that they are at appropriate levels for use in the selection process for the telephone order sales customer service representatives. The finding that the two work samples are fundamentally redundant is helpful, as one or the other could be eliminated from the assessment battery without any deleterious effects. The clerical test was found to be a good predictor of two different criteria: error rate and speed. This suggests that using one work sample together with the clerical test would be an economical and accurate way to predict future job task performance.
3. What limitations in the above study should be kept in mind when interpreting the results and deciding whether or not to use the clerical test and work sample?
A. How similar are the new applicants to the workers used for the study? If they are not similar, then the results of this study are less generalizable to other populations.
The new applicants form a new data set, and although the differences between current workers and new employees may not be great, it is beneficial to treat them as separate groups in this and subsequent analyses. Indeed, generalizability is always an issue in quasi-experimental or descriptive studies. The best solution to this problem is to segregate the data sets clearly and to run periodic analyses of the fit between the assessment processes and the performance of employees overall. The layman's solution would be to keep a close eye on the performance of newly hired employees.
B. Are the criterion measures used (e.g., error rate, complaints) really valid indicators of CSR performance? If they are not, then the tests are not really predicting important dimensions of CSR performance.