Florida National University
Course Reflection
Guidelines
Purpose
The purpose of this assignment is to provide the student an opportunity to reflect on selected RN-BSN competencies acquired through the course.
Course Outcomes
This assignment provides documentation of student ability to meet the following course outcomes:
· Identify the different legal and ethical aspects of nursing practice (AACN Essential V; QSEN: patient-centered care, teamwork and collaboration).
· Analyze the legal impact of different ethical decisions in nursing practice (AACN Essential V; QSEN: patient-centered care, teamwork and collaboration).
· Understand the essentials of nursing law and ethics (AACN Essential V; QSEN: patient-centered care, teamwork and collaboration).
Points
This assignment is worth a total of 100 points.
Due Date
Submit your completed assignment under the Assignment tab by Monday 11:59 p.m. EST of Week 8 as directed.
Requirements
1. The Course Reflection is worth 100 points (10%) and will be graded on quality of self-assessment, use of citations, use of Standard English grammar, sentence structure, and overall organization based on the required components as summarized in the directions and grading criteria/rubric.
2. Follow the directions and grading criteria closely. Any questions about your essay may be posted under the Q & A forum under the Discussions tab.
3. The length of the reflection is to be within three to six pages excluding title page and reference pages.
4. APA format is required with both a title page and reference page. Use the required components of the review as Level 1 headers (upper and lower case, centered):
Note: Introduction – Write an introduction but do not use “Introduction” as a heading, in accordance with the rules put forth in the Publication Manual of the American Psychological Association (2010, p. 63).
a. Course Reflection
b. Conclusion
Preparing Your Reflection
The BSN Essentials (AACN, 2008) outline a number of healthcare policy and advocacy competencies for the BSN-prepared nurse. Reflect on the course readings, discussion threads, and applications you have completed across this course and write a reflective essay regarding the extent to which you feel you are now prepared to (choose 4):
1. “Demonstrate the professional standards of moral, ethical, and legal conduct.
2. Assume accountability for personal and professional behaviors.
3. Promote the image of nursing by modeling the values and articulating the knowledge, skills, and attitudes of the nursing profession.
4. Demonstrate professionalism, including attention to appearance, demeanor, respect for self and others, and attention to professional boundaries with patients and families as well as among caregivers.
5. Demonstrate an appreciation of the history of and contemporary issues in nursing and their impact on current nursing practice.
6. Reflect on one’s own beliefs and values as they relate to professional practice.
7. Identify personal, professional, and environmental risks that impact personal and professional choices, and behaviors.
8. Communicate to the healthcare team one’s personal bias on difficult healthcare decisions that impact one’s ability to provide care.
9. Recognize the impact of attitudes, values, and expectations on the care of the very young, frail older adults, and other vulnerable populations.
10. Protect patient privacy and confidentiality of patient records and other privileged communications.
11. Access interprofessional and intra-professional resources to resolve ethical and other practice dilemmas.
12. Act to prevent unsafe, illegal, or unethical care practices.
13. Articulate the value of pursuing practice excellence, lifelong learning, and professional engagement to foster professional growth and development.
14. Recognize the relationship between personal health, self-renewal, and the ability to deliver sustained quality care.” (p. 28).
Reference:
American Association of Colleges of Nursing [AACN]. (2008). The essentials of baccalaureate education for professional nursing practice. Washington, DC: Author.
Directions and Grading Criteria
Category | Points | % | Description
(Introduction – see note under requirement #4 above) | 8 | 8 | Introduces the purpose of the reflection and addresses BSN Essentials (AACN, 2008) pertinent to healthcare policy and advocacy.
You Decide Reflection | 80 | 80 | Include a self-assessment regarding learning that you believe represents your skills, knowledge, and integrative abilities to meet the pertinent BSN Essential and sub-competencies (AACN, 2008) as a result of active learning throughout this course. Be sure to use examples from selected readings, threaded discussions, and/or applications to support your assertions to address each of the following sub-competencies: (a) “Demonstrate the professional standards of moral, ethical, and legal conduct. (b) Assume accountability for personal and professional behaviors. (c) Promote the image of nursing by modeling the values and articulating the knowledge, skills, and attitudes of the nursing profession. (d) Demonstrate professionalism, including attention to appearance, demeanor, respect for self and others, and attention to professional boundaries with patients and families as well as among caregivers. (e) Demonstrate an appreciation of the history of and contemporary issues in nursing and their impact on current nursing practice. (f) Reflect on one’s own beliefs and values as they relate to professional practice. (g) Identify personal, professional, and environmental risks that impact personal and professional choices, and behaviors. (h) Communicate to the healthcare team one’s personal bias on difficult healthcare decisions that impact one’s ability to provide care. (i) Recognize the impact of attitudes, values, and expectations on the care of the very young, frail older adults, and other vulnerable populations. (j) Protect patient privacy and confidentiality of patient records and other privileged communications. (k) Access interprofessional and intra-professional resources to resolve ethical and other practice dilemmas. (l) Act to prevent unsafe, illegal, or unethical care practices. (m) Articulate the value of pursuing practice excellence, lifelong learning, and professional engagement to foster professional growth and development. (n) Recognize the relationship between personal health, self-renewal, and the ability to deliver sustained quality care.” (p. 28)
Conclusion | 4 | 4 | An effective conclusion identifies the main ideas and major conclusions from the body of your essay. Minor details are left out. Summarize the benefits of the pertinent BSN Essential and sub-competencies (AACN, 2008) pertaining to scholarship for evidence-based practice.
Clarity of writing | 6 | 6 | Use of standard English grammar and sentence structure. No spelling or typographical errors. Organized around the required components using appropriate headers. Writing should demonstrate original thought without an over-reliance on the works of others.
APA format | 2 | 2 | All information taken from another source, even if summarized, must be appropriately cited in the manuscript and listed in the references using APA (6th ed.) format: (1) document setup, (2) title and reference pages, (3) citations in the text and references.
Total: | 100 | 100 | A quality essay will meet or exceed all of the above requirements.
Grading Rubric
Assignment Criteria | Meets Criteria | Partially Meets Criteria | Does Not Meet Criteria
(Introduction – see note under requirement #4 above) (8 pts) | Short introduction of selected BSN sub-competencies (AACN, 2008) pertinent to scholarship for evidence-based practice. Rationale is well presented, and purpose fully developed. (7–8 points) | Basic understanding and/or limited use of original explanation and/or inappropriate emphasis on an area. (5–6 points) | Little or very general introduction of selected BSN sub-competencies (AACN, 2008). Little to no original explanation; inappropriate emphasis on an area. (0–4 points)
You Decide Reflection (80 pts) | Excellent self-assessment of skills, knowledge, and integrative abilities pertinent to healthcare policy and advocacy. Reflection on pertinent BSN sub-competencies (AACN, 2008) supported with examples. (70–80 points) | Basic self-assessment of skills, knowledge, and integrative abilities pertinent to healthcare policy and advocacy. Reflection on pertinent BSN sub-competencies (AACN, 2008) not supported with examples. (59–69 points) | Little or very general self-assessment of skills, knowledge, and integrative abilities pertinent to healthcare policy and advocacy. Little or no reflection on pertinent BSN sub-competencies (AACN, 2008), or reflection not supported with examples. (0–58 points)
Conclusion (4 pts) | Excellent understanding of pertinent BSN sub-competencies (AACN, 2008). Conclusions are well evidenced and fully developed. (3–4 points) | Basic understanding and/or limited use of original explanation and/or inappropriate emphasis on an area. (2 points) | Little understanding of pertinent BSN sub-competencies (AACN, 2008). Little to no original explanation; inappropriate emphasis on an area. (0–1 point)
Clarity of writing (6 pts) | Excellent use of standard English showing original thought with minimal reliance on the works of others. No spelling or grammar errors. Well organized with proper flow of meaning. (5–6 points) | Some evidence of own expression and competent use of language. No more than three spelling or grammar errors. Well organized thoughts and concepts. (3–4 points) | Language needs development or there is an over-reliance on the works of others. Four or more spelling and/or grammar errors. Poorly organized thoughts and concepts. (0–2 points)
APA format (2 pts) | APA format correct with no more than 1–2 minor errors. (2 points) | 3–5 errors in APA format and/or 1–2 citations are missing. (1 point) | APA formatting contains multiple errors and/or several citations are missing. (0 points)
Total Points Possible = 100
PSY640 CHECKLIST FOR EVALUATING TESTS
Criterion | Assessment One | Assessment Two
Test Name and Versions | |
Purpose(s) for Administering the Tests | |
Characteristic(s) to be Measured by the Tests (skill, ability, personality trait) | |
Target Population (education, experience level, other background) | |
Test Characteristics | |
1. Type (paper-and-pencil or computer): Alternate forms available: | |
2. Scoring method (computer or manually): | |
3. Technical considerations: a) Reliability: r = ; b) Validity: r = ; c) Reference/norm group; d) Test fairness evidence; e) Adverse impact evidence; f) Applicability (indicate any special groups) | |
4. Administration considerations: | |
5. Administration time: | |
6. Materials needed (include start-up, operational, and scoring costs): | |
7. Facilities needed: | |
8. Staffing requirements: | |
9. Training requirements: | |
10. Other considerations (consider clarity, comprehensiveness, and utility): | |
11. Test manual information: | |
12. Supporting documents available from the publisher: | |
13. Publisher assistance: | |
14. Independent reviews: | |
Overall Evaluation (one to two sentences providing your conclusions about the test you evaluated) | Name of Test: | Name of Test:
References
List references in APA format as outlined by the Ashford Writing Center.
Applications in Personality Testing
Computer-Generated MMPI-3 Case Description

In the case of Mr. J, the computerized report displayed various results. High T scores were generated on several scales: infrequent responses were reported at a high rate, and the response bias scale T score was the second-highest in the profile. The highest T score was obtained for emotional/internalizing dysfunction, whereas the lowest T score was generated for hypomanic activation. The report showed four unscorable responses, which indicates that some items were answered more than once. The Infrequent Responses scale comprises 35 items. The client showed no underreporting. Substantive scale interpretation showed no somatic dysfunction. Responses on the relevant scale indicate a high level of distress and significant demoralization. The report also indicates a high level of negative emotionality, as well as emotional dysfunction so pronounced that the client is unable to get out of stress and despair. No indications were reported of thought or behavioral dysfunction. The interpersonal functioning scales indicate that he lacks positivity in his life and becomes embarrassed and nervous in almost any situation. He has become socially awkward, feels distant from everyone around him, and is introverted. The diagnostic considerations report an emotional internalizing disorder, including major depression, generalized anxiety, and excessive worrying. Beyond that, the client also shows personality disorder features that lead him to view every situation in life negatively, along with several anhedonia-related problems. The report further indicates interpersonal difficulties: he faces social anxiety and an elevated level of stress. Several scales were also flagged as unscorable (DOM, AGRR, FBS, VRIN, DSF, etc.). Critical responses were endorsed for suicidal/death ideation (72%), helplessness and hopelessness (86%), demoralization (80%), inefficacy (77%), stress (68%), negative emotions (8%), shyness (69%), and worry (65%).
Ms. S Psychology Report

Ms. S was serving in the US Army and had long-standing anxiety and depression. Her disorders improved as soon as she started medication; the effects were especially good for her depressive symptoms. However, she began having acute symptoms of distress and depression again a few days ago, after one of her platoon mates committed suicide. No such disorder was observed in the client’s family history. Various tests were performed, including a measure of cognitive ability, which indicated that the patient falls in the low average range. Certain weaknesses were present that were responsible for a low estimate of her functioning. She scored in the average range on several tests of reading and sentence comprehension, including the Nelson-Denny Reading Test, on which her score improved from 37% to 47%. Overall, her scores were lower than expected given that she is well educated. Her performance was reduced on the WAIS-IV, and additional weaknesses were observed in calculation. In some areas her scores were good, including the Ruff 2 & 7 Selective Attention Test. Her language was fully fluent, with no impairment observed; all of her expressions and language were correct. No visuospatial impairment was observed, and everything in that domain was completely normal. Ms. S’s results showed no problem with memory retention, but a small impairment was seen on an attention-demanding list task. As far as her mood and personality were concerned, she produced a valid profile, but at the time of testing she was going through extremely distressing circumstances. These conditions were not severe enough to be labeled a depressive disorder, but they still had a considerable impact on her. No self-harm was observed. Individual therapy is recommended, in which her anxiety disorder will be treated. She will be given different tasks to help distract her from the thoughts on her mind, and she will be allowed to use a calculator to work around her weakness in calculation; in this way her mind will be diverted from the stressful conditions and anxiety she is feeling.
Psychological Evaluations

As far as the psychological evaluations are concerned, both of these tests have their own importance, and each generated results that were accurate in its scenario. Both met APA standards and demonstrated a good level of professionalism. Each test provided its own way of assessing the client’s psychological situation, and both were correct in their diagnoses. The MMPI-3 was used in the analysis of both clients, and it showed revealing results about their mental health situations. The psychometric methodologies applied during Mr. J’s session were substantive scale interpretation, the externalizing and interpersonal scales, and the cognitive dysfunction scales, whereas in the case of Ms. S the methodologies applied were cognitive ability testing along with other reading and writing tests and methods. Two additional tests of personality and emotional functioning are the Thematic Apperception Test (TAT) and the Rotter Incomplete Sentences Blank. Both could be used in either case for critical analysis of the client’s mental state. With the Rotter sentences, we can readily analyze the client’s desires, wishes, and emotions; their fears and attitudes can be identified and evaluated as well. The TAT cards can also help in the evaluation: pictures are shown to the client, who tells a story about each card according to his or her mental state, which aids the analysis.
References
Gregory, R. J. (2014). Psychological testing: History, principles, and applications (7th ed.). Boston, MA: Pearson. Chapter 8: Origins of Personality Testing; Chapter 9: Assessment of Normality and Human Strengths.
Ben-Porath, Y. S., & Tellegen, A. (2020). MMPI-3 case description Mr. J – Interpretive report [PDF]. https://www.pearsonassessments.com/content/dam/school/global/clinical/us/assets/mmpi-3/mmpi-3-sample-interpretive-report.pdf
U.S. Department of Labor Employment and Training Administration. (2006). Testing and assessment: A guide to good practices for workforce investment professionals [PDF]. http://wdr.doleta.gov/directives/attach/TEN/ten2007/TEN21-07a1.pdf
Kennedy, N., & Harper, Y. (2020). PSY640 Week four psychological assessment report [PDF]. College of Health and Human Services, University of Arizona Global Campus, San Diego, CA.
CHAPTER 11 Industrial, Occupational, and Career Assessment
TOPIC 11A Industrial and Organizational Assessment

11.1 The Role of Testing in Personnel Selection
11.2 Autobiographical Data
11.3 The Employment Interview
11.4 Cognitive Ability Tests
11.5 Personality Tests
11.6 Paper-and-Pencil Integrity Tests
11.7 Work Sample and Situational Exercises
11.8 Appraisal of Work Performance
11.9 Approaches to Performance Appraisal
11.10 Sources of Error in Performance Appraisal
In this chapter we explore the specialized applications of testing within two distinctive environments—occupational settings and
vocational settings. Although disparate in many respects, these two fields of assessment share essential features. For example, legal guidelines exert a powerful and constraining influence upon the practice of testing in both arenas. Moreover, issues of empirical validation of methods are especially pertinent in occupational and vocational areas of practice. In Topic 11A, Industrial and Organizational Assessment, we review the role of psychological tests in making decisions about personnel such as hiring, placement, promotion, and evaluation. In Topic 11B, Assessment for Career Development in a Global Economy, we analyze the unique challenges encountered by vocational psychologists who provide career guidance and assessment. Of course, relevant tests are surveyed and catalogued throughout. But more important, we focus upon the special issues and challenges encountered within these distinctive milieus. Industrial and organizational psychology (I/O psychology) is the subspecialty of psychology that deals with behavior in work situations (Borman, Ilgen, Klimoski, & Weiner, 2003). In
its broadest sense, I/O psychology includes diverse applications in business, advertising, and the military. For example, corporations typically consult I/O psychologists to help design and evaluate hiring procedures; businesses may ask I/O psychologists to appraise the effectiveness of advertising; and military leaders rely heavily upon I/O psychologists in the testing and placement of recruits. Psychological testing in the service of decision making about personnel is, thus, a prominent focus of this profession. Of course, specialists in I/O psychology possess broad skills and often handle many corporate responsibilities not previously mentioned. Nonetheless, there is no denying the centrality of assessment to their profession. We begin our review of assessment in the occupational arena by surveying the role of testing in personnel selection. This is followed by a discussion of ways that psychological measurement is used in the appraisal of work performance.
11.1 THE ROLE OF TESTING IN PERSONNEL SELECTION

Complexities of Personnel Selection

Based upon the assumption that psychological tests and assessments can provide valuable information about potential job performance, many businesses, corporations, and military settings have used test scores and assessment results for personnel selection. As Guion (1998) has noted, I/O research on personnel selection has emphasized criterion-related validity as opposed to content or construct validity. These other approaches to validity are certainly relevant but usually take a back seat to criterion-related validity, which preaches that current assessment results must predict the future criterion of job performance. From the standpoint of criterion-related validity, the logic of personnel selection is seductively simple. Whether in a large corporation or a small business, those who select employees should use tests or assessments that have documented, strong correlations with the
criterion of job performance, and then hire the individuals who obtain the highest test scores or show the strongest assessment results. What could be simpler than that? Unfortunately, the real-world application of employment selection procedures is fraught with psychometric complexities and legal pitfalls. The psychometric intricacies arise, in large measure, from the fact that job behavior is rarely simple, unidimensional behavior. There are some exceptions (such as assembly-line production) but the general rule in our postindustrial society is that job behavior is complex, multidimensional behavior. Even jobs that seem simple may be highly complex. For example, consider what is required for effective performance in the delivery of the U.S. mail. The individual who delivers your mail six days a week must do more than merely place it in your mailbox. He or she must accurately sort mail on the run, interpret and enforce government regulations about package size, manage pesky and even dangerous animals, recognize and avoid physical dangers, and
exercise effective interpersonal skills in dealing with the public, to cite just a few of the complexities of this position. Personnel selection is, therefore, a fuzzy, conditional, and uncertain task. Guion (1991) has highlighted the difficulty in predicting complex behavior from simple tests. For one thing, complex behavior is, in part, a function of the situation. This means that even an optimal selection approach may not be valid for all candidates. Quite clearly, personnel selection is not a simple matter of administering tests and consulting cutoff scores. We must also acknowledge the profound impact of legal and regulatory edicts upon I/O testing practices. Given that such practices may have weighty consequences—determining who is hired or promoted, for example—it is not surprising to learn that I/O testing practices are rigorously constrained by legal precedents and regulatory mandates. These topics are reviewed in Topic 12A, Psychological Testing and the Law.
Approaches to Personnel Selection

Acknowledging that the interview is a widely used form of personnel assessment, it is safe to conclude that psychological assessment is almost a universal practice in hiring decisions. Even by a narrow definition that includes only paper-and-pencil measures, at least two-thirds of the companies in the United States engage in personnel testing (Schmitt & Robertson, 1990). For purposes of personnel selection, the I/O psychologist may recommend one or more of the following:
• Autobiographical data
• Employment interview
• Cognitive ability tests
• Personality, temperament, and motivation tests
• Paper-and-pencil integrity tests
• Sensory, physical, and dexterity tests
• Work sample and situational tests
We turn now to a brief survey of typical tests and assessment approaches within each of these
categories. We close this topic with a discussion of legal issues in personnel testing.

11.2 AUTOBIOGRAPHICAL DATA

According to Owens (1976), application forms that request personal and work history as well as demographic data such as age and marital status have been used in industry since at least 1894. Objective or scorable autobiographical data—sometimes called biodata—are typically secured by means of a structured form variously referred to as a biographical information blank, biographical data form, application blank, interview guide, individual background survey, or similar device. Although the lay public may not recognize these devices as true tests with predictive power, I/O psychologists have known for some time that biodata furnish an exceptionally powerful basis for the prediction of employee performance (Cascio, 1976; Ghiselli, 1966; Hunter & Hunter, 1984). An important milestone in the biodata approach is the publication of the Biodata Handbook, a
thorough survey of the use of biographical information in selection and the prediction of performance (Stokes, Mumford, & Owens, 1994). The rationale for the biodata approach is that future work-related behavior can be predicted from past choices and accomplishments. Biodata have predictive power because certain character traits that are essential for success also are stable and enduring. The consistently ambitious youth with accolades and accomplishments in high school is likely to continue this pattern into adulthood. Thus, the job applicant who served as editor of the high school newspaper—and who answers a biodata item to this effect—is probably a better candidate for corporate management than the applicant who reports no extracurricular activities on a biodata form.

The Nature of Biodata

Biodata items usually call for “factual” data; however, items that tap attitudes, feelings, and value judgments are sometimes included.
Except for demographic data such as age and marital status, biodata items always refer to past accomplishments and events. Some examples of biodata items are listed in Table 11.1. Once biodata are collected, the I/O psychologist must devise a means for predicting job performance from this information. The most common strategy is a form of empirical keying not unlike that used in personality testing. From a large sample of workers who are already hired, the I/O psychologist designates a successful group and an unsuccessful group, based on performance, tenure, salary, or supervisor ratings. Individual biodata items are then contrasted for these two groups to determine which items most accurately discriminate between successful and unsuccessful workers. Items that are strongly discriminative are assigned large weights in the scoring scheme. New applicants who respond to items in the keyed direction, therefore, receive high scores on the biodata instrument and are predicted to succeed. Cross validation of the scoring scheme on a second sample of
successful and unsuccessful workers is a crucial step in guaranteeing the validity of the biodata selection method. Readers who wish to pursue the details of empirical scoring methods for biodata instruments should consult Murphy and Davidshofer (2004), Mount, Witt, and Barrick (2000), and Stokes and Cooper (2001).

TABLE 11.1 Examples of Biodata Questions
How long have you lived at your present address?
What is your highest educational degree?
How old were you when you obtained your first paying job?
How many books (not work related) did you read last month?
At what age did you get your driver’s license?
In high school, did you hold a class office?
How punctual are you in arriving at work?
What job do you think you will hold in 10 years?
How many hours do you watch television in a typical week?
Have you ever been fired from a job?
How many hours a week do you spend on
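The empirical keying procedure just described is easy to make concrete. Below is a minimal sketch in Python; the item names, the 0/1 response coding, the two-person development sample, and the endorsement-rate-difference weighting rule are all hypothetical illustrations rather than the scoring scheme of any published biodata instrument.

```python
# Minimal sketch of empirical keying for biodata items (illustrative only).
def derive_key(responses, successful):
    """Contrast item endorsement rates for successful vs. unsuccessful workers.

    responses:  list of dicts mapping item name -> 0/1 response
    successful: parallel list of bools designating the successful group
    """
    weights = {}
    for item in responses[0]:
        hi = [r[item] for r, s in zip(responses, successful) if s]
        lo = [r[item] for r, s in zip(responses, successful) if not s]
        # Strongly discriminative items receive large weights in the key.
        weights[item] = sum(hi) / len(hi) - sum(lo) / len(lo)
    return weights

def score_applicant(applicant, weights):
    # Applicants who respond in the keyed direction receive high scores.
    return sum(w * applicant[item] for item, w in weights.items())

# Hypothetical development sample; a real key requires a large sample and,
# as the text stresses, cross-validation on a second sample of workers.
sample = [
    {"held_class_office": 1, "fired_from_job": 0},  # successful worker
    {"held_class_office": 0, "fired_from_job": 1},  # unsuccessful worker
]
key = derive_key(sample, [True, False])
print(score_applicant({"held_class_office": 1, "fired_from_job": 0}, key))  # 1.0
```

In practice the weighting rule is usually more elaborate (for instance, weights assigned only to items that survive a significance screen), but the derive-then-cross-validate logic is the same.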
The Validity of Biodata

The validity of biodata has been surveyed by several reviewers, with generally positive findings (Breaugh, 2009; Stokes et al., 1994; Stokes & Cooper, 2004). An early study by Cascio (1976) is typical of the findings. He used a very simple biodata instrument—a weighted combination of 10 application blank items—to predict turnover for female clerical personnel in a medium-sized insurance company. The cross-validated correlations between biodata score and length of tenure were .58 for minorities and .56 for nonminorities.1 Drakeley et al. (1988) compared biodata and cognitive ability tests as predictors of training success. Biodata scores possessed the same predictive validity as the cognitive tests. Furthermore, when added to the regression equation, the biodata information improved the predictive accuracy of the cognitive tests. In an extensive research survey, Reilly and Chao (1982) compared eight selection procedures as to validity and adverse impact on minorities.
The procedures were biodata, peer evaluation, interviews, self-assessments, reference checks, academic achievement, expert judgment, and projective techniques. Noting that properly standardized ability tests provide the fairest and most valid selection procedure, Reilly and Chao (1982) concluded that only biodata and peer evaluations had validities substantially equal to those of standardized tests. For example, in the prediction of sales productivity, the average validity coefficient of biodata was a very healthy .62. Certain cautions need to be mentioned with respect to biodata approaches in personnel selection. Employers may be prohibited by law from asking questions about age, race, sex, religion, and other personal issues—even when such biodata can be shown empirically to predict job performance. Also, even though the incidence of faking is very low, there is no doubt that shrewd respondents can falsify results in a favorable direction. For example, Schmitt and Kunce (2002) addressed the concern that some examinees might distort their answers to
biodata items in a socially desirable direction. These researchers compared the scores obtained when examinees were asked to elaborate their biodata responses versus when they were not. Requiring elaborated answers reduced the scores on biodata items; that is, it appears that respondents were more truthful when asked to provide corroborating details to their written responses. Recently, Levashina, Morgeson, and Campion (2012) proved the same point in a large-scale, high-stakes selection project with 16,304 applicants for employment. Biodata constituted a significant portion of the selection procedure. The researchers used the response elaboration technique (RET), which obliges job applicants to provide written elaborations of their responses. Perhaps an example will help. A naked, unadorned biodata question might ask:

• How many times in the last 12 months did you develop novel solutions to a work problem in your area of responsibility?

Most likely, a higher number would indicate greater creativity and empirically predict superior work productivity. The score on this item would be combined with others to produce an overall biodata score used in personnel selection. But notice that nothing prevents the respondent from exaggeration or outright lying. Now, consider the original question with the addition of response elaboration:

• How many times in the last 12 months did you develop novel solutions to a work problem in your area of responsibility?
• For each circumstance, please provide specific details as to the problem and your solution.
Levashina et al. (2012) found that using the RET technique produced more honest and realistic biodata scores. Further, for those items possessing the potential for external verification, responses were even more realistic. The researchers conclude that RET decreases faking because it increases accountability. As with any measurement instrument, biodata items will need periodic restandardization. Finally, a potential drawback to the biodata approach is that, by its nature, this method
captures the organizational status quo and might, therefore, squelch innovation. Becker and Colquitt (1992) discuss precautions in the development of biodata forms. The use of biodata in personnel selection appears to be on the rise. Some corporations rely on biodata almost to the exclusion of other approaches in screening applicants. The software giant Google is a case in point. In years past, the company used traditional methods such as hiring candidates from top schools who earned the best grades. But that tactic now is used rarely in industry. Instead, many corporations like Google are moving toward automated systems that collect biodata from the many thousands of applicants processed each year. Using online surveys, these companies ask applicants to provide personal details about accomplishments, attitudes, and behaviors as far back as high school. Questions can be quite detailed, such as whether the applicant has ever published a book, received a patent, or started a club. Formulas are then used to compute a score from 0 to 100, designed to predict the degree to
which the applicant will fit with corporate culture (Ottinger & Kurzon, 2007). The system works well for Google, which claims to have only a 4 percent turnover rate. There is little doubt, then, that purely objective biodata information can predict aspects of job performance with fair accuracy. However, employers are perhaps more likely to rely upon subjective information such as interview impressions when making decisions about hiring. We turn now to research on the validity of the employment interview in the selection process.

1 The curious reader may wish to know which 10 biodata items could possess such predictive power. The items were age, marital status, children’s age, education, tenure on previous job, previous salary, friend or relative in company, location of residence, home ownership, and length of time at present address. Unfortunately, Cascio (1976) does not reveal the relative weights or direction of scoring for the items.
11.3 THE EMPLOYMENT INTERVIEW

The employment interview is usually only one part of the evaluation process, but many
administrators regard it as the vital make-or-break component of hiring. It is not unusual for companies to interview from 5 to 20 individuals for each person hired! Considering the importance of the interview and its huge costs to industry and the professions, it is not surprising to learn that thousands of studies address the reliability and validity of the interview. We can only highlight a few trends here; more detailed reviews can be found in Conway, Jako, and Goodman (1995), Huffcutt (2007), Guion (1998), and Schmidt and Zimmerman (2004). Early studies of interview reliability were quite sobering. In various studies and reviews, reliability was typically assessed by correlating evaluations of different interviewers who had access to the same job candidates (Wagner, 1949; Ulrich & Trumbo, 1965). The interrater reliability from dozens of these early studies was typically in the mid-.50s, much too low to provide accurate assessments of job candidates. This research also revealed that interviewers were prone to halo bias and other distorting influences upon their perceptions of candidates.
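The interrater reliabilities cited here are ordinary correlations between the ratings that two interviewers assign to the same pool of candidates. A minimal sketch, with invented ratings for six candidates:

```python
# Interrater reliability as the Pearson correlation between two interviewers'
# ratings of the same candidates (ratings invented for illustration).
from statistics import mean

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) *
                  sum((b - my) ** 2 for b in y)) ** 0.5

interviewer_a = [3, 4, 2, 5, 3, 4]  # ratings of six candidates
interviewer_b = [2, 4, 3, 5, 2, 3]  # second rater, same six candidates
print(round(pearson(interviewer_a, interviewer_b), 2))  # 0.73
```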
Halo bias—discussed in the next topic—is the tendency to rate a candidate high or low on all dimensions because of a global impression. Later, researchers discovered that interview reliability could be increased substantially if the interview was jointly conducted by a panel instead of a single interviewer (Landy, 1996). In addition, structured interviews in which each candidate was asked the same questions by each interviewer also proved to be much more reliable than unstructured interviews (Borman, Hanson, & Hedge, 1997; Campion, Pursell, & Brown, 1988). In these studies, reliabilities in the .70s and higher were found. Research on validity of the interview has followed the same evolutionary course noted for reliability: Early research that examined unstructured interviews was quite pessimistic, while later research using structured approaches produced more promising findings. In these studies, interview validity was typically assessed by correlating interview judgments with some measure of on-the-job performance. Early studies of interview validity yielded
almost uniformly dismal results, with typical validity coefficients hovering in the mid-.20s (Arvey & Campion, 1982). Mindful that interviews are seldom used in isolation, early researchers also investigated incremental validity, which is the potential increase in validity when the interview is used in conjunction with other information. These studies were predicated on the optimistic assumption that the interview would contribute positively to candidate evaluation when used alongside objective test scores and background data. Unfortunately, the initial findings were almost entirely unsupportive (Landy, 1996). In some instances, attempts to prove incremental validity of the interview demonstrated just the opposite, what might be called decremental validity. For example, Kelly and Fiske (1951) established that interview information actually decreased the validity of graduate student evaluations. In this early and classic study, the task was to predict the academic performance of more than 500 graduate students in psychology. Various
combinations of credentials (a form of biodata), objective test scores, and interview were used as the basis for clinical predictions of academic performance. The validity coefficients are reported in Table 11.2. The reader will notice that credentials alone provided a much better basis for prediction than credentials plus a one- hour interview. The best predictions were based upon credentials and objective test scores; adding a two-hour interview to this information actually decreased the accuracy of predictions. These findings highlighted the superiority of actuarial prediction (based on empirically derived formulas) over clinical prediction (based on subjective impressions). We pursue the actuarial versus clinical debate in the last chapter of this text. Studies using carefully structured interviews, including situational interviews, provide a more positive picture of interview validity (Borman, Hanson, & Hedge, 1997; Maurer & Fay, 1988; Schmitt & Robertson, 1990). When the findings are corrected for restriction of range and unreliability of job performance ratings, the
mean validity coefficient for structured interviews turns out to be an impressive .63 (Wiesner & Cronshaw, 1988). A meta-analysis by Conway, Jako, and Goodman (1995) concluded that the upper limit for the validity coefficient of structured interviews was .67, whereas for unstructured interviews the validity coefficient was only .34. Additional reasons for preferring structured interviews include their legal defensibility in the event of litigation (Williamson, Campion, Malo, and others, 1997) and, surprisingly, their minimal bias across different racial groups of applicants (Huffcutt & Roth, 1998).

TABLE 11.2 Validity Coefficients for Ratings Based on Various Combinations of Information

Basis for Rating | Correlation with Criterion
Credentials alone | 0.26
Credentials and one-hour interview | 0.13
Credentials and objective test scores | 0.36
Credentials, test scores, and two-hour interview | 0.32
Source: Based on data in Kelly, E. L., & Fiske, D. W. (1951). The prediction of performance in clinical psychology. Ann Arbor: University of Michigan Press.

In order to reach acceptable levels of reliability and validity, structured interviews must be designed with painstaking care. Consider the protocol used by Motowidlo et al. (1992) in their research on structured interviews for management and marketing positions in eight telecommunications companies. Their interview format was based upon a careful analysis of critical incidents in marketing and management. Prospective employees were asked a set of standard questions about how they had handled past situations similar to these critical incidents. Interviewers were trained to ask discretionary probing questions for details about how the applicants handled these situations. Throughout, the interviewers took copious notes. Applicants were then rated on scales anchored with behavioral illustrations. Finally, these ratings were combined to yield a total interview score used in selection decisions.
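The corrections for restriction of range and criterion unreliability invoked above are standard psychometric adjustments. The chapter does not state the formulas, but in conventional notation they are the correction for attenuation due to criterion unreliability and Thorndike's Case II correction for range restriction:

\[
r_{c} = \frac{r_{xy}}{\sqrt{r_{yy}}},
\qquad
r_{c} = \frac{u\, r_{xy}}{\sqrt{1 + r_{xy}^{2}\,(u^{2} - 1)}},
\quad u = \frac{S_{x}}{s_{x}},
\]

where \(r_{xy}\) is the observed validity coefficient, \(r_{yy}\) is the reliability of the job performance ratings, and \(u\) is the ratio of the unrestricted to the restricted standard deviation of the predictor. Corrections of this kind underlie figures such as the .63 reported by Wiesner and Cronshaw (1988).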
In summary, under carefully designed conditions, the interview can provide a reliable and valid basis for personnel selection. However, as noted by Schmitt and Robertson (1990), the prerequisite conditions for interview validity are not always available. Guion (1998) has expressed the same point: A large body of research on interviewing has, in
my opinion, given too little practical information about how to structure an interview, how to conduct it, and how to use it as an assessment device. I think I know from the research that (a) interviews can be valid, (b) for validity they require structuring and standardization, (c) that structure, like many other things, can be carried too far, (d) that without carefully planned structure (and maybe even with it) interviewers talk too much, and (e) that the interviews made routinely in nearly every organization could be vastly improved if interviewers were aware of and used these conclusions. There is more to be learned and applied. (p. 624)
The essential problem is that each interviewer may evaluate only a small number of applicants, so that standardization of interviewer ratings is not always realistic. While the interview is potentially valid as a selection technique, in its common, unstructured application there is probably substantial reason for concern. Why are interviews used? If the typical, unstructured interview is so unreliable and ineffectual a basis for job candidate evaluation, why do administrators continue to value interviews so highly? In their review of the employment interview, Arvey and Campion (1982) outline several reasons for the persistence of the interview, including practical considerations such as the need to sell the candidate on the job, and social reasons such as the susceptibility of interviewers to the illusion of personal validity. Others have emphasized the importance of the interview for assessing a good fit between applicant and organization (Adams, Elacqua, & Colarelli, 1994; Latham & Skarlicki, 1995).
It is difficult to imagine that most employers would ever eliminate entirely the interview from the screening and selection process. After all, the interview does serve the simple human need of meeting the persons who might be hired. However, based on 50 years’ worth of research, it is evident that biodata and objective tests often provide a more powerful basis for candidate evaluation and selection than unstructured interviews. One interview component that has received recent attention is the impact of the handshake on subsequent ratings of job candidates. Stewart, Dustin, Barrick, and Darnold (2008) used simulated hiring interviews to investigate the commonly held conviction that a firm handshake bears a critical nonverbal influence on impressions formed during the employment interview. Briefly, 98 undergraduates underwent realistic job interviews during which their handshakes were surreptitiously rated on 5-point scales for grip strength, completeness, duration, and vigor; degree of eye contact during the handshake also was rated. Independent ratings
were completed at different times by five individuals involved in the process. Real human-resources professionals conducted the interviews and then offered simulated hiring recommendations. The professionals shook hands with the candidates but were not asked to provide handshake ratings because this would have cued them to the purposes of the study. This is the barest outline of this complex investigation. The big picture that emerged was that the quality of the handshake was positively related to hiring recommendations. Further, women benefited more than men from a strong handshake. The researchers conclude their study with these thoughts: The handshake is thought to have originated in
medieval Europe as a way for kings and knights to show that they did not intend to harm each other and possessed no concealed weapons (Hall & Hall, 1983). The results presented in this study show that this age-old social custom has an important place in modern business interactions. Although the handshake may
appear to be a business formality, it can indeed communicate critical information and influence interviewer assessments. (p. 1145)
Perhaps this study will provide an impetus for additional investigation of this important component of the job interview.
Barrick, Swider, and Stewart (2010) make the general case that initial impressions formed in the first few seconds or minutes of the employment interview significantly influence the final outcomes. They cite the social psychology literature to argue that initial impressions are nearly instinctual and based on evolutionary mechanisms that aid survival. Handshake, smile, grooming, manner of dress— the interviewer gauges these as favorable (or not) almost instantaneously. The purpose of their study was to examine whether these “fast and frugal” judgments formed in the first few seconds or minutes even before the “real” interview begins affect interview outcomes. Participants for their research were 189 undergraduate students in a program for
professional accountants. The students were pre-interviewed for just 2-3 minutes by trained graduate students for purposes of rapport building, before a more thorough structured mock interview was conducted. After the brief pre-interview, the graduate interviewers filled out a short rating scale on liking for the candidate, the candidate’s competence, and perceived “personal” similarity. The interviewers then conducted a full structured interview and filled out ratings. Weeks after these mock interviews, participants engaged in real interviews with four major accounting firms (Deloitte Touche Tohmatsu, Ernst & Young, KPMG, and PricewaterhouseCoopers) to determine whether they would receive an offer of an internship. Just over half of the students received an offer. Candidates who made better first impressions during the initial pre-interview (that lasted just 2-3 minutes) received more internship offers (r = .22) and higher interviewer ratings (r = .42). In sum, initial impressions in the employment interview do matter.
11.4 COGNITIVE ABILITY TESTS

Cognitive ability can refer either to a general construct akin to intelligence or to a variety of specific constructs such as verbal skills, numerical ability, spatial perception, or perceptual speed (Kline, 1999). Tests of general cognitive ability and measures of specific cognitive skills have many applications in personnel selection, evaluation, and screening. Such tests are quick, inexpensive, and easy to interpret. A vast body of empirical research offers strong support for the validity of standardized cognitive ability tests in personnel selection. For example, Bertua, Anderson, and Salgado (2005) conducted a meta-analysis of 283 independent employee samples in the United Kingdom. They found that general mental ability as well as specific ability tests (verbal, numerical, perceptual, and spatial) are valid predictors of job performance and training success, with validity coefficients on the order of .5 to .6.
Surveying a large number of studies and employment settings, Kuncel and Hezlett (2010) summarized correlations between cognitive ability and seven measures of work performance as follows:

Job performance, high complexity: 0.58
Job performance, medium complexity: 0.52
Job performance, low complexity: 0.40
Training success, civilian: 0.55
Training success, military: 0.62
Objective leader effectiveness: 0.33
Creativity: 0.37

Beyond a doubt, there is merit in the use of cognitive ability tests for personnel selection. Even so, a significant concern with the use of cognitive ability tests for personnel selection is that these instruments may result in an adverse impact on minority groups. Adverse impact is a legal term (discussed later in this chapter) referring to the disproportionate selection of white candidates over minority candidates. Most authorities in personnel psychology recognize that cognitive tests play an essential role in applicant selection; nonetheless, these experts
also affirm that cognitive tests provide maximum benefit (and minimum adverse impact) when combined with other approaches such as biodata. Selection decisions never should be made exclusively on the basis of cognitive test results (Robertson & Smith, 2001). An ongoing debate within I/O psychology is whether employment testing is best accomplished with highly specific ability tests or with measures of general cognitive ability. The weight of the evidence seems to support the conclusion that a general factor of intelligence (the so-called g factor) is usually a better predictor of training and job success than are scores on specific cognitive measures—even when several specific cognitive measures are used in combination. Of course, this conclusion runs counter to common sense and anecdotal evidence. For example, Kline (1993) offers the following vignette: The point is that the g factors are important but
so also are these other factors. For example, high g is necessary to be a good
engineer and to be a good journalist. However for the former high spatial ability is also required, a factor which confers little advantage on a journalist. For her or him, however, high verbal ability is obviously useful.
Curiously, empirical research provides only mixed support for this position (Gottfredson, 1986; Larson & Wolfe, 1995; Ree, Earles, & Teachout, 1994). Although the topic continues to be debated, most studies support the primacy of g in personnel selection (Borman et al., 1997; Schmidt, 2002). Perhaps the reason that g usually works better than specific cognitive factors in predicting job performance is that most jobs are factorially complex in their requirements, stereotypes notwithstanding (Guion, 1998). For example, the successful engineer must explain his or her ideas to others and so needs verbal ability as well as spatial skills. Since measures of general cognitive ability tap many specific cognitive skills, a general test often predicts performance in
complex jobs as well as, or better than, measures of specific skills. Literally hundreds of cognitive ability tests are available for personnel selection, so it is not feasible to survey the entire range of instruments here. Instead, we will highlight three representative tests: one that measures general cognitive ability, a second that is germane to assessment of mechanical abilities, and a third that taps a highly specific facet of clerical work. The three instruments chosen for review—the Wonderlic Personnel Test-Revised, the Bennett Mechanical Comprehension Test, and the Minnesota Clerical Test—are merely exemplars of the hundreds of cognitive ability tests available for personnel selection. All three tests are often used in business settings and, therefore, worthy of specific mention. Representative cognitive ability tests encountered in personnel selection are listed in Table 11.3. Some classic viewpoints on cognitive ability testing for personnel selection are found in Ghiselli (1966), Hunter and Hunter (1984), and Reilly and Chao (1982). More
recent discussion of this issue is provided by Borman et al. (1997), Guion (1998), and Schmidt (2002).

TABLE 11.3 Representative Cognitive Ability Tests Used in Personnel Selection

General Ability Tests
Shipley Institute of Living Scale
Wonderlic Personnel Test-Revised
Wesman Personnel Classification Test
Personnel Tests for Industry

Multiple Aptitude Test Batteries
General Aptitude Test Battery
Armed Services Vocational Aptitude Battery
Differential Aptitude Test
Employee Aptitude Survey

Mechanical Aptitude Tests
Bennett Mechanical Comprehension Test
Minnesota Spatial Relations Test
Revised Minnesota Paper Form Board Test
SRA Mechanical Aptitudes

Motor Ability Tests
Crawford Small Parts Dexterity Test
Purdue Pegboard
Hand-Tool Dexterity Test
Stromberg Dexterity Test

Clerical Tests
Minnesota Clerical Test
Clerical Abilities Battery
General Clerical Test
SRA Clerical Aptitudes
Note: SRA denotes Science Research Associates. These tests are reviewed in the Mental Measurements Yearbook series.

Wonderlic Personnel Test-Revised

Even though it is described as a personnel test, the Wonderlic Personnel Test-Revised (WPT-R) is really a group test of general mental ability (Hunter, 1989; Wonderlic, 1983). The revised version was released in 2007 and is now named the Wonderlic Contemporary Cognitive Ability Test. We refer to it as the WPT-R here. What makes this instrument somewhat of an institution in personnel testing is its format (50 multiple-choice items), its brevity (a 12-minute time limit), and its numerous parallel forms (16 at last count). Item types on the Wonderlic are quite varied and include vocabulary, sentence rearrangement, arithmetic problem solving, logical induction, and interpretation of proverbs. The following items capture the flavor of the Wonderlic:

1. REGRESS is the opposite of
(a) ingest (b) advance (c) close (d) open

2. Two men buy a car which costs $550; X pays $50 more than Y. How much did X pay?
(a) $500 (b) $300 (c) $400 (d) $275

3. HEFT CLEFT—Do these words have
(a) similar meaning (b) opposite meaning (c) neither similar nor opposite meaning
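For the arithmetic item, the keyed answer follows from a simple two-equation system (this worked solution is an editorial check, not part of the Wonderlic materials):

\[
X + Y = 550, \quad X = Y + 50 \;\Rightarrow\; (Y + 50) + Y = 550 \;\Rightarrow\; Y = 250,\; X = 300,
\]

so X paid $300.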
The reliability of the WPT-R is quite impressive, especially considering the brevity of the instrument. Internal consistency reliabilities typically reach .90, while alternative-form reliabilities usually exceed .90. Normative data are available on hundreds of thousands of adults and hundreds of occupations. Regarding validity, if the WPT-R is considered a brief test of general mental ability, the findings are quite positive (Dodrill & Warner, 1988). For example, Dodrill (1981) reports a correlation of .91
between scores on the original WPT and scores on the WAIS. This correlation is as high as that found between any two mainstream tests of general intelligence. Bell, Matthews, Lassister, and Leverett (2002) reported strong congruence between the WPT and the Kaufman Adolescent and Adult Intelligence Test in a sample of adults. Hawkins, Faraone, Pepple, Seidman, and Tsuang (1990) report a similar correlation (r = .92) between WPT and WAIS-R IQ for 18 chronically ill psychiatric patients. However, in their study, one subject was unable to manage the format of the WPT, suggesting that severe visuospatial impairment can invalidate the test. Another concern about the Wonderlic is that examinees whose native language is not English will be unfairly penalized on the test (Belcher, 1992). The Wonderlic is a speeded test. In fact, it has such a heavy reliance on speed that points are added for subjects aged 30 and older to compensate for the well-known decrement in speed that accompanies normal aging. However, no accommodation is made for nonnative English speakers who might also perform more
slowly. One solution to the various issues of fairness cited would be to provide norms for untimed performance on the Wonderlic. However, the publishers have resisted this suggestion.

Bennett Mechanical Comprehension Test

In many trades and occupations, the understanding of mechanical principles is a prerequisite to successful performance. Automotive mechanics, plumbers, mechanical engineers, trade school applicants, and persons in many other “hands-on” vocations need to comprehend basic mechanical principles in order to succeed in their fields. In these cases, a useful instrument for occupational testing is the Bennett Mechanical Comprehension Test (BMCT). This test consists of pictures about which the examinee must answer straightforward questions. The situations depicted emphasize basic mechanical principles that might be encountered in everyday life. For example, a series of belts and flywheels might
be depicted, and the examinee would be asked to discern the relative revolutions per minute of two flywheels. The test includes two equivalent forms (S and T). The BMCT has been widely used since World War II for military and civilian testing, so an extensive body of technical and validity data exists for this instrument. Split-half reliability coefficients range from the .80s to the low .90s. Comprehensive normative data are provided for several groups. Based on a huge body of earlier research, the concurrent and predictive validity of the BMCT appear to be well established (Wing, 1992). For example, in one study with 175 employees, the correlation between the BMCT and the DAT Mechanical Reasoning subtest was an impressive .80. An intriguing finding is that the test proved to be one of the best predictors of pilot success during World War II (Ghiselli, 1966). In spite of its psychometric excellence, the BMCT is in need of modernization. The test looks old and many items are dated. By contemporary standards, some BMCT items are
sexist or potentially offensive to minorities (Wing, 1992). The problem with dated and offensive test items is that they can subtly bias test scores. Modernization of the BMCT would be a straightforward project that could increase the acceptability of the test to women and minorities while simultaneously preserving its psychometric excellence.

Minnesota Clerical Test

The Minnesota Clerical Test (MCT), which purports to measure perceptual speed and accuracy relevant to clerical work, has remained essentially unchanged in format since its introduction in 1931, although the norms have undergone several revisions, most recently in 1979 (Andrew, Peterson, & Longstaff, 1979). The MCT is divided into two subtests: Number Comparison and Name Comparison. Each subtest consists of 100 identical and 100 dissimilar pairs of digit or letter combinations (Table 11.4). The dissimilar pairs generally differ in regard to only one digit or letter, so the comparison task is challenging. The examinee is
required to check only the identical pairs, which are randomly intermixed with dissimilar pairs. The score depends predominantly upon speed, although the examinee is penalized for incorrect items (errors are subtracted from the number of correct items).

TABLE 11.4 Items Similar to Those Found on the Minnesota Clerical Test

Number Comparison
1. 3496482 _______ 3495482
2. 17439903 _______ 17439903
3. 84023971 _______ 84023971
4. 910386294 _______ 910368294

Name Comparison
1. New York Globe _______ New York Globe
2. Brownell Seed _______ Brownel Seed
3. John G. Smith _______ John G Smith
4. Daniel Gregory _______ Daniel Gregory
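The scoring rule just described lends itself to a compact illustration. The following is a minimal sketch of ours (using the illustrative item pairs from Table 11.4; it is not the official scoring procedure):

```python
# Minimal sketch (ours) of MCT-style scoring: the examinee checks the
# pairs believed to be identical, and the score is the number of
# correct checks minus the number of erroneous checks.
pairs = [
    ("3496482", "3495482"),
    ("17439903", "17439903"),
    ("New York Globe", "New York Globe"),
    ("Brownell Seed", "Brownel Seed"),
]
checked = [False, True, True, True]  # a hypothetical examinee's responses

correct = sum(a == b for (a, b), was_checked in zip(pairs, checked) if was_checked)
errors = sum(a != b for (a, b), was_checked in zip(pairs, checked) if was_checked)
print(correct - errors)  # 2 correct - 1 error = 1
```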
The reliability of the MCT is acceptable, with reported stability coefficients in the range of .81 to .87 (Andrew, Peterson, & Longstaff, 1979). The manual also reports a wealth of validity data, including some findings that are not
altogether flattering. In these studies, the MCT was correlated with measures of job performance, measures of training outcome, and scores from related tests. The job performance of directory assistants, clerks, clerk-typists, and bank tellers was correlated significantly but not robustly with scores on the MCT. The MCT is also highly correlated with other tests of clerical ability. Nonetheless, questions still remain about the validity and applicability of the MCT. Ryan (1985) notes that the manual lacks a discussion of the significant versus the nonsignificant validity studies. In addition, the MCT authors fail to provide detailed information concerning the specific attributes of the jobs, tests, and courses used as criterion measures in the reported validity studies. For this reason, it is difficult to surmise exactly what the MCT measures. Ryan (1985) complains that the 1979 norms are difficult to use because the MCT authors provide so little information on how the various norm groups were constituted. Thus, even though the revised MCT manual presents
new norms for 10 vocational categories, the test user may not be sure which norm group applies to his or her setting. Because of the marked differences in performance between the norm groups, the vagueness of definition poses a significant problem to potential users of this test.

11.5 PERSONALITY TESTS

It is only in recent years, with the emergence of the “big five” approach to the measurement of personality and the development of strong measures of these five factors, that personality has proved to be a valid basis for employee selection, at least in some instances. From the 1950s into the 1990s, personality tests were often used recklessly for personnel selection: Personality inventories such as the MMPI were
used for many years for personnel selection—in fact, overused or misused. They were used indiscriminately to assess a candidate’s personality, even when there was no established relation between test
scores and job success. Soon personality inventories came under attack. (Muchinsky, 1990)
In effect, for many of these earlier uses of testing, a consultant psychologist or human resource manager would look at the personality test results of a candidate and implicitly (or explicitly) make an argument along these lines: “In my judgment people with test results like this are [or are not] a good fit for this kind of position.” Sadly, there was little or no empirical support for such imperious conclusions, which basically amounted to a version of “because I said so.” Certainly early research on personality and job performance was rather sobering for many personality scales and constructs. For example, Hough, Eaton, Dunnette, Kamp, and McCloy (1990) analyzed hundreds of published studies on the relationship between personality constructs and various job performance criteria. For these studies, they grouped the personality constructs into several categories (e.g., Extroversion, Affiliation, Adjustment,
Agreeableness, and Dependability) and then computed the average validity coefficient for criteria of job performance (e.g., involvement, proficiency, delinquency, and substance abuse). Most of the average correlations were indistinguishable from zero! For job proficiency as the outcome criterion, the strongest relationships were found for measures of Adjustment and Dependability, both of which revealed correlations of r = .13 with general ratings of job proficiency. Even though statistically significant (because of the large number of cases amassed in the hundreds of studies), correlations of this magnitude are essentially useless, accounting for less than 2 percent of the variance.2 Specific job criteria such as delinquency (e.g., neglect of work duties) and substance abuse were better predicted in specific instances. For example, measures of Adjustment correlated r = −.43 with delinquency, and measures of Dependability correlated r = −.28 with substance abuse. Of course, the negative correlations indicate an inverse relationship: higher scores on
Adjustment go along with lower levels of delinquency, and higher scores on Dependability indicate lower levels of substance abuse. Apparently, it is easier to predict specific job-related criteria than to predict general job proficiency. Beginning in the 1990s, a renewed optimism about the utility of personality tests in personnel selection began to emerge (Behling, 1998; Hurtz & Donovan, 2000). The reason for this change in perspective was the emergence of the Big Five framework for research on selection, and the development of robust measures of the five constructs confirmed by this approach, such as the NEO Personality Inventory-Revised (Costa & McCrae, 1992). Evidence began to mount that personality—as conceptualized by the Big Five approach—possessed some utility for employee selection. The reader will recall from an earlier chapter that the five dimensions of this model are Neuroticism, Extraversion, Openness to Experience, Conscientiousness, and Agreeableness. Shuffling the first letters, the acronym OCEAN can be used to remember the
elements. In place of Neuroticism (which pertains to the negative pole of this factor), some researchers use the term Emotional Stability (which describes the positive pole of the same factor) so as to achieve consistency of positive orientation among the five factors. A meta-analysis by Hurtz and Donovan (2000) solidified Big Five personality factors as important tools in predicting job performance. These researchers located 45 studies using suitable measures of Big Five personality factors as predictors of job performance. In total, their data set was based on more than eight thousand employees, providing stable and robust findings, even though not all dimensions were measured in all studies. The researchers conducted multiple analyses involving different occupational categories and diverse outcome measures such as task performance, job dedication, and interpersonal facilitation. We discuss here only the most general results, namely, the operational validity for the five factors in predicting overall job performance. Operational validity refers to the correlation
between personality measures and job performance, corrected for sampling error, range restriction, and unreliability of the criterion. Big Five factors and validity coefficients were as follows:

Conscientiousness        .26
Neuroticism              .13
Extraversion             .15
Agreeableness            .05
Openness to Experience   .04
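The corrections behind operational validity can be illustrated with standard psychometric formulas. The following is a minimal sketch of ours, with illustrative numbers rather than Hurtz and Donovan's actual data: Thorndike's Case II correction for range restriction, followed by disattenuation for unreliability in the criterion.

```python
import math

def correct_for_range_restriction(r, u):
    """Thorndike Case II: u is the ratio of the unrestricted to the
    restricted predictor standard deviation (u > 1 when applicants
    were screened on the predictor)."""
    return (u * r) / math.sqrt(1 + r**2 * (u**2 - 1))

def correct_for_criterion_unreliability(r, r_yy):
    """Disattenuate a validity coefficient for unreliability in the
    criterion measure (e.g., supervisor ratings)."""
    return r / math.sqrt(r_yy)

# Hypothetical observed validity of .18, a range-restriction ratio of
# 1.25, and criterion reliability of .52 (the mean interrater
# reliability of supervisor ratings cited later in this chapter):
r = correct_for_range_restriction(0.18, u=1.25)
r = correct_for_criterion_unreliability(r, r_yy=0.52)
print(round(r, 2))  # about 0.31
```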
Overall, Conscientiousness is the big winner in their analysis, although for some specific occupational categories, other factors were valuable (e.g., Agreeableness paid off for Customer Service personnel). Hurtz and Donovan (2000) use caution and understatement to summarize the implications of their study:

What degree of utility do these global Big Five
measures offer for predicting job performance? Overall, it appears that global measures of Conscientiousness can be expected to consistently add a small portion of explained variance in job performance across jobs and across
criterion dimensions. In addition, for certain jobs and for certain criterion dimensions, certain other Big Five dimensions will likely add a very small but consistent degree of explained variance. (p. 876)
In sum, people who describe themselves as reliable, organized, and hard-working (i.e., high on Conscientiousness) appear to perform better at work than those with fewer of these qualities. For specific applications in personnel selection, certain tests are known to have greater validity than others. For example, the California Psychological Inventory (CPI) provides an accurate measure of managerial potential (Gough, 1984, 1987). Certain scales of the CPI predict overall performance of military academy students reasonably well (Blake, Potter, & Sliwak, 1993). The Inwald Personality Inventory is well validated as a preemployment screening test for law enforcement (Chibnall & Detrick, 2003; Inwald, 2008). The Minnesota Multiphasic Personality Inventory also bears mention as a selection tool for law enforcement (Sellbom, Fischler, & Ben-Porath, 2007).
Finally, the Hogan Personality Inventory (HPI) is well validated for prediction of job performance in military, hospital, and corporate settings (Hogan, 2002). The HPI was based upon the Big Five theory of personality (see Topic 8A, Theories and the Measurement of Personality). This instrument has cross-validated criterion-related validities as high as .60 for some scales (Hogan, 1986; Hogan & Hogan, 1986).

2 The strength of a correlation is indexed by squaring it, which provides the proportion of variance accounted for in one variable by knowing the value of the other variable. In this case, the square of .13 is .0169, which is 1.69 percent.
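The same squaring rule applies to the other correlations quoted in this topic; a quick sketch of ours, for readers who want to verify the arithmetic:

```python
# The proportion of variance accounted for is the square of the correlation.
for r in (0.13, -0.28, -0.43):
    print(f"r = {r:+.2f} -> r^2 = {r * r:.4f} ({r * r:.2%} of variance)")
```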
11.6 PAPER-AND-PENCIL INTEGRITY TESTS

Several test publishers have introduced instruments designed to screen out theft-prone individuals and other undesirable job candidates, such as persons who are undependable or frequently absent from work (Cullen & Sackett, 2004; Wanek, 1999). We will focus on issues raised by these tests rather than detailing the merits or demerits of individual instruments.
Table 11.5 lists some of the more commonly used instruments. One problem with integrity tests is that their proprietary nature makes it difficult to scrutinize them in the same manner as traditional instruments. In most cases, scoring keys are available only to in-house psychologists, which makes independent research difficult. Nonetheless, a sizable body of research now exists on integrity tests, as discussed in the following section on validity. An integrity test evaluates attitudes and experiences relating to the honesty, dependability, trustworthiness, and pro-social behaviors of a respondent. Integrity tests typically consist of two sections. The first is a section dealing with attitudes toward theft and other forms of dishonesty such as beliefs about extent of employee theft, degree of condemnation of theft, endorsement of common rationalizations about theft, and perceived ease of theft. The second is a section dealing with overt admissions of theft and other illegal activities such as items stolen in the last year,
gambling, and drug use. The most widely researched tests of this type include the Personnel Selection Inventory, the Reid Report, and the Stanton Survey. The interested reader can find addresses for the publishers of these and related instruments through an Internet search.

TABLE 11.5 Commonly Used Integrity Tests

Overt Integrity Tests
  Accutrac Evaluation System
  Compuscan
  Employee Integrity Index
  Orion Survey
  PEOPLE Survey
  Personnel Selection Inventory
  Phase II Profile
  Reid Report and Reid Survey
  Stanton Survey

Personality-Based Integrity Tests
  Employment Productivity Index
  Hogan Personnel Selection Series
  Inwald Personality Inventory
  Personnel Decisions, Inc., Employment Inventory
  Personnel Reaction Blank
Note: Publishers of these tests can be easily found by using Google or another internet search engine.

Apparently, integrity tests can be easily faked and might, therefore, be of less value in screening dishonest applicants than other approaches such as background checks. For example, Ryan and Sackett (1987) created a generic overt integrity test modeled upon existing instruments. The test contained 52 attitude and 11 admission items. Compared with one contrast group asked to respond truthfully and another contrast group asked to respond as job applicants, subjects asked to “fake good” produced substantially superior scores (i.e., better attitudes and fewer theft admissions).

Validity of Integrity Tests

In a recent meta-analysis of 104 criterion-related validity studies, Van Iddekinge, Roth, Raymark, and Odle-Dusseau (2012) found that integrity tests were not particularly useful in predicting job performance, training performance, or work turnover (corrected rs of .15, .16, and .09, respectively). However, when counterproductive work behavior (CWB; e.g., theft, poor
attendance, unsafe behavior, property destruction) was the criterion, the corrected r was a healthy .32. The correlation was even higher, r = .42, when based on self-reports of CWB as opposed to other reports or employee records. Overall, these findings support the value of integrity testing in personnel selection. Ones et al. (1993) requested data on integrity tests from publishers, authors, and colleagues. These sources proved highly cooperative: The authors collected 665 validity coefficients based upon 25 integrity tests administered to more than half a million employees. Using the intricate procedures of meta-analysis, Ones et al. (1993) computed an average validity coefficient of .41 when integrity tests were used to predict supervisory ratings of job performance. Interestingly, integrity tests predicted global disruptive behaviors (theft, illegal activities, absenteeism, tardiness, drug abuse, dismissals for theft, and violence on the job) better than they predicted employee theft alone. The authors concluded with a mild endorsement of these instruments:
When we started our research on integrity tests, we, like many other industrial psychologists, were skeptical of integrity tests used in industry. Now, on the basis of analyses of a large database consisting of more than 600 validity coefficients, we conclude that integrity tests have substantial evidence of generalizable validity.
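At its core, the headline figure in such a meta-analysis is a sample-size-weighted average of the study correlations; the full Hunter-Schmidt procedure adds corrections for artifacts such as criterion unreliability and range restriction. A minimal sketch of ours, with hypothetical study values:

```python
# Bare-bones meta-analytic average (ours): a sample-size-weighted mean
# of study validity coefficients. The (N, r) values are hypothetical.
studies = [(200, 0.30), (500, 0.45), (120, 0.25), (700, 0.40)]

total_n = sum(n for n, _ in studies)
mean_r = sum(n * r for n, r in studies) / total_n
print(f"total N = {total_n}, weighted mean r = {mean_r:.2f}")  # .39
```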
This conclusion is echoed in a series of ingenious studies by Cunningham, Wong, and Barbee (1994). Among other supportive findings, these researchers discovered that integrity test results were correlated with returning an overpayment—even when subjects were instructed to provide a positive impression on the integrity test. Other reviewers are more cautious in their conclusions. In commenting on reviews by the American Psychological Association and the Office of Technology Assessment, Camara and Schneider (1994) concluded that integrity tests do not measure up to expectations of experts in assessment, but that they are probably better
than hit-or-miss, unstandardized methods used by many employers to screen applicants. Several concerns remain about integrity tests. Publishers may release their instruments to unqualified users, which is a violation of the ethical standards of the American Psychological Association. A second problem arises from the unknown base rate of theft and other undesirable behaviors, which makes it difficult to identify optimal cutting scores on integrity tests. If cutting scores are too stringent, honest job candidates will be disqualified unfairly. Conversely, too lenient a cutting score renders the testing pointless.
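The base-rate difficulty can be made concrete with Bayes' rule. In this sketch of ours, a hypothetical test with sensitivity and specificity of .80 is applied at different base rates of theft-proneness; when the base rate is low, most flagged applicants are false positives:

```python
# Bayes' rule sketch (ours): what a "fail" on an integrity test means
# at different base rates, given hypothetical accuracy figures.
def positive_predictive_value(base_rate, sensitivity, specificity):
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

for base_rate in (0.05, 0.20, 0.50):
    ppv = positive_predictive_value(base_rate, 0.80, 0.80)
    print(f"base rate {base_rate:.0%}: {ppv:.0%} of flagged applicants are theft-prone")
# At a 5% base rate, only about 17% of flagged applicants are truly
# theft-prone; a stringent cutting score mostly disqualifies honest people.
```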
A final concern is that situational factors may moderate the validity of these instruments. For example, how a test is portrayed to examinees may powerfully affect their responses and therefore skew the validity of the instrument. The debate about integrity tests juxtaposes the legitimate interests of business against the individual rights of workers. Certainly, businesses have a right not to hire thieves, drug addicts, and malcontents. But in pursuing this goal, what is the ultimate cost to society of asking millions of job applicants about past behaviors involving drugs, alcohol, criminal behavior, and other highly personal matters? Hanson (1991) has asked rhetorically whether society is well served by the current balance of power—in which businesses can obtain proprietary information about who is seemingly worthy and who is not. It is not out of the question that Congress could enter the debate. In 1988, President Reagan signed into law the Employee Polygraph Protection Act, which effectively eliminated polygraph testing in industry. Perhaps in the years ahead we will see integrity testing sharply curtailed by an Employee Integrity Test Protection Act. Berry, Sackett, and Wiemann (2007) provide an excellent review of the current state of integrity testing.

11.7 WORK SAMPLE AND SITUATIONAL EXERCISES

A work sample is a miniature replica of the job for which examinees have applied. Muchinsky
(2003) points out that the I/O psychologist’s goal in devising a work sample is “to take the content of a person’s job, shrink it down to a manageable time period, and let applicants demonstrate their ability in performing this replica of the job.” Guion (1998) has emphasized that work samples need not include every aspect of a job but should focus upon the more difficult elements that effectively discriminate strong from weak candidates. For example, a position as clerk-typist may also include making coffee and running errands for the boss. However, these are trivial tasks demanding so little skill that it would be pointless to include them in a work sample. A work sample should test important job domains, not the entire job universe. Campion (1972) devised an ingenious work sample for mechanics that illustrates the preceding point. Using the job analysis techniques discussed at the beginning of this topic, Campion determined that the essence of being a good mechanic was defined by successful use of tools, accuracy of work, and
overall mechanical ability. With the help of skilled mechanics, he devised a work sample that incorporated these job aspects through typical tasks such as installing pulleys and repairing a gearbox. Points were assigned to component behaviors for each task. Example items and their corresponding scoring weights were as follows (a brief scoring sketch follows the list):

Installing Pulleys and Belts
1. Checks key before installing against:
   ____ shaft (2)   ____ pulley (2)   ____ neither (0)

Disassembling and Repairing a Gear Box
10. Removes old bearing with:
   ____ press and driver (3)   ____ bearing puller (2)   ____ gear puller (1)   ____ other (0)

Pressing a Bushing into Sprocket and Reaming to Fit a Shaft
4. Checks internal diameter of bushing against shaft diameter:
   ____ visually (1)   ____ hole gauge and micrometers (3)   ____ Vernier calipers (2)   ____ scale (1)   ____ does not check (0)
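Scoring such a checklist amounts to a weighted sum over the behaviors the examiner checks. A minimal sketch of ours (the weights follow the sample items above; allowing one checked behavior per item is our simplification of the actual checklist):

```python
# Minimal sketch (ours): a work-sample score as the sum of point
# weights for the behaviors the examiner checks off.
ITEM_WEIGHTS = {
    "checks_key_against": {"shaft": 2, "pulley": 2, "neither": 0},
    "removes_old_bearing_with": {"press_and_driver": 3, "bearing_puller": 2,
                                 "gear_puller": 1, "other": 0},
    "checks_bushing_diameter": {"hole_gauge_and_micrometers": 3,
                                "vernier_calipers": 2, "visually": 1,
                                "scale": 1, "does_not_check": 0},
}

def score_work_sample(observations):
    """observations maps each task item to the behavior checked."""
    return sum(ITEM_WEIGHTS[item][behavior]
               for item, behavior in observations.items())

print(score_work_sample({
    "checks_key_against": "shaft",
    "removes_old_bearing_with": "bearing_puller",
    "checks_bushing_diameter": "vernier_calipers",
}))  # 2 + 2 + 2 = 6
```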
Campion found that the performance of 34 male maintenance mechanics on the work sample measure was significantly and positively related to the supervisor’s evaluations of their work performance, with validity coefficients ranging from .42 to .66. A situational exercise is approximately the white-collar equivalent of a work sample. Situational exercises are largely used to select persons for managerial and professional positions. The main difference between a situational exercise and a work sample is that the former mirrors only part of the job, whereas the latter is a microcosm of the entire job (Muchinsky, 1990). In a situational exercise, the prospective employee is asked to perform under circumstances that are highly similar to the anticipated work environment. Measures of accomplishment can then be gathered as a basis for gauging likely productivity or other aspects of job effectiveness. The situational exercises with the highest validity show a close
resemblance to the criterion; that is, the best exercises are highly realistic (Asher & Sciarrino, 1974; Muchinsky, 2003). Work samples and situational exercises are based on the conventional wisdom that the best predictor of future performance in a specific domain is past performance in that same domain. Typically, a situational exercise requires the candidate to perform in a setting that is highly similar to the intended work environment. Thus, the resulting performance measures resemble those that make up the prospective job itself. Hundreds of work samples and situational exercises have been proposed over the years. For example, in an earlier review, Asher and Sciarrino (1974) identified 60 procedures, including the following:

• Typing test for office personnel
• Mechanical assembly test for loom fixers
• Map-reading test for traffic control officers
• Tool dexterity test for machinists and riveters
• Headline, layout, and story organization test for magazine editors
• Oral fact-finding test for communication consultants
• Role-playing test for telephone salespersons
• Business-letter-writing test for managers
A very effective situational exercise that we will discuss here is the in-basket technique, a procedure that simulates the work environment of an administrator.

The In-Basket Test

The classic paper on the in-basket test is the monograph by Frederiksen (1962). For this comprehensive study, Frederiksen devised the Bureau of Business In-Basket Test, which consists of the letters, memoranda, records of telephone calls, and other documents that have collected in the in-basket of a newly hired executive officer of a business bureau. In this test, the candidate is instructed not to play a role but to be himself.3 The candidate is not to say what he would do; he is to do it.
The letters, memoranda, phone calls, and interviews completed by him in this simulated job environment constitute the record of behavior that is scored according to both content and style of the responses. Response style refers to how a task was completed—courteously, by telephone, by involving a superior, through delegation to a subordinate, and so on. Content refers to what was done, including making plans, setting deadlines, seeking information; several quantitative indices were also computed, including number of items attempted and total words written. For some scoring criteria such as imaginativeness—the number of courses of action which seemed to be good ideas—expert judgment was required. Frederiksen (1962) administered his in-basket test to 335 subjects, including students, administrators, executives, and army officers. Scoring the test was a complex procedure that required the development of a 165-page manual. The odd-even reliability of the individual items varied considerably, but enough modestly reliable items emerged (rs of .70 and above) that
Frederiksen could conduct several factor analyses and also make meaningful group comparisons. When scores on the individual items were correlated with each other and then factor analyzed, the behavior of potential administrators could be described in terms of eight primary factors. When scores on these primary factors were themselves factor analyzed, three second-order factors emerged. These second-order factors describe administrative behavior in the most general terms possible. The first dimension is Preparing for Action, characterized by deferring final decisions until information and advice are obtained. The second dimension is simply Amount of Work, depicting the large individual differences in sheer work output. The third major dimension is called Seeking Guidance, with high scorers appearing to be anxious and indecisive. These dimensions fit well with existing theory about administrator performance and therefore support the validity of Frederiksen’s task.
A number of salient attributes emerged when Frederiksen compared the subject groups on the scorable dimensions of the in-basket test. For example, the undergraduates stressed verbal productivity, the government administrators lacked concern with outsiders, the business executives were highly courteous, the army officers exhibited strong control over subordinates, and school principals lacked firm control. These group differences speak strongly to the construct validity of the in-basket test, since the findings are consistent with theoretical expectations about these subject groups. Early studies supported the predictive validity of in-basket tests. For example, Brass and Oldham (1976) demonstrated that performance on an in-basket test corresponded to on-the-job performance of supervisors if the appropriate in-basket scoring categories were used. Specifically, based on the in-basket test, supervisors who personally reward employees for good work, personally punish subordinates for poor work, set specific performance objectives, and enrich their subordinates’ jobs
are also rated by their superiors as being effective managers. The predictive power of these in-basket dimensions was significant, with a multiple correlation coefficient of .54 between predictors and criterion. Standardized in-basket tests can now be purchased for use by private organizations. Unfortunately, most of these tests are “in-house” instruments not available for general review. In spite of occasional cautionary reviews (e.g., Brannick et al., 1989; Schroffel, 2012), the in-basket technique is still highly regarded as a useful method of evaluating candidates for managerial positions.

Assessment Centers

An assessment center is not so much a place as a process (Highhouse & Nolan, 2012). Many corporations and military branches—as well as a few progressive governments—have dedicated special sites to the application of in-basket and other simulation exercises in the training and selection of managers. The purpose of an assessment center is to evaluate managerial potential by exposing candidates to multiple
simulation techniques, including group presentations, problem-solving exercises, group discussion exercises, interviews, and in-basket techniques. Results from traditional aptitude and personality tests also are considered in the overall evaluation. The various simulation exercises are observed and evaluated by successful senior managers who have been specially trained in techniques of observation and evaluation. Assessment centers are used in a variety of settings, including business and industry, government, and the military. There is no doubt that a properly designed assessment center can provide a valid evaluation of managerial potential. Follow-up research has demonstrated that the performance of candidates at an assessment center is strongly correlated with supervisor ratings of job performance (Gifford, 1991). A more difficult question to answer is whether assessment centers are cost-effective in comparison to traditional selection procedures. After all, funding an assessment center is very expensive. The key question is whether the assessment center approach to
selection boosts organizational productivity sufficiently to offset the expense of the selection process. Anecdotally, the answer would appear to be a resounding yes, since poor decisions from bad managers can be very expensive. However, there is little empirical information that addresses this issue. Goffin, Rothstein, and Johnston (1996) compared the validity of traditional personality testing (with the Personality Research Form; Jackson, 1984b) and the assessment center approach in the prediction of the managerial performance of 68 managers in a forestry products company. Both methods were equivalent in predicting performance, which would suggest that the assessment center approach is not worth the (very substantial) additional cost. However, when both methods were used in combination, personality testing provided significant incremental validity over that of the assessment center alone. Thus, personality testing and assessment center findings each contribute unique information helpful in predicting performance.
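Incremental validity of this kind is usually quantified as the gain in explained variance (delta R-squared) when the second predictor enters a regression that already contains the first. A minimal simulation sketch of ours (the data are synthetic, not Goffin et al.'s):

```python
import numpy as np

# Synthetic illustration (ours): performance depends on both an
# assessment-center rating and a personality score, so the personality
# score adds explained variance beyond the assessment center alone.
rng = np.random.default_rng(seed=1)
n = 200
assessment = rng.normal(size=n)
personality = rng.normal(size=n)
performance = 0.4 * assessment + 0.3 * personality + rng.normal(size=n)

def r_squared(predictors, y):
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1 - residuals.var() / y.var()

r2_base = r_squared([assessment], performance)
r2_full = r_squared([assessment, personality], performance)
print(f"R^2, assessment center only: {r2_base:.3f}")
print(f"Delta R^2 from adding personality: {r2_full - r2_base:.3f}")
```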
Putting a candidate through an assessment center is very expensive. Dayan, Fox, and Kasten (2008) speak to the cost of assessment center operations by arguing that an employment interview and cognitive ability test scores can be used to cull the best and the worst applicants so that only those in the middle need to undergo these expensive evaluations. Their study involved 423 Israeli police force candidates who underwent assessment center evaluations after meeting initial eligibility. The researchers concluded in retrospect that, with minimal loss of sensitivity and specificity, nearly 20 percent of this sample could have been excused from more extensive evaluation. These were individuals who, based on interview and cognitive test scores, were nearly sure to fail or nearly certain to succeed.

3 We do not mean to promote a subtle sexism here, but in fact Frederiksen (1962) tested a predominantly (if not exclusively) male sample of students, administrators, executives, and army officers.
11.8 APPRAISAL OF WORK PERFORMANCE
The appraisal of work performance is crucial to the successful operation of any business or organization. In the absence of meaningful feedback, employees have no idea how to improve. In the absence of useful assessment, administrators have no idea how to manage personnel. It is difficult to imagine how a corporation, business, or organization could pursue an institutional mission without evaluating the performance of its employees in one manner or another. Industrial and organizational psychologists frequently help devise rating scales and other instruments used for performance appraisal (Landy & Farr, 1983). When done properly, employee evaluation rests upon a solid foundation of applied psychological measurement—hence, its inclusion as a major topic in this text. In addition to introducing essential issues in the measurement of work performance, we also touch briefly on the many legal issues that surround the selection and appraisal of personnel. We begin by discussing the context of performance appraisal.
The evaluation of work performance serves many organizational purposes. The short list includes promotions, transfers, layoffs, and the setting of salaries—all of which may hang in the balance of performance appraisal. The long list includes at least 20 common uses identified by Cleveland, Murphy, and Williams (1989). These applications of performance evaluation cluster around four major uses: comparing individuals in terms of their overall performance levels; identifying and using information about individual strengths and weaknesses; implementing and evaluating human resource systems in organizations; and documenting or justifying personnel decisions. Beyond a doubt, performance evaluation is essential to the maintenance of organizational effectiveness. As the reader will soon discover, performance evaluation is a perplexing problem for which the simple and obvious solutions are usually incorrect. In part, the task is difficult because the criteria for effective performance are seldom so straightforward as “dollar amount of widgets sold” (e.g., for a salesperson) or “percentage of
students passing a national test” (e.g., for a teacher). As much as we might prefer objective methods for assessing the effectiveness of employees, judgmental approaches are often the only practical choice for performance evaluation. The problems encountered in the implementation of performance evaluation are usually referred to collectively as the criterion problem—a designation that first appeared in the 1950s (e.g., Flanagan, 1956; Landy & Farr, 1983). The phrase criterion problem is meant to convey the difficulties involved in conceptualizing and measuring performance constructs, which are often complex, fuzzy, and multidimensional. For a thorough discussion of the criterion problem, the reader should consult comprehensive reviews by Austin and Villanova (1992) and Campbell, Gasser, and Oswald (1996). We touch upon some aspects of the criterion problem in the following review.

11.9 APPROACHES TO PERFORMANCE APPRAISAL
There are literally dozens of conceptually distinct approaches to the evaluation of work performance. In practice, these numerous approaches break down into four classes of information: performance measures such as productivity counts; personnel data such as rate of absenteeism; peer ratings and self-assessments; and supervisor evaluations such as rating scales. Rating scales completed by supervisors are by far the preferred method of performance appraisal, as discussed later. First, we mention the other approaches briefly.

Performance Measures

Performance measures include seemingly objective indices such as number of bricks laid for a mason, total profit for a salesperson, or percentage of students graduated for a teacher. Although production counts would seem to be the most objective and valid methods for criterion measurement, there are serious problems with this approach (Guion, 1965). The problems include the following:
• The rate of productivity may not be under the control of the worker. For example, the fast-food worker can only sell what people order, and the assembly-line worker can only proceed at the same pace as coworkers.
• Production counts are not applicable to most jobs. For example, relevant production units do not exist for a college professor, a judge, or a hotel clerk.
• An emphasis upon production counts may distort the quality of the output. For example, pharmacists in a mail-order drug emporium may fill prescriptions with the wrong medicine if their work is evaluated solely upon productivity.
Another problem is that production counts may be unreliable, especially over short periods of time. Finally, production counts may tap only a small proportion of job requirements, even when they appear to be the definitive criterion. For example, sales volume would appear to be the ideal criterion for most sales positions. Yet, a salesperson can boost sales by misrepresenting company products. Sales may be quite high for
several years—until the company is sued by unhappy customers. Productivity is certainly important in this example, but the corporation should also desire to assess interpersonal factors such as honesty in customer relations.

Personnel Data: Absenteeism

Personnel data such as rate of absenteeism provide another possible basis for performance evaluation. Certainly employers have good reason to keep tabs on absenteeism and to reduce it through appropriate incentives. Steers and Rhodes (1978) calculated that absenteeism costs about $25 billion a year in lost productivity! Little wonder that absenteeism is a seductive criterion measure that has been researched extensively (Harrison & Hulin, 1989). Unfortunately, absenteeism turns out to be a largely useless measure of work performance, except for the extreme cases of flagrant work truancy. A major problem is defining absenteeism. Landy and Farr (1983) list 28 categories of absenteeism, many of which are
uncorrelated with the others. Different kinds of absenteeism include scheduled versus unscheduled, authorized versus unauthorized, justified versus unjustified, contractual versus noncontractual, sickness versus nonsickness, medical versus personal, voluntary versus involuntary, explained versus unexplained, compensable versus noncompensable, certified illness versus casual illness, Monday/Friday absence versus midweek, and reported versus unreported. When is a worker truly absent from work? The criteria are very slippery. In addition, absenteeism turns out to be an atrociously unreliable variable. The test-retest correlations (absentee rates from two periods of identical length) are as low as .20, meaning that employees display highly variable rates of absenteeism from one time period to the next. A related problem with absenteeism is that workers tend to underreport it for themselves and overreport it for others (Harrison & Shaffer, 1994). Finally, for the vast majority of workers, absenteeism rates are quite low. In short, absenteeism is a poor method for assessing
worker performance, except for the small percentage of workers who are chronically truant.

Peer Ratings and Self-Assessments

Some researchers have proposed that peer ratings and self-assessments are highly valid and constitute an important complement to supervisor ratings. A substantial body of research pertains to this question, but the results are often confusing and contradictory. Nonetheless, it is possible to list several generalizations (Harris & Schaubroeck, 1988; Smither, 1994):

• Peers give more lenient ratings than supervisors.
• The correlation between self-ratings and supervisor ratings is minimal.
• The correlation between peer ratings and supervisor ratings is moderate.
• Supervisors and subordinates have different ideas about what is important in jobs.

Overall, reviewers conclude that peer ratings and self-assessments may have limited
application for purposes such as personal development, but their validity is not yet sufficiently established to justify widespread use (Smither, 1994).

Supervisor Rating Scales

Rating scales are the most common measure of job performance (Landy & Farr, 1983; Muchinsky, 2003). These instruments vary from simple graphic forms to complex scales anchored to concrete behaviors. In general, supervisor rating scales reveal only fair reliability, with a mean interrater reliability coefficient of .52 across many different approaches and studies (Viswesvaran, Ones, & Schmidt, 1996). In spite of their weak reliability, supervisor ratings still rank as the most widely used approach. About three-quarters of all performance evaluations are based upon judgmental methods such as supervisor rating scales (Landy, 1985). The simplest rating scale is the graphic rating scale, introduced by Donald Paterson in 1922 (Landy & Farr, 1983). A graphic rating scale
consists of trait labels, brief definitions of those labels, and a continuum for the rating. As the reader will notice in Figure 11.1, several types of graphic rating scales have been used. The popularity of graphic rating scales is due, in part, to their simplicity. But this is also a central weakness because the dimension of work performance being evaluated may be vaguely defined. Dissatisfaction with graphic rating scales led to the development of many alternative approaches to performance appraisal, as discussed in this section. A critical incidents checklist is based upon actual episodes of desirable and undesirable on-the-job behavior (Flanagan, 1954). Typically, a checklist developer will ask employees to help construct the instrument by submitting specific examples of desirable and undesirable job behavior. For example, suppose that we intended to develop a checklist to appraise the performance of resident advisers (RAs) in a dormitory. Modeling a study by Aamodt, Keller, Crawford, and Kimbrough (1981), we might ask current dormitory RAs the following question:
Think of the best RA that you have ever known. Please describe in detail several incidents that reflect why this person was the best adviser. Please do the same for the worst RA you have ever known.
Based upon hundreds of nominated behaviors, checklist developers would then proceed to distill and codify these incidents into a smaller number of relevant behaviors, both desirable and undesirable. For example, the following items might qualify for the RA checklist:

• ______ stays in dorm more than required
• ______ breaks dormitory rules
• ______ is fair about discipline
• ______ plans special programs
• ______ fails to discipline friends
• ______ is often unfriendly
• ______ shows concern about residents
• ______ comes across as authoritarian
FIGURE 11.1 Examples of Graphic Rating Scales

Of course, the full checklist would be much longer than the preceding. The RA supervisor would complete this instrument as a basis for performance appraisal. If needed, an overall
summary score can be derived from an appropriate weighting of individual items. Another form of criterion-referenced judgmental measure is the behaviorally anchored rating scale (BARS). The classic work on BARS dates back to Smith and Kendall (1963). These authors proposed a complex developmental procedure for producing criterion-referenced judgments. The procedure uses a number of experts to identify and define performance dimensions, generate behavior examples, and scale the behaviors meaningfully. Overall, the procedure is quite complex, time-consuming, and expensive. A number of variations and improvements have been suggested. An advantage to BARS and other behavior-based scales is their strict adherence to EEOC (Equal Employment Opportunity Commission) guidelines discussed later in this chapter. BARS and related approaches focus upon behaviors as opposed to personality or attitudinal characteristics. A behaviorally anchored scale for performance of college professors in posting office hours is depicted in Figure 11.2. Of
course, the comprehensive evaluation of a professor would include additional scales for other aspects of work.
FIGURE 11.2 Behaviorally Anchored Rating Scale for Posting and Maintaining Office Hours

Research on improving the accuracy of ratings with BARS is mixed. Some studies find fewer rating errors—especially a reduction in unwarranted leniency of evaluations—whereas other studies report no improvement with BARS compared to other evaluation methods (Murphy & Pardaffy, 1989). Overall, Muchinsky (2003) concludes that the BARS approach is not much better than graphic rating scales in reducing rating errors. Nonetheless, the scale development process of BARS may have
indirect benefits in that supervisors are compelled to pay close attention to the behavioral components of effective performance. A behavior observation scale (BOS) is a variation upon the BARS technique. The difference between the two is that the BOS approach uses a continuum from “almost never” to “almost always” to measure how often an employee performs the specific tasks on each behavioral dimension. As with the BARS technique, researchers question whether behavior observation scales are worth the extra effort (Guion, 1998). A forced-choice scale is designed to eliminate bias and subjectivity in supervisor ratings by forcing a choice between options that are equal in social desirability. In theory, this approach makes it impossible for the supervisor to slant ratings in a biased or subjective manner. We will use the pathbreaking research by Sisson (1948) to illustrate the features of this approach. He developed a scale to evaluate Army officers that consisted of tetrads of behavioral descriptors.
Each tetrad contained two positive items matched for social desirability and two negative items also matched for social desirability. The four items in each tetrad were topically related to a single performance dimension. Unknown to the supervisors who completed the rating scale, one of the two positive items was judged very descriptive of effective Army officers and the other judged less so. Likewise, one of the two negative items was judged more descriptive of ineffective Army officers and the other judged less so. Here is a sample tetrad (Borman, 1991):

                                              Most Descriptive   Least Descriptive
A. Cannot assume responsibility                    ______             ______
B. Knows how and when to delegate authority        ______             ______
C. Offers suggestions                              ______             ______
D. Changes ideas too easily                        ______             ______
Supervisors were asked to review the items in each tetrad and to check one item as most descriptive and one item as least descriptive of the officer being evaluated. A score of +1 was awarded for responding “most descriptive” to the positively keyed item (in this case, alternative B) or “least descriptive” to the negatively keyed item (in this case, alternative A), whereas a score of −1 was awarded for responding “least descriptive” to the positively keyed item or “most descriptive” to the negatively keyed item. Responding to the nonkeyed items (alternatives C and D) as most or least descriptive earned a score of 0. Thus, each tetrad yielded a five-point continuum of scores: +2, +1, 0, −1, −2. The summary score used for performance appraisal consisted of the algebraic sum of the individual items.
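The tetrad scoring rule can be expressed compactly. A minimal sketch of ours, keyed to the sample tetrad above:

```python
# Minimal sketch (ours) of Sisson-style forced-choice scoring: "B" is
# the positively keyed item, "A" the negatively keyed item, and
# "C"/"D" are nonkeyed.
POSITIVE_KEY, NEGATIVE_KEY = "B", "A"

def score_choice(item, marked_most_descriptive):
    """+1 for crediting the keyed direction, -1 for reversing it,
    0 for nonkeyed items."""
    if item == POSITIVE_KEY:
        return 1 if marked_most_descriptive else -1
    if item == NEGATIVE_KEY:
        return -1 if marked_most_descriptive else 1
    return 0

def score_tetrad(most_descriptive, least_descriptive):
    return (score_choice(most_descriptive, True)
            + score_choice(least_descriptive, False))

print(score_tetrad("B", "A"))  # +2, the most favorable response pattern
print(score_tetrad("C", "D"))  #  0, only nonkeyed items marked
print(score_tetrad("A", "B"))  # -2, the least favorable pattern
```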
The forced-choice approach has never really caught on, due largely to the effort required in scale construction. This is unfortunate because the method does effectively reduce unwanted bias. Borman (1991) refers to this approach as a “bold initiative” that produces a relatively objective rating scale.

11.10 SOURCES OF ERROR IN PERFORMANCE APPRAISAL

The most difficult problem in the assessment of job performance is the proper definition of appraisal criteria. If the supervisor is using a
poorly designed instrument that does not tap the appropriate dimensions of job behavior, then almost by definition the performance appraisal will be inaccurate, incomplete, and erroneous. Undoubtedly, the failure to identify appropriate criteria for acceptable and unacceptable performance is a major source of error in performance appraisal. But it is not the only source. Even when supervisors have access to excellent, well-designed measures of performance appraisal, various sorts of subtle errors can creep in. We discuss three such additional sources of rating error: halo effect, rater bias, and criterion contamination.

Halo Effect

The tendency to rate an employee high or low on all dimensions because of a global impression is called halo effect. Research on the halo effect can be traced back to the early part of the twentieth century (Thorndike, 1920). The most common halo effect is a positive halo effect. In this case, an employee receives a higher rating than deserved because the supervisor fails to be
objective when rating specific aspects of the employee’s behavior. A positive halo effect is usually based upon overgeneralization from one element of a worker’s behavior. For example, an employee with perfect attendance may receive higher-than-deserved evaluations on productivity and work quality—even though attendance is not directly related to these job dimensions. Smither (1998) lists the following approaches to control for halo effects:

• Provide special training for raters
• Supervise the supervisors during the rating
• Practice simulations before doing the ratings
• Keep a diary of information relevant to appraisal
• Provide supervisors with a short lecture on halo effects

Additional approaches to rater training are discussed by Goldstein (1991). An intriguing analysis of the nature and consequences of halo error can be found in Murphy, Jako, and Anhalt (1993). Contrary to the reigning prejudice against halo errors, these researchers conclude
that the halo effect does not necessarily detract from the accuracy of ratings. They point out that a presumed halo effect is often the by-product of true overlap on the dimensions being rated. The debate over halo effect is not likely to be resolved anytime soon (Arvey & Murphy, 1998).

Rater Bias

The potential sources of rater bias are so numerous that we can only mention a few prominent examples here. Leniency or severity errors occur when a supervisor tends to rate workers at the extremes of the scale. Leniency may reflect social dynamics, as when the supervisor wants to be liked by employees. Leniency is also caused by extraneous factors such as the attractiveness of the employee. Severity errors refer to the practice of rating all aspects of performance as deficient. In contrast, central tendency errors occur when the supervisor rates everyone as nearly average on all performance dimensions. Context errors occur when the rater evaluates an employee in
the context of other employees rather than based upon objective performance. For example, the presence of a workaholic salesperson with extremely high sales volume might cause the sales supervisor to rate other sales personnel lower than deserved. Recently, researchers have paid considerable attention to the possible biasing effects of whether a supervisor likes or dislikes a subordinate. Surprisingly, the trend of the findings is that supervisor affect (liking or disliking) toward specific employees does not introduce rating bias. In general, strong affect in either direction represents valid information about an employee. Thus, ratings of affect often correlate strongly with performance ratings, but this is because both are a consequence of how well or poorly the employee does the job (Ferris, Judge, Rowland, & Fitzgibbons, 1994; Varma, DeNisi, & Peters, 1996). Other forms of rater bias are discussed by Goldstein (1991) and Smither (1994).

Criterion Contamination
Criterion contamination is said to exist when a criterion measure includes factors that are not demonstrably part of the job (Borman, 1991; Harvey, 1991). For example, if a performance measure includes appearance, this would most likely be a case of criterion contamination—unless appearance is relevant to job success. Likewise, evaluating an employee on “dealing with the public” is only appropriate if the job actually requires the employee to meet the public. Goldstein (1992) outlines three kinds of criterion contamination:

1. Opportunity bias occurs when workers have different opportunities for success, as when one salesperson is assigned to a wealthy neighborhood and others must seek sales in isolated, rural areas.
2. Group characteristic bias is present when the characteristics of the group affect individual performance, as when workers in the same unit agree to limit their productivity to maintain positive social relations.
3. Knowledge of predictor bias occurs when a supervisor permits personal knowledge about an employee to bias the appraisal, as when the quality of the college attended by a new worker affects her evaluation.
Careful attention to job analysis as a basis for selection of appraisal criteria is the best way to reduce errors in performance appraisal. In addition, employers should follow certain guidelines in performance appraisal, as discussed in the following section.

Guidelines for Performance Appraisal

Performance appraisal is a formidable task. Not only must employers pay attention to the psychometric soundness of their approach, they must also design a practical system that meets organizational goals. For example, appraisal standards must be sufficiently difficult and detailed to ensure that organizational goals are accomplished. Another concern is that performance appraisal falls under the purview of Title VII of the Civil Rights Act of 1964. Hence, employers must develop fair systems that do not discriminate on the basis of race, sex, and other protected categories. To complicate matters,
these standards—soundness, practicality, legality—may conflict with one another. The practical approach may be neither psychometrically sound nor legal. Often, appraisal methods that show the best measurement characteristics (e.g., strong interrater reliability) will fail to assess the most important aspects of performance; that is, they are not practical. This is a familiar refrain within the measurement field. Too often, psychologists must choose between rigor and relevance, rarely achieving both at the same time. Finally, legal requirements must be considered when exploring the limits of performance appraisal. Smither (1998) has published guidelines for developing performance appraisal systems that we paraphrase here:

• Base the performance appraisal upon a careful job analysis
• Develop specific, contamination-free criteria for appraisal from the job analysis
• Determine that the instrument used to rate performance is appropriate for the appraisal situation
• Train raters to be accurate, fair, and legal in their use of the appraisal instrument
• Use performance evaluations at regular intervals of six months to a year
• Evaluate the performance appraisal system periodically to determine whether it is actually improving performance
The training of raters is an especially important guideline. An appraisal system that seems perfectly straightforward to the employer could easily be misunderstood by an untrained rater, resulting in biased evaluations. Borman (1991) notes that two kinds of rater training are effective: rater error training, in which the trainer seeks simply to alert raters to specific kinds of errors (e.g., halo effect); and frame-of-reference training, in which the trainer familiarizes the raters with the specific content of each performance dimension. Research indicates that these kinds of training improve the accuracy of ratings. Finally, we review an intriguing study conducted from an international perspective. Peretz and Fried (2012) remind us that cultural
norms influence the nature, acceptability, and impact of different approaches to performance appraisal. They surveyed performance appraisal practices in 21 nations, obtained ratings on cultural norms for each nation, and determined their joint impact on organizational absenteeism and turnover. Specifically, the researchers collected data on personnel practices from thousands of organizations in these mainly European countries. Next, they obtained ratings for each country on four cultural practices: power distance (acceptance versus rejection of inequality), future orientation (present versus future orientation), person value (individualism versus collectivism), and uncertainty avoidance (acceptance versus avoidance of uncertainty). Each cultural norm was rated 1 to 7 for each nation based on an independent global database. Then, they examined the joint impact of personnel practices and cultural norms on absenteeism and turnover. Their study is complex and detailed, beyond the scope of fine-tuned analysis here. In sum, they found that congruence between societal norms and
personnel assessment methods tended to reduce turnover and/or absenteeism. One example is the use of the so-called 360-degree evaluation, in which performance appraisal is based on input from people at all levels who interact with the employee. This practice is more effective (leading to less absenteeism and turnover) in some cultures than in others. Peretz and Fried (2012) found that personnel assessment systems with several sources of raters (e.g., supervisors, coworkers, and subordinates) were most acceptable to employees in companies located in societies with low power distance, high future orientation, and respect for individualism. In contrast, multiple sources of assessment were not well received by employees working in collectivistic societies. It appears the best practices in personnel assessment depend upon the cultural context.

TOPIC 11B Assessment for Career Development in a Global Economy

11.11 Career Development and the Functions of Work
11.12 Origins of Career Development Theories
11.13 Theory of Person-Environment Fit
11.14 Theory of Person-Environment Correspondence
11.15 Stage Theories of Career Development
11.16 Social Cognitive Approaches
11.17 O*NET in Career Development
11.18 Inventories for Career Assessment
Prior to the 1700s, agrarian economies dominated cultural and economic life in the Western world. Vocational opportunities for most people remained limited to farming, crafts, labor, and small businesses. The modern vision that individuals could pursue dozens or hundreds of careers likely did not exist for the masses who scrambled simply to survive (Zinn, 1995). With the advent of the first industrial revolution in the 1700s, including the invention of the steam engine and other labor-saving devices, the need for human labor diminished rapidly. In parallel, the vocational world expanded substantially, offering upward mobility to some of the working class and poor. Gradually, the concept of career identity emerged in the public consciousness. Career identity is now recognized as essential to personhood and vital to a sense of well-being. When we meet someone for the first time, our natural inclination is to ask, or at least to wonder, “What do you do for a living?” The values, political views, and personal qualities of the individual are important, too, but how the individual contributes to society is typically the first thing we want to know. An occupational title communicates an abundance of information, including personality characteristics, economic class, and social standing (Andersen & Vandehey, 2011). Work and career are so central to personal well-being that unemployment, especially when prolonged, consistently causes a wide range of physical, psychological, and spiritual maladies. These include: . . . economic hardship, loss of health insurance,
foreclosure, and mental health problems.
The mental health problems include depression and anxiety, feelings of hopelessness and shame, and familial tension and conflict (Jones & Barber, 2012, p. 18).
A meta-analysis of 104 empirical studies revealed that the negative impact of unemployment is buffered by the availability of coping resources (e.g., family and financial support) and, conversely, made worse by work-role centrality (e.g., the belief that work is central to one’s life and satisfaction) (McKee-Ryan, Song, Wanberg, & Kinicki, 2005). Except in a few totalitarian states where occupational access is rigidly controlled by the ruling elite, individuals usually have some degree of latitude in finding their own way to a vocation. They also possess some capacity to change occupations in their lifetimes. Although the widely cited assertion that the average individual will switch careers seven times has no factual basis, career change likely is more common now than in years past (Bialik, 2010). Also, initial career
choice for the young adult remains a vexing issue for many, especially with the continual emergence of substantially new vocations. The advent of new vocations is driven by technological innovations and the aging of the population. A few examples of new careers include cloud computing expert, market research data analyst, and corporate listening officer (Forbes magazine, May 5, 2011). The need for flexibility in career development originates, in part, from the globalization of the world economies, spelled out in the provocative best seller, The World Is Flat (Friedman, 2009). Information technology is now instantly available to everyone, linking knowledge centers into a single worldwide network, creating a more level economic playing field, and requiring corporations to restructure as new opportunities emerge. One concrete example of the new, flat world: For the previous edition of this textbook, the editorial production and composition services were completed by the skilled and efficient employees of a dynamic company located in India. After a few phone
calls and email exchanges of PDF files with the author, the text was ready for printing in the United States in a matter of weeks. In summary, psychologists who provide career guidance will need new approaches to assessment that are sensitive to the need for transition planning in a rapidly changing and increasingly competitive global economy. But practitioners need to avoid the “Test and Tell” trap: Clients often come to career counseling
assuming an expert will administer some test that tells the client “the answer” as to what occupation is “the right one.” The client’s expectation for “test and tell” sets the stage for the client and the counselor to depend on a limited, structured approach (Andersen & Vandehey, 2011, p. 10).
The problem with this method is that the counselor will fail to discern the unique needs of the client in a developmentally sensitive context. Guidance will be far more effective if the practitioner slows the process down and provides the opportunity for mutual exploration.
In other words, career guidance is assessment in the broad sense, not merely testing in the narrow sense. Assessment for career development requires knowledge of theories of career development, sensitivity to issues of diversity, and an understanding of information resources. Thus, before turning to a survey of suitable instruments, we begin with a brief review of prominent career development theories. We start with a simple but provocative question pursued by Blustein (2006), “What is work for?”

11.11 CAREER DEVELOPMENT AND THE FUNCTIONS OF WORK

For some people, gainful employment provides more than just a means to pay for food and housing. Psychologists who provide assessment for career development need to keep in mind the multiple functions of work, reviewed here. Yet, it is also true that many people, perhaps the majority, do not have access to the educational
and employment opportunities that would allow them to develop a work vision or to realize a career dream. Since recorded time, humanity has been plagued
by various forms of structural barriers based on race, culture, immigration status, religion, gender, age, sexual orientation, and social class that have had a differential impact on individuals. Our belief is that counselors need to be fully cognizant of how these barriers affect clients so that they are able to provide maximally effective interventions that do not inadvertently blame the victims of social oppression (Blustein, Kenna, Gill, & DeVoy, 2008, p. 297).
It bears repeating that discrimination continues to obstruct career potential for minorities. A subtle racism on the part of employers and agencies often is the source. Many studies could be cited to buttress this point as a global issue. For reasons of space, we offer just two examples. A recent study from Great Britain confirms that ethnic minorities experience an
“ethnic penalty” with higher unemployment rates, greater concentrations in dead-end assembly-line jobs, and lower earnings than Whites, even for the same job (Bell & Casebourne, 2008). Immigrants to Great Britain likewise face career barriers. When immigrants are able to find work, it is typically in just a few industries such as catering, language translation, shop work, and clerical jobs. Professional employment is notably lacking, despite previous experience (Bloch, 2002). Unfortunately, most theories of career development do not acknowledge the profound challenges faced by low-income individuals, minorities, and immigrants. The psychology-of-working viewpoint provided by Blustein and his collaborators is an exception. These researchers provide a meta-theoretical perspective that can be used alongside traditional models of career development. We begin with a summary of their model. According to Blustein and colleagues (2008), work can fulfill any or all of three sets of needs:
• Survival and Power: These are the foundational reasons that most people work, namely, to meet basic subsistence needs such as food, clothing, and shelter. In varying degrees, work also provides access to economic and social power. Specifically, those with financial resources are more likely to prevail and to get their way in the wider community. Money talks.
• Social Connection: Work is the place where many of our vital human connections are formed. Deep friendships are forged and sometimes maintained over a lifetime. The quality of these relationships has the potential to enhance performance when coworkers are positive and supportive, or to create great stress when colleagues are abrasive and conflict-prone.
• Self-Determination: For some individuals, work is also a means of self-actualization and personal fulfillment. Everyone is familiar with those fortunate individuals who love what they do and are privileged to be paid for it, too. But Blustein et al. (2008) remind us that many workers do not have the opportunity to select a career that provides for creative and fulfilling self-expression.
In addition to discrimination, structural barriers often prevent career development among minorities. For example, African Americans may lack relevant social networks, public transit to reach employment, and the savings needed to relocate for available work (Weller & Fields, 2011). Further, unemployment is itself a serious structural barrier. In 2011, unemployment among African Americans was about 16 percent, double that of Whites. These figures do not include those who have quit looking for work or who are chronically underemployed. Being out of work tends to become a vicious, self-perpetuating cycle, with the unemployed individual losing work skills with each passing month, further reducing employment prospects.

11.12 ORIGINS OF CAREER DEVELOPMENT THEORIES
Implicitly or explicitly, practitioners make use of a theoretical framework in their practice of assessment in career counseling. Thus, we provide a short review of essential viewpoints here. We begin with an historical note, acknowledging the seminal contributions of Frank Parsons, considered by many the founder of the field of career guidance. In 1909, he published Choosing a Vocation, a practical manual for providing career direction to young men and women. Parsons (1909) advocated making a career choice based on matching personal traits with job factors: In the wise choice of a vocation there are three
broad factors: (1) a clear understanding of yourself, your aptitudes, abilities, interests, ambitions, resources, limitations, and their causes; (2) a knowledge of the requirements and conditions of success, advantages and disadvantages, compensation, opportunities, and prospects in different lines of work; (3) true reasoning on the relations of these two groups of facts (p. 5).
Parsons provided a 116-item questionnaire to survey the accomplishments, interests, and aptitudes of the client. This was followed by a lengthy, penetrating interview designed to illuminate aspects of social presentation and personal character (e.g., “Do you smile naturally and easily?” “Is your handshake warm and cordial?” “Are you careful about voice modulation?” “Are you honest, truthful, and candid?” “Are you industrious, hard-working, and persistent?” “Do you welcome people of different creed or political faith?”). His manual also provided an extensive analysis of the qualities needed for success in dozens of vocations. Consultation with each client continued over a span of several weeks. The task of the counselor was to match the traits of the client with the requirements of specific lines of work. Effectively, this was an early, rudimentary form of the method advocated by John Holland and others, known as person-environment fit.
11.13 THEORY OF PERSON-ENVIRONMENT FIT

Over 50 years ago, John Holland (1959) established the framework for a sophisticated theory of vocational choice that has engendered more research than any other approach in the field. From the beginning, he also constructed and validated assessment tools that embodied the practical application of his model, known as Person-Environment Fit. He proposed that personality traits/interests tend to cluster into a small number of vocationally relevant patterns, called types. For each personality type, there is also a corresponding work environment best suited to that type. According to Holland, there are six types: Realistic, Investigative, Artistic, Social, Enterprising, and Conventional. Each type corresponds to both a set of personality traits/interests and also to a set of environmental work demands. Figure 11.3 depicts this approach, sometimes known as the RIASEC model, in reference to the first letters of the six types. The types are idealizations that few
people (or work environments) fit completely. The RIASEC personality patterns are summarized in Table 11.6, and corresponding work environments are found in Table 11.7. Regarding the six personality types, it is rare that an individual is a “pure” representation of only one type. Instead, most individuals reveal a preferred type, but display some resemblance to a secondary and a tertiary type as well. For example, someone who was very strong on the Investigative dimension (likes to analyze) might reveal a secondary emphasis for the Social aspect (enjoys helping others), and a lesser emphasis on the Artistic type (reveals a creative element). Using the first letters of these three types in descending order of emphasis, we arrive at the Holland code for the individual, namely, ISA. We will say more about Holland codes when we discuss assessment tools such as the Self-Directed Search developed for this purpose. For now it will suffice to know that excellent tools exist for the empirically validated assessment of the six types.
Consistency and differentiation are two concepts important in the Holland approach. Referring to the hexagonal model depicted in Figure 11.3, adjacent personality types bear greater similarity to one another than types that are separated on the figure. For example, the Realistic and Conventional types (side by side) are somewhat similar, whereas the Realistic and Social types (across the hexagon) are quite different or inconsistent. Thus, a client whose Holland code was RCE (adjacent codes) would be considered more consistent than a client whose code was REA (separated codes). This is relevant to assessment and career guidance because work environments tend to possess consistency in regard to types. It is easier for clients to find person–environment fit when they possess consistency, too.
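To make these two concepts concrete, the sketch below shows one simple way they might be computed from a set of six RIASEC theme scores. The hexagon-distance rule for consistency follows the logic just described; the differentiation rule (the gap between the strongest theme and the average of the rest, elaborated in the text below) is one simple operationalization among several. This is an illustrative sketch, not the scoring algorithm of any published instrument.

    # A minimal sketch of Holland-code consistency and differentiation.
    # The scoring rules are illustrative assumptions, not the algorithm
    # of any published Holland-type inventory.

    RIASEC = "RIASEC"  # hexagon order: Realistic, Investigative, Artistic,
                       # Social, Enterprising, Conventional

    def holland_code(scores):
        """Three-letter code: the three highest themes in descending order."""
        ranked = sorted(scores, key=scores.get, reverse=True)
        return "".join(ranked[:3])

    def hexagon_distance(a, b):
        """Steps between two types around the hexagon: 1 = adjacent, 3 = opposite."""
        d = abs(RIASEC.index(a) - RIASEC.index(b))
        return min(d, 6 - d)

    def consistency(code):
        """Distance between the first two letters; lower means more consistent."""
        return hexagon_distance(code[0], code[1])

    def differentiation(scores):
        """Gap between the strongest theme and the mean of the remaining five."""
        ranked = sorted(scores.values(), reverse=True)
        return ranked[0] - sum(ranked[1:]) / len(ranked[1:])

    scores = {"R": 38, "I": 52, "A": 41, "S": 47, "E": 30, "C": 33}
    code = holland_code(scores)               # 'ISA'
    print(code, consistency(code))            # ISA 2
    print(round(differentiation(scores), 1))  # 14.2

Under this rule, a client with code RCE scores 1 (R and C are adjacent on the hexagon), whereas a client with code REA scores 2, mirroring the consistency contrast described above.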
FIGURE 11.3 Holland’s Hexagonal Model of Personality Types and Occupational Themes

TABLE 11.6 RIASEC Personality Types
Realistic: Practical, conservative, and value tangible rewards. They like to work with tools, machines, and things, and usually avoid interaction with others.
Investigative: Show a strong analytical bent. They value knowledge and like to explore, understand, and predict natural and social phenomena. They tend to avoid selling or persuading others.
Artistic: Unconventional; they enjoy the creative expression of ideas and emotions and value musical, literary, or artistic endeavors. They avoid conformity to …
Source: Based on Holland, J. L. (1985). Vocational Preference Inventory (VPI) manual—1985 edition. Odessa, FL: Psychological Assessment Resources.

TABLE 11.7 RIASEC Work Environments
Realistic: Require hands-on involvement, physical movement, mechanical skill, and technical competencies; pragmatic problem solving is needed. Typical vocations include auto repair, cook, drywall installer, machinist, taxi driver, and umpire.
Investigative: Require the use of abstract thinking and creative abilities; the focus is a rational approach and ideas, not people. Typical positions include architect, arson investigator, pharmacist, physician, psychologist, and software engineer.
Artistic: Require the creative application of artistic forms; these settings demand prolonged work and place a premium on access to intuition and emotional life. Typical vocations include actor, composer, graphic designer, model, photographer, and reporter.
Social: Involve an interest in caring for people and the ability to discern and influence their behavior. These work settings …
Source: Based on Holland, J. L. (1985). Vocational Preference Inventory (VPI) manual—1985 edition. Odessa, FL: Psychological Assessment Resources.
Differentiation refers to the relative strength of the first, second, and third personality types of the Holland code. A client with strong differentiation will reveal a marked preference for his or her first category, and less interest in the second and third categories. A client with weak differentiation might demonstrate scores that are nearly tied on the top three categories of the Holland code. This could indicate a difficulty committing to one kind of work environment. Most work environments require some degree of differentiation. Hence, the undifferentiated client may struggle to find a satisfying work environment. Holland’s theoretical approach has been so influential that nearly every assessment tool in the field of career guidance makes reference to his six personality types. But the simple elegance of this approach is also a potential weakness. The assessment tools that embody the Holland model typically list suitable occupations and rule out nonmatching environments. Counselors and clients can
foreclose on further exploration. It is easy to fall into the “test and tell” trap.

11.14 THEORY OF PERSON-ENVIRONMENT CORRESPONDENCE

The theory of Person-Environment Correspondence (PEC) evolved from the Theory of Work Adjustment (TWA). First envisioned in the 1950s, TWA arose as a basis for conducting research on the work adjustment of vocational rehabilitation clients. Soon it became clear that TWA applied to situations other than rehabilitation, and that the approach was a specific case of a more general method, which came to be known as Person-Environment Correspondence, or PEC (Dawis, 2002). PEC bears modest similarity to the person-environment approach advocated by Holland and colleagues. The central point of similarity is that, in determining suitable careers, both theories compare the attributes of individuals with the qualities needed in
occupations (Dawis, 1996; Dawis & Lofquist, 1991). One difference is that PEC places greater emphasis on individual abilities and their match to the ability patterns required by specific occupations. Ability is different from skill level, which can be acquired with preparation. Ability refers to aptitude, indicating the level of mastery an individual can achieve with suitable training and experience. Another difference is that PEC places greater weight on individual values and their correspondence to the value fulfillments provided by specific occupations (Dawis, 2002; Eggerth, 2008). PEC theory identifies six crucial values that need to be considered in assessment and counseling for career development. These values are as follows:
1. Achievement—the importance of using one’s abilities and having a feeling of accomplishment
2. Altruism—the importance of harmony with, and being of service to, others
3. Autonomy—the importance of being independent and having a sense of control
4. Comfort—the importance of feeling comfortable and not being stressed
5. Safety—the importance of stability, order, and predictability
6. Status—the importance of recognition and being in a dominant position (Dawis, 2002, p. 446).
This list is not comprehensive and it is likely that additional values will emerge with further research. Of course, correspondence between personal values held by the client and the potential for their fulfillment in an occupation is central to work satisfaction and productivity. PEC theory is rich in complexity because it has evolved over more than five decades; we can only provide a few highlights. The central principle is that the more closely the rewards of the job or the organization correspond to the core values of the individual, the more likely it is that he or she will find satisfaction with the position. But PEC also invokes cognitive, personality, and environmental styles in its understanding of work adjustment. For example, environmental styles include celerity, pace,
rhythm, and endurance required to complete the job, which are each assessed on a continuum (Dawis, 1996):
• Celerity refers to the quickness of
response that is needed in responding to job demands. For example, emergency room personnel often need to respond very quickly, whereas a diamond cutter would be foolhardy to do so.
• Pace refers to the level of effort needed in responding to the environment. A position such as office clerical worker might require modest effort in comparison to firefighter, where periods of intense effort will be encountered.
• Rhythm refers to whether the pattern of responding is steady, cyclical, or erratic. An example of a steady environment would be telephone operator, whereas a police officer might work in an erratic environment, facing hours of boredom interrupted with occasional bursts of fear.
• Endurance refers to whether the duration of responding to environmental demands is
brief or protracted. A position requiring less endurance might be financial advisor, whereas a computer software engineer employed under deadline would need to keep working, day and night, until the project is finished.
Andersen and Vandehey (2012) provide a useful illustration of how these environmental styles play out for specific occupations: Two examples demonstrating differing styles are
an emergency room and a gemsmith. An emergency room requires cyclical, intense work periods as well as down times. Medical personnel need high celerity (be fast) with a high level of effort (pace). Also, some surgeries could last up to 16 hours, requiring high endurance. By contrast, a gemsmith is ill advised to be fast when cutting gems, and the celerity requirements are low. In addition, several outstanding gems may be worth more money than many poorly cut stones (low pace). The work environment has a steady rhythm and probably requires varying
amounts of endurance, depending upon the stone size and complexity of the cuts (p. 47).
Of course, these four dimensions also manifest as measurable personality styles. In the world of career counseling, a mismatch between these two broad factors (environmental style required by a job, personality style preferred by the client) often is a precipitating referral issue. Dawis and colleagues offer 17 testable propositions derived from PEC and provide a wealth of supporting research (Dawis, 2002; Dawis & Lofquist, 1984). For example, one proposition is: Proposition III: P’s satisfaction is a function of
the correspondence of E’s reinforcers to P’s values, provided that P’s abilities correspond to E’s ability requirements.
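In rough computational terms, the proposition might be sketched as follows. The 0-to-1 rating scales, the example profiles, and the ability cutoff are all illustrative assumptions introduced here; they are not part of Dawis’s formal statement.

    # A hedged sketch of PEC Proposition III. Scales, names, and the
    # cutoff are illustrative assumptions, not Dawis's model.

    def correspondence(person, environment):
        """Mean agreement between two profiles rated on a 0-to-1 scale."""
        keys = person.keys() & environment.keys()
        return sum(1 - abs(person[k] - environment[k]) for k in keys) / len(keys)

    def predicted_satisfaction(values, reinforcers, abilities, requirements,
                               ability_cutoff=0.7):
        """Satisfaction tracks value-reinforcer correspondence, provided
        that abilities correspond to ability requirements (the proviso)."""
        if correspondence(abilities, requirements) < ability_cutoff:
            return None  # proviso not met; the proposition is silent here
        return correspondence(values, reinforcers)

    values       = {"achievement": 0.9, "autonomy": 0.7, "safety": 0.3}
    reinforcers  = {"achievement": 0.8, "autonomy": 0.6, "safety": 0.5}
    abilities    = {"verbal": 0.8, "numerical": 0.7}
    requirements = {"verbal": 0.7, "numerical": 0.6}
    print(round(predicted_satisfaction(values, reinforcers,
                                       abilities, requirements), 2))
    # 0.87 -- high predicted satisfaction, since the ability proviso is met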
Put simply, a person’s satisfaction with a job is a function of the match of the available environmental reinforcers with the values of the individual, provided that his or her abilities correspond to those required by the position. This is an empirically testable hypothesis that
has stood up well in research studies (Dawis, 2002).

11.15 STAGE THEORIES OF CAREER DEVELOPMENT

Beginning in the 1950s, Donald Super and colleagues developed an influential stage theory in the field of career guidance and development (Super, 1953, 1994). His approach departs from the trait-factor method preferred by many in the field and embraces a more flexible, holistic, life-span perspective. The essentials of the theory were stated with elegant simplicity in his first and most widely cited article, “A theory of vocational development” (Super, 1953). Later papers provided additional details to the original framework (Super, Savickas, & Super, 1996). Super acknowledged the obvious fact that people differ in their abilities, interests, and personalities, but also believed that most people were qualified for several occupations, not just a few positions. Individuals and occupations were each flexible enough to “allow both some variety of occupations for each individual and
some variety of individuals in each occupation” (Super, 1953, p. 189). He argued that the individual self-concept evolves with time and experience, so that vocational choice and adjustment are continuous and lifelong processes. He envisioned five occupational life stages: growth, exploration, establishment, maintenance, and decline. These stages are sometimes known as a career ladder (Super et al., 1996). The growth stage extends into the teenage years and involves the observation of adult behavior and the exploration of fantasies and interests. The exploration stage was subdivided into fantasy, tentative, and realistic phases, as the young adult tries out one or more lines of training or education toward an eventual career. The establishment stage begins around age 25 or 30, and was subdivided into the trial and stabilization phases. Vocational development tasks encountered in this stage include the assimilation of organizational climate, the consolidation of positive relationships with coworkers, and the advancement of career
responsibilities through promotion (Super, 1990). In the maintenance stage of middle age, the individual may need to innovate, update skills, or face career stagnation. Additionally, some persons ask: “Should I remain in this career?” If the answer is “No,” then the individual would reenter the exploration and establishment stages before attaining the maintenance stage. The last stage, decline, is hypothesized to occur in old age and may involve specialization, disengagement, or retirement. The stage development theory proposed by Super provides a useful reminder that career development does not end in young adulthood but extends throughout the life span. However, the theory was based on career development as found in the dominant culture of his time, which was mainly White and often middle class or higher. In a changing global economy, some of the developmental stages no longer seem as relevant. In particular, the maintenance phase is difficult for many to sustain because of the need for frequent career transitions (Friedman, 2009).
Super died in 1994. Toward the end of his career, he acknowledged new realities: Work and occupation provide a focus for
personality organization for most men and women, although for some individuals this focus is peripheral, incidental, or even nonexistent. Then other foci, such as leisure activities and homemaking, may be central. Social traditions such as sex-role stereotyping and modeling, racial and ethnic biases, and the opportunity structure, as well as individual differences are important determinants of preferences for such roles as worker, student, leisurite, homemaker, and citizen (Super et al., 1996, p. 126).
The brief mention of “opportunity structure” is important to underscore, in light of the Great Recession experienced worldwide in the early part of the twenty-first century.

11.16 SOCIAL COGNITIVE APPROACHES
Social cognitive approaches to career development acknowledge that people learn and develop attitudes about work within a social context through observation and modeling of behavior. Prominent exemplars of this approach include Gottfredson (2005), Lent, Brown, and Hackett (2000), and Krumboltz (2009). In our coverage here, we summarize the recent views of John Krumboltz because of their direct relevance to matters of assessment. Krumboltz (2009) calls his approach the Happenstance Learning Theory (HLT). In brief: HLT posits that human behavior is the product
of countless numbers of learning experiences made available by both planned and unplanned situations in which individuals find themselves. The learning outcomes include skills, interests, knowledge, beliefs, preferences, sensitivities, emotions, and future actions (p. 135).
The theory is practical and compassionate in style, attempting to explain how and why each person follows a unique path, and describing
how counselors can facilitate development. In regard to the how and why of behavior, Krumboltz surveys genetic influences, learning experiences, environmental conditions, parents and caretaker influences, peer groups, and structured educational settings. He concludes by noting that “Social justice is not equally distributed among humans on our planet.” He argues powerfully that practitioners have a responsibility to help overcome social injustice. The proper uses of assessment might be a small part of the solution. HLT is based on four premises (Krumboltz, 2009):
1. The goal of career counseling is to help
clients learn to take actions to achieve more satisfying career and personal lives—not to make a single career decision. Krumboltz notes that the future is uncertain for everyone, especially in the world of work, where new careers emerge and old ones die out. In his view, making a single career decision is potentially foolhardy. A more
tentative, exploratory approach is to be preferred.
2. Assessments are used to stimulate learning—not to match personal characteristics with occupational characteristics. For example, in regard to interest assessment, Krumboltz contends that the goal is to help clients find attractive activities to explore now. In regard to happenstance, it is his experience that helping clients commit to new actions often will open up unexpected opportunities. A similar argument holds for personality assessment, which can be used to stimulate discussion about alternative settings for the client, and to identify areas of needed change (e.g., assertiveness training for an introverted client). It may also prove helpful to identify dysfunctional career beliefs by using the Career Beliefs Inventory (Krumboltz & Vosvick, 1996), which is discussed later in this topic. Krumboltz (1993, 1996) has been critical of many interest inventories because most clients have little or no experience with the
topics being assessed. Instead of marking items as like, dislike, or indifferent, he playfully suggests that the response options should be “I don’t know yet,” “I haven’t tried that yet,” or “I’d like to learn more about that before I answer” (Krumboltz, 1996, p. 57). He also finds fault with these instruments because they focus excessively on cognitive matching of client to work environments, and overlook the emotional problems, including dysfunctional career beliefs, that hamper career development.
3. Clients learn to engage in exploratory actions as a way of generating beneficial unplanned events—not to plan all their actions in advance. The statement that “chance favors only the prepared mind” is attributed to Louis Pasteur (1822-1895), the French biologist and chemist. But the statement can be applied to career development as well. Krumboltz asserts that the goal of the counselor is to help clients engage in activities that are likely to generate unplanned events, and to prepare
clients to benefit from these happenstance occurrences. An example might be encouraging an unemployed client to join a health club as a means of exploring her interests in yoga. At the club she befriends a bank manager who is impressed by her winsome personality, which leads to a job interview and a new career endeavor.
4. The success of counseling is assessed by what the client accomplishes in the real world outside the counseling session—not by what takes place during counseling. HLT is an action-based theory. The task of the counselor is to collaboratively identify things that the client can do outside of the consultation that will promote new learning and new opportunities. A simple example is asking the client to commit to one action step between appointments (e.g., ask three people how they came to be working in their current job) and to report back by email how things went.
11.17 O*NET IN CAREER DEVELOPMENT

The Occupational Information Network or O*NET is the primary source of occupational information in the United States. O*NET is sponsored by the U.S. Department of Labor and is free and open to anyone in the world who has an Internet connection. This is a rich and sophisticated database that includes detailed information on nearly 1,000 specific occupations. For each occupation, the website lists the knowledge, skills, and abilities needed. Personality qualities needed, education required, technology needs, and typical salary also are given. The website provides several assessment tools for career exploration, including a number of instruments that can be self-administered. For example, the O*NET Interest Profiler is an online test consisting of 60 occupational activities that are rated on a five-point scale from strongly dislike to strongly like. The test not only yields a score for each of the six
RIASEC dimensions, but also links to a user-friendly list of specific occupations suited to the preparation level selected by the examinee. Further, these occupations are individually rated for employment outlook, environmental or “green” appeal, and apprenticeship needed.
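The mechanics of such an instrument are easy to sketch. In the illustration below, the 60 activity ratings are mapped to the six themes and summed, and the three highest themes form a code. The round-robin item-to-theme assignment and the simple summing rule are assumptions for illustration, not the published O*NET scoring procedure.

    # Illustrative scoring of a 60-item RIASEC interest profiler.
    # Item-to-theme assignment and simple summing are assumptions;
    # consult the O*NET documentation for the actual scoring rules.
    import random

    THEMES = ["R", "I", "A", "S", "E", "C"]

    def score_profiler(ratings):
        """Sum the 1-5 ratings belonging to each theme (items assigned
        to themes in round-robin order for this illustration)."""
        totals = {t: 0 for t in THEMES}
        for i, rating in enumerate(ratings):
            totals[THEMES[i % 6]] += rating
        return totals

    def summary_code(totals):
        """The three highest themes, in descending order."""
        return "".join(sorted(totals, key=totals.get, reverse=True)[:3])

    random.seed(1)
    ratings = [random.randint(1, 5) for _ in range(60)]  # simulated answers
    totals = score_profiler(ratings)
    print(totals, summary_code(totals))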
11.18 INVENTORIES FOR CAREER ASSESSMENT
One guiding motif in this topic is that successful assessment for career guidance requires ongoing interaction with clients. Career counseling extends well beyond mere testing. Avoiding the “test and tell” trap is vital. Even so, the use of appropriate assessment tools can be helpful, sometimes even essential. The number of instruments available for career assessment is huge, and new tools emerge every year. We survey a number of widely used tests here to provide a sense of the diversity available. We begin with a specialized tool designed to challenge maladaptive career beliefs.

Career Beliefs Inventory
Krumboltz (1991) created the Career Beliefs Inventory to identify and measure attitudes and beliefs that might block career development. In his work with clients, he often noted that people firmly hold to self-limiting beliefs that prevent them from finding a satisfying job or career. Examples of such beliefs include:
• I don’t have enough confidence to try that
• I don’t have the skills needed for that position
• I can’t do that because I don’t have any experience
• I’m really dumb when it comes to that kind of activity
• It would involve too much risk to go in that direction
• That kind of work wouldn’t give me any satisfaction
The Career Beliefs Inventory (CBI) was designed to increase clients’ awareness of underlying career beliefs and to gauge the potential influence of these beliefs on occupational choice and life satisfaction.
The CBI can be taken individually or administered in a group setting to persons in grade 8 or higher. The paper-and-pencil test can be hand-scored, but computer scoring is preferable: it yields an elegant 12-page report, whereas hand scoring is confusing and likely to introduce errors. The 96 test items, all in Likert format, are grouped into 25 scales organized under the following five headings:
1. Your Current Career Situation. Four scales:
Employment Status, Career Plans, Acceptance of Uncertainty, and Openness.
2. What Seems Necessary for Your Happiness. Five scales: Achievement, College Education, Intrinsic Satisfaction, Peer Equality, and Structured Work Environment.
3. Factors that Influence Your Decisions. Six scales: Control, Responsibility, Approval of Others, Self-other Comparisons, Occupation/College Variation, and Career Path Flexibility.
4. Changes You Are Willing to Make. Three scales: Post-training Transition, Job Experimentation, and Relocation.
5. Effort You Are Willing to Initiate. Seven scales: Improving Self, Persisting While Uncertain, Taking Risks, Learning Job Skills, Negotiating/Searching, Overcoming Obstacles, and Working Hard.
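Note that 96 items spread across 25 scales leaves some scales quite short, which constrains their reliability, a point that figures in the psychometric evidence reviewed next. The Spearman-Brown prophecy formula, sketched below, shows how much reliability can be expected to improve when a scale is lengthened with comparable items; the .45 starting value is an invented example.

    # Spearman-Brown prophecy formula: expected reliability when a scale
    # is lengthened by a factor k, assuming the added items are parallel
    # to the originals. The starting reliability here is invented.

    def spearman_brown(reliability, k):
        return (k * reliability) / (1 + (k - 1) * reliability)

    # A two-item scale with reliability .45, tripled to six items:
    print(round(spearman_brown(0.45, 3), 2))  # 0.71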
Standardization of the CBI is based on more than 7,500 individuals in the United States and Australia. The sample was reasonably diverse, with an age range of 12 to 75, including junior high, high school, and college students, as well as adults, both employed and unemployed. Initial test-retest reliability data for the CBI are mixed, with one-month reliabilities ranging from the .30s to the .70s for the high school sample. Internal consistencies were likewise modest, with coefficients mainly in the range of .40 to .50. This might be due to the small number of items for some scales, as few as two items for several scales. Fuqua and Newman (1994) recommend that the CBI could be improved if
additional items were added to some of the scales. Walsh (1996) supplemented the original standardization sample for the CBI with nearly 600 additional participants. She reported more promising results, with internal consistencies ranging from the low .30s to the high .80s, with a mean coefficient alpha of .57 for the CBI scale scores. Regarding validity, results of factor analyses did find reproducible clusters of beliefs, but these did not correspond to the scale clusters provided in the CBI reports. She suggests that the practical application of the CBI might rest with exploring client beliefs at the level of the individual items (Walsh, Thompson, & Kapes, 1996). In a study of convergent validity correlating CBI results with data from four other personality and vocational inventories, Holland, Johnston, Asama, and Polys (1993) reported at least moderate construct validity for most of the CBI scales. They concluded that the test seems to be measuring variance in career variables not assessed by other instruments. In addition,
significant correlation of some CBI scales with the State-Trait Anxiety Inventory indicated that certain self-limiting and irrational beliefs are associated with emotional discomfort.

11.19 INVENTORIES FOR INTEREST ASSESSMENT

In most applications of psychological testing, the goals of assessment are reasonably clear. For example, intelligence testing helps predict school performance; aptitude testing foretells potential for accomplishment; and personality testing provides information about social and emotional functioning. But what is the purpose of interest assessment? Why would a psychologist recommend it? What can a client expect to gain from a survey of his or her interests? Interest assessment promotes two compatible goals: life satisfaction and vocational productivity. It is nearly self-evident that a good fit between individual interests and chosen vocation will help foster personal life satisfaction. After all, when work is interesting we are more likely to experience personal fulfillment as well. In addition, persons who are satisfied with their work are more likely to be productive. Thus, employees and employers both stand to gain from the artful application of interest assessment. Several useful instruments exist for this purpose, and we will review the most widely used interest inventories later. In the selection of employees, the consideration of personal interests may be of great practical significance to employers and, therefore, circumstantially relevant to the job candidates as well. We may sketch out a rough equation as follows: productivity = ability × interest. In other words, high ability in a specific field does not guarantee success; neither does high interest level. The best predictions are possible when both variables are considered together. Thus, employers have good reason to determine whether a potential employee is well matched to the position; the employee would like to know as well. Working from the Holland RIASEC model described earlier, Nye, Su, Rounds, and Drasgow
(2012) recently completed an intriguing quantitative summary of 60 years of research on the relationship between vocational interests, person-environment fit, and job performance. Their review was based on 568 correlations from published empirical studies. The basic premise of their survey was that: Holland’s theory suggests that the similarities
between an individual’s interest profile and the profile of his or her occupation should predict tenure and performance in academic and work domains (p. 387).
This is exactly what their analyses revealed. For the employment studies reviewed, the correlations between “fit” (congruence between an individual’s Holland code and the code of his/her chosen occupation) and job performance ranged from .21 to .30, depending on the inventory used and the characteristics of the study. The same pattern emerged in the academic samples. The correlations between “fit” (congruence between a student’s Holland code and the code of his/her chosen major) and grades were mainly in the range of .27 to .31. In
other words, when employees or students possess interest patterns that match the expectations of their job or major, they are more likely to be productive in their work or studies. We turn now to a critical examination of major interest tests. The four instruments chosen for review are:
• The Strong Interest Inventory-Revised (SII-R), the latest revision of the well-known Strong Vocational Interest Blank (SVIB)
• The Vocational Preference Inventory (VPI), a useful inventory that embodies the RIASEC model of John Holland
• The Self-Directed Search (SDS), a self-administered and self-scored guide to exploring career options
• The Campbell Interest and Skill Survey (CISS), an appealing test that is simple in format but sophisticated in execution
Strong Interest Inventory-Revised (SII-R)

The Strong Interest Inventory-Revised (SII-R) is the latest revision of the Strong Vocational
Interest Blank (SVIB), one of the oldest and most prominent instruments in psychological testing (Donnay, Thompson, Morris, & Schaubhut, 2004). We can best understand the SII-R by studying the history of its esteemed predecessor, the SVIB. In particular, we need to review the guiding assumptions used in the construction of the SVIB that have been carried over into the SII-R. The first edition of the SVIB appeared in 1927, eight years after E. K. Strong formulated the essential procedures for measuring occupational interests while attending a seminar at the Carnegie Institute of Technology (Campbell, 1971; Strong, 1927). In constructing the SVIB, Strong employed two little-used techniques in measurement. First, the examinee was asked to express liking or disliking for a large and varied sample of occupations, educational disciplines, personality types, and recreational activities. Second, the responses were empirically keyed for specific occupations. In an empirical key, a specific response (e.g., liking to roller skate) is assigned to the scale for a particular occupation
only if successful persons in that occupation tend to answer in that manner more often than comparison subjects. Although Strong did not express his underlying assumptions in a simple and straightforward manner, it is clear that the theoretical foundation for the SVIB derives from a typological, trait-oriented conception of personality. Tzeng (1987) has identified the following basic assumptions in the development and application of the SVIB:
1. Each occupation has a desirable pattern of
interests and personality characteristics among its workers. The ideal pattern is represented by successful people in that occupation.
2. Each individual has relatively stable interests and personality traits. When these interests and traits match the desirable patterns of the occupation, the individual has a high probability of entering that occupation and is more likely to succeed in it.
3. Individuals in a given occupation can be differentiated from others-in-general in terms of the desirable patterns of interests and traits for that occupation.
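The empirical keying logic described above is easy to sketch. In the illustration below, the endorsement rates and the 15-point selection threshold are invented for the example; they are not Strong’s actual item statistics or criteria.

    # A sketch of empirical (criterion) keying in the spirit of the SVIB.
    # Endorsement rates (proportion marking "like") and the threshold are
    # illustrative assumptions, not Strong's actual figures.

    def key_items(occupation_rates, general_rates, threshold=0.15):
        """Key an item 'like' or 'dislike' for the occupational scale when
        the occupational group's endorsement rate differs from the
        people-in-general rate by more than the threshold."""
        key = {}
        for item in occupation_rates:
            diff = occupation_rates[item] - general_rates[item]
            if diff > threshold:
                key[item] = "like"
            elif diff < -threshold:
                key[item] = "dislike"
        return key

    occupation = {"roller skating": 0.62, "buying merchandise": 0.20,
                  "public speaking": 0.45}
    general    = {"roller skating": 0.40, "buying merchandise": 0.50,
                  "public speaking": 0.48}
    print(key_items(occupation, general))
    # {'roller skating': 'like', 'buying merchandise': 'dislike'}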
Strong constructed the scales of his inventory by contrasting the responses of several specific occupational criterion groups with those of a people-in-general group. The subjects for each criterion group were workers in that occupation who were satisfied with their jobs and who had been so employed for at least three years. The items that differentiated the two groups, keyed in the appropriate direction, were selected for each occupational scale. For example, if members of a specific occupational group disliked “buying merchandise for a store” more often than people-in-general, then that item (keyed in the dislike direction) was added to the scale for that occupation. The first SVIB consisted of 420 items and a mere handful of occupational scales (Strong, 1927). Separate editions for men and women followed shortly. The inventory has undergone
numerous revisions over the years (Tzeng, 1987), culminating in the modern instrument known as the Strong Interest Inventory-Revised (Campbell, 1974; Hansen, 1992; Hansen & Campbell, 1985; Donnay et al., 2004). Although the Strong Interest Inventory-Revised (SII-R) was fashioned according to the same philosophy as the SVIB, the latest revision departs from its predecessors in a number of ways. The SII-R was developed with the following goals in mind:
• Shorten the instrument
• Add current occupations
• Increase the level of business, technology, and teamwork measures
• Broaden the assessment of work and leisure activities
• Reflect the diversity of the U.S. workforce in the samples obtained
The SII-R consists of 291 items answered in a 5-point Likert format, with options of Strongly Like, Like, Indifferent, Dislike, and Strongly Dislike. The standardization sample (N = 2,250) consists of an equal number of employed men and
women from the U.S. workforce. The sample is restricted to employed persons because the main purpose of the test is to determine interest patterns within occupational groups. Racial and ethnic groups accurately represent the U.S. population and constitute 30 percent of the sample. Test results are organized in six sections. At the most global level are the six General Occupational Theme scores, namely, Realistic, Investigative, Artistic, Social, Enterprising, and Conventional. These scores are based on the theoretical analysis of Holland (1966, 1985), whose work was discussed earlier. Each theme score pertains to a major interest area that describes both a work environment and a type of person. For example, persons scoring high on the Realistic theme are generally quite robust, have difficulty expressing their feelings, and prefer to work outdoors with heavy machinery. The 30 Basic Interest Scales are found within the general theme scores. These identify specific interest domains, indicating areas likely to be stimulating and rewarding to the client.
Examples of these scales include Counseling and Helping, Visual Arts and Design, Marketing and Advertising, Finance and Investing, Medical Science, and Mechanics and Construction. The interest scales are empirically derived and consist of substantially intercorrelated items. The most detailed results consist of 130 Occupational Scales, with separate normative data for each gender. Scores on these scales indicate the client’s similarity to people of the same gender who have been working in, and are satisfied with, the listed occupation. Each scale produced at least a one-standard-deviation separation between the occupational sample and the reference sample, supporting the distinctiveness of specific career paths (Donnay et al., 2004). The SII-R also yields five Personal Style Scales. These are designed to measure preferences for broad styles of living and working. These scales assist in vocational guidance by showing the level of comfort with distinctive styles. The five style scales are as follows:
1. Work Style, on which a high score indicates a preference to work with people and a low score signifies an interest in ideas, data, and things;
2. Learning Environment, on which a high score indicates a preference for academic learning environments and a low score indicates a preference for more applied learning activities;
3. Leadership Style, on which a high score indicates comfort in taking charge of others and a low score indicates uneasiness with doing so;
4. Risk Taking/Adventure, on which a high score indicates a preference for risky and adventurous activities as opposed to safe and predictable activities; and
5. Team Orientation, on which a high score indicates a preference for collaboration and working on teams as opposed to working independently.
The personal style scales each have a mean of 50 and a standard deviation of 10. Note that these are truly bipolar scales for which each pole is distinct and meaningful.
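The standard-score metric used throughout the SII-R report (mean of 50, SD of 10, often called a T-score) is a simple linear transformation of the raw score, sketched below with invented norm values.

    # Converting a raw scale score to the mean-50, SD-10 standard-score
    # metric reported on the SII-R. The raw mean and SD are invented for
    # illustration; actual norms come from the standardization sample.

    def standard_score(raw, norm_mean, norm_sd):
        return 50 + 10 * (raw - norm_mean) / norm_sd

    # A raw score of 34 on a scale normed at mean 28, SD 6:
    print(standard_score(34, 28, 6))  # 60.0 -- one SD above the norm group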
The SII-R can only be scored by prepaid answer sheets or booklets that are mailed or faxed to the publisher, or through purchase of a software system that provides on-site scoring for immediate results. The results consist of a lengthy printout that is organized according to several themes. All scores are expressed as standard scores with a mean of 50 and an SD of 10.

Evaluation of the SII-R

The SII-R represents the culmination of over 70 years of study, involving literally thousands of research reports and hundreds of thousands of respondents. In evaluating this instrument, we can only outline basic trends in the research, referring the reader to other sources for details (Bailey, Larson, Borgen, & Gasser, 2008; Savickas, Taber, & Spokane, 2002; Hansen, 1992; Hansen & Campbell, 1985). We should also point out that evaluations of the reliability and validity of the SII-R are based in part upon its similarity to the SII and SVIB, for which a huge amount of technical data exists.
Based upon test-retest studies, the reliability of the Strong has proved to be exceptionally good in the short run, with one- and two-week stability coefficients for the occupational scales generally in the .90s. When the test-retest interval is years or decades, the correlations drop to the .60s and .70s for the occupational scales, except for respondents who were older (over age 25) upon first testing. For younger respondents first tested as adolescents, the median test-retest correlation after 15 years is around .50 (Lubinski, Benbow, & Ryan, 1995). But for older respondents, first tested after the age of 25, the median test-retest correlation 10 to 20 years later is a phenomenal .80 (Campbell, 1971). Apparently, by the time we pass through young adulthood, personal interests become extremely stable. The questions on the SII-SVIB capture that stability in the occupational scores, providing support for the trait conception of personality upon which these instruments were based. The validity of the Strong is premised largely on the ability of the initial occupational profile to
predict the occupation eventually pursued. Strong (1955) reported that the chances were about two in three that people would be in occupations predicted by high occupational scale scores, and about one in five that respondents would be in occupations for which they had shown little interest when tested. Although other researchers have quibbled with the exact proportions (Dolliver, Irvin, & Bigley, 1972), it is clear that the SII-SVIB has impressive hit rates in predicting occupational entry. The instrument functions even better in predicting the occupations that an examinee will not enter. In a recent study, Donnay and Borgen (1996) provide evidence for construct validity by demonstrating strong overall differentiation between 50 occupational groups on the SII: The big picture is that people in diverse
occupations show large and predictable differences in likes and dislikes, whether in terms of vocational interests or in terms of personal styles. And the Strong provides valid, structural, and comprehensive measures of these differences. (p. 290)
The SII-R is used mainly with high school and college students and adults seeking vocational guidance or advice on continued education. Because most students’ interests are undeveloped and unstabilized prior to age 13 or 14, the test is not recommended for use below high-school level. As evident in the reliability data reported, the SII-R becomes increasingly valuable with older subjects, and it is not unusual to see middle-aged persons use the results of this instrument for guidance in career change.

Vocational Preference Inventory

The Vocational Preference Inventory is an objective, paper-and-pencil personality interest inventory used in vocational and career assessment (Holland, 1985c). The VPI measures 11 dimensions, including the six personality-environment themes of Realistic, Investigative, Artistic, Social, Enterprising, and Conventional, and five additional dimensions of Self-Control, Masculinity/Femininity, Status, Infrequency, and Acquiescence. The test items consist of 160
occupational titles toward which the examinee expresses a feeling by marking y (yes) or n (no). The VPI is a brief test (15 to 30 minutes) and is intended for persons 14 years and older with normal intelligence. As noted previously, Holland proposes that personality traits tend to cluster into a small number of vocationally relevant patterns, called types. For each personality type there is also a corresponding work environment best suited to that type. According to Holland, there are six types: Realistic, Investigative, Artistic, Social, Enterprising, and Conventional. This is sometimes known as the RIASEC model, in reference to the first letters of the six types. Test-retest reliability coefficients for the six major scales range from .89 to .97. VPI norms are based upon large convenience samples of college students and employed adults from earlier VPI editions. The characteristics of the standardization sample are not well defined, which makes the norms somewhat difficult to interpret (Rounds, 1985).
The validity of the VPI is essentially tied to the validity of Holland’s (1985a) hexagonal model of vocational interests. Literally hundreds of studies have examined this model from different perspectives. We will cite trends and representative studies. The reader is referred to Holland (1985c) and Walsh and Holland (1992) for more details. Several VPI studies have investigated a key assumption of Holland’s theory—that individuals tend to move toward environments that are congruent with their personality types. If this assumption is correct, then the real-world match between work environments and personality types of employees should be substantial. We should expect to find that Realistic environments have mainly Realistic employees, Social environments have mainly Social employees, and so on. Research on this topic has followed a straightforward methodology: Subjects are tested with the VPI and classified by their Holland types (using up to six letters); the work environments of the subjects are then independently classified by an
appropriate environmental measure; finally, the degree of congruence between persons and environments is computed. In better studies, a correction for chance agreement is also applied. Using his hexagonal model, Holland has developed occupational codes as a basis for classifying work environments (Gottfredson & Holland, 1989; Holland, 1966, 1978, 1985c). For example, landscape architect is coded as RIA (Realistic, Investigative, Artistic) because this occupation is known to be a technical, skilled trade (Realistic component) that requires scientific skills (Investigative component) and also demands artistic aptitude (Artistic component). The Realistic component is listed first because it is the most important for landscape architect, whereas the Investigative and Artistic components are of secondary and tertiary importance, respectively. Some other occupations and their codes are taxi driver (RSE), mathematics teacher (ISC), reporter (ASE), police officer (SRE), real estate appraiser (ECS), and secretary (CSA). In a
similar manner, Holland has also worked out codes for different college majors. One approach to congruence studies is to compare VPI results of students or workers with the Holland codes that correspond to their college majors or occupations. For example, VPI Holland codes for a sample of police officers should consist mainly of profiles that begin with S and should contain a larger-than- chance proportion of specifically SRE profiles. Furthermore, the degree of congruence should be related to the degree of expressed satisfaction with that line of work or study. Research with college students provides strong support for the congruence prediction: Students tend to select and enter college majors that are congruent with their primary personality types (Holland, 1985a; Walsh & Holland, 1992). Thus, Artistic types tend to major in art, Investigative types tend to major in biology, and Enterprising types tend to major in business, to cite just a few examples. These results provide strong support for the VPI and the theory upon which it is based.
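How might the degree of congruence be computed? Indices vary across studies; the sketch below implements one simple positionally weighted agreement rule as an illustration, not the index used in any particular study cited here.

    # One illustrative person-environment congruence index for
    # three-letter Holland codes: positionally weighted agreement,
    # scaled 0 to 1. Published studies use a variety of indices;
    # this particular rule is an assumption for illustration.

    def congruence(person_code, environment_code, weights=(3, 2, 1)):
        """Weighted letter-by-letter agreement between two Holland codes."""
        matched = sum(w for p, e, w in
                      zip(person_code, environment_code, weights) if p == e)
        return matched / sum(weights)

    print(congruence("SRE", "SRE"))  # 1.0 -- perfect fit (e.g., police officer)
    print(congruence("SER", "SRE"))  # 0.5 -- only the first letter matches
    print(congruence("RIA", "SRE"))  # 0.0 -- no positional agreement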
This short review has barely touched the surface of supportive validity studies with the VPI. Walsh and Holland (1992) cite several additional lines of research that buttress the validity of this test. But not all studies of the VPI affirm its validity. Furnham, Toop, Lewis, and Fisher (1995) failed to find a relationship between personality–environment (P-E) “fit” and job satisfaction, a key theoretical underpinning of the test. According to Holland’s theory, the better the P-E fit, the greater should be job satisfaction. In three British samples, the relationships were weak or nonexistent, suggesting that the VPI does not “travel well” in cultures outside of the United States.

Self-Directed Search

Holland has always shown a keen interest in the practical applications of his research on vocational development. Consistent with this interest, he developed the Self-Directed Search, a highly practical, brief test that is appealing in its simplicity (Holland, 1985a, b). As the name suggests, the Self-Directed Search is designed to
be a self-administered, self-scored, and self-interpreted test of vocational interest. The SDS measures the six RIASEC vocational themes described previously. The SDS consists of dichotomous items that the examinee marks “like” or “dislike” (or “yes” or “no”) in four sections: (1) Activities (six scales of 11 items each); (2) Competencies (six scales of 11 items each); (3) Occupations (six scales of 14 items each); and (4) Self-Estimates (two sets of six ratings). For each section, the face-valid items are grouped by RIASEC themes. For each theme, the total number of “like” and “yes” answers is combined with the self-estimates of ability to yield a total theme score. The SDS takes 30 to 50 minutes to complete and is intended for persons 15 years and older. The RIASEC themes on the SDS show test-retest reliabilities ranging from .56 to .95 and internal consistencies ranging from .70 to .93. Norms for SDS scales and codes are reported for pooled convenience samples of 4,675 high school students, 3,355 college students, and 4,250 employed adults ages 16 through 24
(Holland, 1985a, b). However, SDS results are typically interpreted in an individualized, ipsative manner (“Is this occupation a good fit for this client?”), so normative data are of limited relevance. The SDS is available in a hand-scored paper-and-pencil version as well as a computerized version. Unfortunately, the paper-and-pencil version is prone to a 16 percent clerical error rate when used by high school students (Holland, 1985a, b). The user-friendly microcomputer version is probably preferable because of its ease of administration and error-free scoring and interpretation. When a subject takes the SDS, the three highest theme scores are used to denote a summary code. For example, a person whose three highest scores were on Investigative, Artistic, and Realistic would have a summary code of IAR. In a separate booklet distributed with the test—the Occupations Finder—the examinee can look up his or her summary code and find a list of occupations that provide the best “fit.” For example, an examinee with an IAR summary
code would learn that he or she most closely resembles persons in the following occupations: anthropologist, astronomer, chemist, pathologist, and physicist. The test booklet contains additional information, which helps the examinee explore relevant career options. The SDS serves a very useful purpose in providing a quick and simple format for prompting young persons to examine career alternatives. By eliminating the time-consuming process of administration, scoring, interpretation, and counselor feedback, the test makes it possible for a wide audience to receive an introductory level of career counseling. Holland (1985a, b) proposes that the SDS is appropriate for up to 50 percent of students and adults who might desire career guidance. Presumably, the other 50 percent would find the SDS an insufficient basis for career exploration. Holland (1985a, b) rightfully warns users to consider many sources of information in career choice and not to rely too heavily on test scores per se. Levinson (1990) discusses the integration of SDS data with other
psychoeducational data to make specific vocational recommendations for high school students. LaBarbera (2005) illustrates the potential application of the SDS in a study of 463 physician assistants (PAs) known to be well satisfied with their work. PAs are medical professionals who provide care under the supervision of a licensed physician. This is a demanding profession with well-defined duties that include many of the same functions provided by a general practitioner. Who is a good candidate for this up-and-coming, high-demand profession? LaBarbera (2005) determined that the Holland profile was a distinctive SIR for men, especially those with interests in surgery, whereas the profile for women retained the first two letters (SI) but showed no consistent third theme. This is valuable information for prospective students and career counselors. The validity of the SDS is linked to the validity of the hexagonal model of personality and environments upon which the test is based. One
aspect of validity, then, is whether the model makes predictions that are confirmed by SDS results in the real world. In general, the results from over 400 studies support the construct validity of the SDS (Dumenci, 1995; Holland, 1985a, b, 1987). One approach to construct validity is to determine whether the relationships among SDS scales make theoretical sense. One tenet of construct validity is that similar scales should reveal stronger relationships, dissimilar scales weaker relationships. For example, it is not hard to imagine one person combining Artistic and Investigative themes in personality and work environment. After all, these themes are mildly similar, so we would predict a moderately positive correlation between them. This is exactly what Holland (1985a, b) found. In a general reference sample of 175 women aged 26 to 65 years, scores on these two themes correlated modestly, r = .26, as would be predicted. Further, unrelated themes like Investigative and Enterprising (which bear little in common) should reveal a weak correlation.
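Such hexagon-consistent patterns are straightforward to check when raw theme scores are available. The sketch below is our illustration, not Holland's analysis, and the respondent data are hypothetical; it correlates RIASEC theme scores and compares hexagon-adjacent pairs (which should correlate more strongly) with hexagon-opposite pairs (which should correlate weakly or negatively).

```python
# Minimal construct-validity check (our illustration, not Holland's
# analysis): correlate RIASEC theme scores and compare hexagon-adjacent
# pairs with hexagon-opposite pairs. `scores` holds hypothetical raw
# theme scores for six respondents.
from statistics import correlation  # Python 3.10+

scores = {
    "R": [8, 3, 9, 2, 7, 4], "I": [7, 4, 8, 3, 6, 5],
    "A": [5, 6, 6, 4, 4, 7], "S": [2, 8, 3, 9, 3, 8],
    "E": [3, 7, 2, 8, 4, 6], "C": [4, 5, 3, 7, 5, 5],
}

# Hexagon order is R-I-A-S-E-C; opposites sit across the hexagon.
adjacent = [("R", "I"), ("I", "A"), ("A", "S"), ("S", "E"), ("E", "C"), ("C", "R")]
opposite = [("R", "S"), ("I", "E"), ("A", "C")]

def mean_r(pairs):
    return sum(correlation(scores[a], scores[b]) for a, b in pairs) / len(pairs)

# The model predicts mean_r(adjacent) > mean_r(opposite).
print(f"adjacent: {mean_r(adjacent):+.2f}  opposite: {mean_r(opposite):+.2f}")
```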
In the same reference sample, the Investigative–Enterprising correlation turned out to be a negligible r = −.02. Overall, the various correlations among the six themes of the SDS make theoretical sense, which supports the construct validity of the test. The predictive validity of the SDS has been investigated in several dozen studies, which are summarized by Holland (1985a, b, 1987). The typical methodology for these studies is that SDS high-point codes for large samples of students are compared with the first letter of their occupational choices (or aspirations) one to three years later. Overall, the findings indicate that the SDS has moderate to high predictive efficiency, depending upon the age of the sample (hit rates go up with age), the length of the time interval (hit rates go down with time), and the specific category predicted (hit rates are better for Investigative and Social predictions) (Gottfredson & Holland, 1975).
Campbell Interest and Skill Survey
The Campbell Interest and Skill Survey (CISS; Campbell, Hyne, & Nilsen, 1992) is a newer measure of self-reported interests and skills. The
test is designed to help individuals make better career choices by describing how their interests and skills match the occupational world. The primary target population for the CISS is students and young adults who have not entered the job market, but the test is also suitable for older workers who are considering a change in careers. The test is appropriate for persons 15 years of age and older with a sixth-grade reading level, although younger children can be tested in exceptional circumstances. The CISS consists of 200 interest items and 120 skill items. The interest items include occupations, school subjects, and varied working activities that the examinee rates on a six-point scale from strongly like to strongly dislike. The interest items resemble the following:
• A pilot, flying commercial aircraft
• A biologist, working in a research lab
• A police detective, solving crimes
The skill items include a list of activities that the examinee rates on a six-point scale from expert (widely recognized as excellent in this area) to none (have no skills in this area). The skill items resemble the following:
• Helping a family resolve its conflicts
• Making furniture, using woodworking and power tools
• Writing a magazine story
CISS results are scored on several different kinds of scales: Orientation Scales, Basic Interest and Skill Scales, Occupational Scales, Special Scales, and Procedural Checks. All scale scores are reported as T scores, normed to a population average of 50 with a standard deviation of 10 (a minimal conversion sketch appears after the list below). The Orientation Scales serve to organize the CISS profile—the interest, skill, and occupational scales are reported under the appropriate Orientations. The seven Orientations are as follows (Campbell et al., 1992, pp. 2–3):
• Influencing—influencing others through
leadership, politics, public speaking, and marketing
• Organizing—organizing the work of others, managing, and monitoring financial performance
• Helping—helping others through teaching, healing, and counseling
• Creating—creating artistic, literary, or musical productions, and designing products or environments
• Analyzing—analyzing data, using mathematics, and carrying out scientific experiments
• Producing—producing products, using “hands-on” skills in farming, construction, and mechanical crafts
• Adventuring—adventuring, competing, and risk taking through athletic, police, and military activities
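As noted before the list, every CISS scale is reported on the T-score metric, which is a simple linear transformation of the raw score against a reference sample. The sketch below is ours; the norm mean and standard deviation shown are placeholders, not published CISS norms.

```python
# Convert a raw scale score to a T score (mean 50, SD 10) using a
# reference sample's mean and standard deviation. The norm values
# used in the example are placeholders, not published CISS norms.
def t_score(raw, norm_mean, norm_sd):
    return 50 + 10 * (raw - norm_mean) / norm_sd

print(t_score(34, norm_mean=28.0, norm_sd=6.0))  # -> 60.0, one SD above the norm group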
There are 29 pairs of Basic Scales, each pair consisting of parallel interest and skill scales. The Basic Scales are clustered within the seven Orientations, based upon their intercorrelations. For example, the Helping Orientation contains the following Basic Scales, each with separate interest and skill components: Adult
Development, Counseling, Child Development, Religious Activities, and Medical Practice. The 58 pairs of Occupational Scales, each with separate interest and skill components, provide feedback on the degree of similarity between the examinee and satisfied workers in that occupation. These scales were constructed empirically by contrasting the responses of happily employed persons in specific occupations with responses of a general reference sample drawn from the working population at large. In addition to Basic and Occupational Scales, the CISS incorporates three special scales: Academic Focus, a measure of interest and confidence in intellectual, scientific, and literary activities; Extraversion, a measure of social extraversion; and Variety, a measure of the examinee’s breadth of interests and skills. Finally, the CISS reports a variety of Procedural Checks to detect possible problems in test taking such as random responding or excessive omissions.
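The contrast-group logic behind the Occupational Scales can be sketched in a few lines of code. What follows is a generic textbook illustration of empirical criterion keying under simplifying assumptions (dichotomous endorsements, a fixed selection threshold); the CISS's actual item-selection and weighting procedures are more involved and are not reproduced here.

```python
# Generic empirical criterion keying (an illustration only, not the
# CISS's actual procedure). Each response vector holds 0/1 item
# endorsements for one respondent.
def build_key(occ_group, ref_group, threshold=0.20):
    """Key items whose endorsement rates differ between an occupational
    sample and the general reference sample by at least `threshold`.
    Returns {item_index: +1 or -1}, signed toward the occupation."""
    n_items = len(occ_group[0])
    key = {}
    for i in range(n_items):
        occ_rate = sum(r[i] for r in occ_group) / len(occ_group)
        ref_rate = sum(r[i] for r in ref_group) / len(ref_group)
        if abs(occ_rate - ref_rate) >= threshold:
            key[i] = 1 if occ_rate > ref_rate else -1
    return key

def raw_similarity(key, responses):
    """Sum of signed, keyed endorsements; in practice this raw score
    would then be converted to a T score against the reference sample."""
    return sum(sign * responses[i] for i, sign in key.items())

# Tiny worked example: three "accountants" vs. four reference respondents.
accountants = [[1, 1, 0], [1, 1, 0], [1, 0, 0]]
reference   = [[0, 1, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
key = build_key(accountants, reference)     # item 0 keyed +1, item 2 keyed -1
print(key, raw_similarity(key, [1, 0, 1]))  # -> {0: 1, 2: -1} 0
```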
Overall, the reliability of CISS scales is exceptionally strong. For example, coefficient alpha for the Orientation Scales is typically in the high .80s, and three-month test-retest reliabilities for 324 respondents are in the mid- to high .80s. Similar reliability findings are reported for the Basic and Occupational Scales. Norms for the CISS are based upon 5,000 subjects spread over the 58 occupations. The authors report extensive validity data for the Occupational Scales, including sample means for each occupational sample as well as lists of the three highest- and lowest-scoring occupations for each scale (Campbell et al., 1992). These data document that the scales discriminate between occupations in an effective and meaningful way. For example, the average T score of accountants on the Accountant scale is 75.8. Statisticians, bookkeepers, and financial planners achieve the next three highest scores on this scale, with average T scores in the low 60s. Commercial artists, professors, and social workers obtain the three lowest scores, with average T scores around 40. Because these
results fit well with our expectations about occupational interest and skill patterns, they provide support for the validity of the CISS. Independent correlational studies also support the validity of the CISS. For example, in a sample of 221 college students, Hansen (2007) correlated CISS Skill Scale scores with SII scores and found strong evidence for convergent and discriminant validity (i.e., strong correlations with similar scales, negligible correlations with dissimilar scales). In a sample of 118 adults, Savickas et al. (2002) correlated scores from individual occupational scales of the CISS with scores from the scales of other mainstream instruments such as the Strong Interest Inventory. They also found strong support for both convergent validity (i.e., modest correlations for same-named pairs of scales) and discriminant validity (i.e., negligible correlations for unlike pairs of scales). In a sample of 128 college students, Hansen and Neuman (1999) confirmed the concurrent validity of the CISS by finding a good fit between occupational scale scores and students’
chosen majors. The fit was considered “excellent” or “moderately good” for more than 70 percent of the students. Boggs (1999) provides a review and critique of the CISS. Campbell (2002) presents the history and development of the instrument. This instrument will almost certainly receive increased attention in the years ahead. One noteworthy feature of the CISS is the comprehensiveness and clarity of the profile report form. The report consists of 11 user-friendly pages. We have reprinted two pages in Figure 11.4 for illustrative purposes. This format is preferable to the detail-rich but eye-straining graphs encountered with many instruments. The CISS promises to rival the Strong Interest Inventory for vocational guidance of young adults.
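For readers who want to connect the reliability coefficients cited above to an actual computation, coefficient alpha can be obtained directly from an item-by-respondent score matrix. The sketch below uses made-up data, not CISS responses.

```python
# Coefficient alpha from an items-by-respondents score matrix (made-up
# data; illustrates the statistic reported for the CISS scales).
from statistics import pvariance

def coefficient_alpha(item_scores):
    """item_scores: list of per-item lists, one score per respondent."""
    k = len(item_scores)
    item_vars = sum(pvariance(item) for item in item_scores)
    totals = [sum(col) for col in zip(*item_scores)]  # per-respondent totals
    return (k / (k - 1)) * (1 - item_vars / pvariance(totals))

items = [  # 3 items x 5 respondents, hypothetical
    [4, 2, 5, 3, 4],
    [5, 1, 4, 3, 5],
    [4, 2, 5, 2, 4],
]
print(round(coefficient_alpha(items), 2))  # -> 0.92
```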
FIGURE 11.4 Representative Sections from the Campbell Interest and Skill Survey
Note: The full profile consists of an 11-page printout. Source: From Campbell Interest and Skill Survey (CISS). Copyright © 1997 David Campbell, Ph.D. Reproduced with permission of the publisher NCS Pearson, Inc. All rights reserved. “Campbell” and “CISS” are trademarks, in the US and/or other countries, of Pearson Education, Inc. or its affiliates.
Cornell HR Review
Cornell University ILR School, DigitalCommons@ILR
January 26, 2013
Personality Tests in Employment Selection: Use With Caution
H. Beau Baez, Charlotte School of Law
Abstract
[Excerpt] Many employers utilize personality tests in the employment selection process to identify people who have more than just the knowledge and skills necessary to be successful in their jobs.[1] If anecdotes are to be believed—Dilbert must be getting at something or the cartoon strip would not be so popular—the workplace is full of people whose personalities are a mismatch for the positions they hold. Psychology has the ability to measure personality and emotional intelligence (“EQ”), which can provide employers with data to use in the selection process. “Personality refers to an individual’s unique constellation of consistent behavioral traits”[2] and “emotional intelligence consists of the ability to perceive and express emotion, assimilate emotion in thought, understand and reason with emotion, and regulate emotion.”[3] By using a scientific approach in hiring, employers can increase their number of successful employees.
Keywords
HR Review, Human Resources, employment selection, personality tests
Disciplines
Human Resources Management | Labor Relations
Comments
Suggested Citation: Baez, H. (2013, January 26). Personality tests in employment selection: Use with caution. Cornell HR Review. Retrieved [insert date] from Cornell University, ILR School site: http://digitalcommons.ilr.cornell.edu/chrr/59