Company personality tests
A decade ago, researchers discovered something that should have opened eyes and raised red flags in the business world.
Sara Rynes, Amy Colbert, and Kenneth Brown conducted a study in 2002 to determine whether the beliefs of HR professionals were consistent with established research findings on the effectiveness of various HR practices. They surveyed 1,000 Society for Human Resource Management (SHRM) members (HR Managers, Directors, and VPs) with an average of 14 years’ experience.
The results? The area of greatest disconnect was staffing, one of the linchpins of HR. The gap was especially wide in hiring assessments, where more than 50% of respondents were unfamiliar with prevailing research findings.
Several studies since have explored why these research findings have seemingly failed to reach HR practitioners. Among the causes: HR professionals often don’t have time to read the latest research; the research itself is often presented in technically complex language and data; and the prospect of introducing an entirely new screening measure is daunting from multiple angles.
At the same time, anyone who has ever been responsible for hiring employees, let alone managing them, knows that performance levels vary widely among workers in the same job. It is therefore critical for organizations to understand which differences among individuals systematically affect job performance, so that the candidates with the greatest probability of success can be hired.
So what are the most effective screening measures?
Extensive research has been done on the ability of various hiring methods and measures to actually predict job performance. A seminal work in this area is Frank Schmidt’s meta-analysis of a century’s worth of workplace productivity data, first published in 1998 and recently updated. The table below shows the predictive validity of some commonly used selection practices, sorted from most effective to least effective, according to his latest analysis, which was shared at the Personnel Testing Council Metropolitan Washington chapter meeting this past November:
So if your hiring process relies primarily on interviews, reference checks, and personality tests, you are choosing a process that is significantly less effective than it would be with stronger measures incorporated.
And yet that’s how many companies operate. According to a 2011 NBC News article, the use of personality assessments is on the rise, growing by as much as 20% annually. Especially problematic is the widespread use of Four Quadrant (4-Q) personality tests for hiring, something I see regularly in my consulting work.
A 4-Q assessment is one where the results classify you as some combination of four different options labeled as letters, numbers, colors, animals, etc. They originated around 450 BC when Empedocles noticed that he could group people’s behavior into four categories which he labeled earth, water, fire, and air. Hippocrates made the same observation, but (coming from a medical background) labeled the categories blood, phlegm, black bile, and yellow bile. Since then, hundreds of iterations of these tools have been developed, all essentially based on the same premise and theory.
Generally speaking, 4-Q tools consist of a list of adjectives from which respondents select words that are most/least like them, and are designed to measure “style,” or tendencies and preferences. While they can seem highly insightful (not to mention being widely available and inexpensive), they have some severe shortcomings when used in high-stakes applications such as hiring.
For one, they tend to be highly transparent, enabling a test taker to manipulate the results in a way that they feel will be viewed favorably by the administrator. Also, since they are designed to measure “states” (as opposed to more stable “traits”), there is a significant chance that the results will change over time as the individual’s context changes (most publishers of 4-Q tests recommend that individuals re-take them at fairly frequent intervals for this reason).