The Lexical Hypothesis and Cognitive Psychology: Strange Bedfellows?

Introduction

Both trait-based personality inventories (such as the Big Five NEO-PI and the six-factor HEXACO) and the diagnostic schedules used to diagnose psychopathologies under the Diagnostic and Statistical Manual of Mental Disorders (DSM) and the ICD-11 rely on psychometric instruments grounded in factor analyses of natural-language terms and the concepts linked to those terms. In most cases they're driven directly by linguistic-conceptual premises and descriptive linguistic analyses. In other words: they're based upon our use of everyday language and the concepts we use when speaking.

This is surprising for a number of reasons. One is that the prevailing models in psychological science today are the cognitive and biopsychosocial models, and the previously dominant model was behaviourist. Behaviourism largely eschews any linguistic basis for psychological assessment (although verbal cues and reports can certainly be used in experiments), and cognitive psychology does not emphasise language as central to psychological science (hence the debates about mentalese).

In this brief article I introduce and discuss exploratory and confirmatory factor analysis (EFA and CFA), and then discuss how they relate to the linguistic premises of the lexical hypothesis, which grounds the development of most psychometric instruments.

Factor Analysis: From Dictionary Words to Psychological Constructs

Factor analysis, a cornerstone of psychometric instrument development, is usually implemented in one of two primary forms: Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA). These statistical techniques are pivotal in understanding the underlying structure of psychological constructs. Surprisingly, many instruments developed using these methods have their conceptual roots deeply embedded in natural language, specifically in the words found within dictionaries that describe human behavior and personality. This connection, however, exists within a broader psychological landscape where paradigms like cognitive psychology and the behaviorism it largely displaced have had different primary foci than everyday language use.

Exploratory vs. Confirmatory Factor Analysis: Theoretical Distinctions

EFA and CFA are both built upon the common factor model, which posits that observed variables are linear combinations of underlying latent factors, plus error. However, they diverge significantly in their theoretical basis, purpose, and methodology (Brown, 2015; Lloret-Segura et al., 2014).
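The common factor model can be written compactly in matrix form. The notation below is conventional rather than taken from the sources cited here, so treat the symbols as assumptions: x is the vector of p observed (centered) items, Λ the p × m matrix of factor loadings, ξ the vector of m latent factors with covariance matrix Φ, and δ the vector of unique (error) terms with diagonal covariance matrix Θ, assumed uncorrelated with ξ.

```latex
\begin{aligned}
\mathbf{x} &= \boldsymbol{\Lambda}\,\boldsymbol{\xi} + \boldsymbol{\delta}, \\
\boldsymbol{\Sigma} &= \operatorname{Cov}(\mathbf{x})
  = \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Theta}.
\end{aligned}
```

The second line is what both EFA and CFA try to reproduce: the observed covariance (or correlation) matrix is approximated by a part attributable to shared factors plus a diagonal part unique to each item.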

Exploratory Factor Analysis (EFA) is, as its name suggests, an exploratory method. It is employed when researchers do not have a strong a priori theory about the number of underlying factors or how observed variables relate to them (Lloret-Segura et al., 2014; Williams, Onsman, & Brown, 2010). The primary goal of EFA is to identify the underlying factor structure in a set of measured variables by examining the intercorrelations among them. It is a data-driven approach, aiming to reduce a large number of variables into a smaller, more manageable set of latent constructs or factors (Fabrigar & Wegener, 2012). The process allows all variables to load on all factors initially, and through techniques like factor rotation, researchers aim to achieve a "simple structure" where items load strongly on only one factor, aiding interpretation (Brown, 2015).

Confirmatory Factor Analysis (CFA), in contrast, is a theory-driven method used to test a pre-specified hypothesis about the factor structure and how variables relate to specific factors (Brown, 2015). Researchers using CFA must explicitly define the number of factors, which variables load onto which factors, and whether the factors are correlated. Unlike EFA, CFA imposes constraints, typically by fixing certain factor loadings to zero, meaning a variable is hypothesized not to load on a particular factor (Bollen, 1989). The fit of this hypothesized model to the observed data is then statistically evaluated using various goodness-of-fit indices (Brown, 2015). CFA is often used in later stages of scale development to validate a structure previously identified through EFA or suggested by strong theoretical reasoning.
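To make the difference in constraints concrete, here is a hypothetical example with six items and two factors, writing Λ for the matrix of factor loadings (rows are items, columns are factors; the notation and the item-to-factor assignment are illustrative assumptions, not drawn from any particular instrument).

```latex
\boldsymbol{\Lambda}_{\text{EFA}} =
\begin{pmatrix}
\lambda_{11} & \lambda_{12} \\
\lambda_{21} & \lambda_{22} \\
\lambda_{31} & \lambda_{32} \\
\lambda_{41} & \lambda_{42} \\
\lambda_{51} & \lambda_{52} \\
\lambda_{61} & \lambda_{62}
\end{pmatrix},
\qquad
\boldsymbol{\Lambda}_{\text{CFA}} =
\begin{pmatrix}
\lambda_{11} & 0 \\
\lambda_{21} & 0 \\
\lambda_{31} & 0 \\
0 & \lambda_{42} \\
0 & \lambda_{52} \\
0 & \lambda_{62}
\end{pmatrix}.
```

In the EFA matrix all twelve loadings are free to be estimated. In the CFA matrix the researcher has fixed six cross-loadings to zero in advance, and the fit indices then assess whether the data are compatible with that restriction.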

The EFA Procedure: Uncovering Latent Structure

The EFA procedure involves several key steps to move from observed data to a posited factor structure:

Correlation Analysis: EFA begins with the computation of a correlation matrix for all observed variables (items). This matrix shows the strength and direction of the linear relationships between pairs of items. The assumption is that variables measuring the same underlying construct will be substantially correlated (Fabrigar & Wegener, 2012; Lloret-Segura et al., 2014). The choice of correlation coefficient (e.g., Pearson for continuous data, polychoric for ordinal data) is crucial for accurate results (Lloret-Segura et al., 2014).
Factor Extraction: The next step is to extract initial factors from the correlation matrix. Several methods exist, such as Principal Axis Factoring (PAF) or Maximum Likelihood (ML). These methods estimate the common factors that account for the shared variance among the items (Brown, 2015; Fabrigar & Wegener, 2012). The decision on how many factors to retain is critical and is made separately, guided by criteria such as eigenvalues (though the "eigenvalue greater than one" rule is often criticized), scree plots, parallel analysis (considered one of the more accurate methods), or the interpretability of the factor solution (Lloret-Segura et al., 2014; Williams et al., 2010).
Factor Rotation and Interpretation (Item Identification and Factor Loadings): The initial extracted factors are often difficult to interpret because items may load moderately on multiple factors. Factor rotation (e.g., Varimax for orthogonal rotation, assuming uncorrelated factors; or Promax/Oblimin for oblique rotation, allowing factors to be correlated) is applied to simplify the factor structure (Brown, 2015; Lloret-Segura et al., 2014). Factor loadings are central to this stage. Under an orthogonal rotation, a factor loading is the correlation between an observed variable (item) and an underlying factor; under an oblique rotation the interpretation is more complex, because the pattern matrix (regression-type weights) and the structure matrix (item-factor correlations) no longer coincide (Brown, 2015; Williams et al., 2010). Loadings indicate the strength and direction of this relationship, and under an orthogonal rotation they range from -1 to +1. Items whose absolute loadings exceed a conventional cutoff (e.g., .30 or .40) on a particular factor are treated as salient indicators of that factor (Lloret-Segura et al., 2014). Researchers examine the pattern of these loadings to work out what common theme or construct the items on a given factor represent, thereby "identifying" and naming the factor. Items that do not load saliently on any factor, or that load on multiple factors (cross-loadings), may be considered for revision or removal (Fabrigar & Wegener, 2012).
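The three steps above can be sketched in a few lines of Python. This is a minimal illustration using only NumPy, not a substitute for a dedicated EFA package. Several things here are assumptions made for the example: the data are simulated (six items, two uncorrelated factors), the helper names (`parallel_analysis`, `principal_axis`, `varimax`) are hypothetical, parallel analysis compares correlation-matrix eigenvalues against random normal data, and extraction uses a simple iterated principal axis routine with a fixed number of iterations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 respondents, 6 items, 2 uncorrelated latent factors.
true_loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                          [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
n_obs, n_items = 500, true_loadings.shape[0]
factors = rng.standard_normal((n_obs, 2))
unique_sd = np.sqrt(1.0 - (true_loadings ** 2).sum(axis=1))
items = factors @ true_loadings.T + rng.standard_normal((n_obs, n_items)) * unique_sd


def sorted_eigenvalues(data):
    """Eigenvalues of the Pearson correlation matrix, largest first."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]


def parallel_analysis(data, n_sims=200, percentile=95):
    """Count leading eigenvalues that exceed those obtained from random data."""
    observed = sorted_eigenvalues(data)
    simulated = np.array([sorted_eigenvalues(rng.standard_normal(data.shape))
                          for _ in range(n_sims)])
    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    # Index of the first False equals the number of leading True values.
    return len(exceeds) if exceeds.all() else int(np.argmin(exceeds))


def principal_axis(R, k, n_iter=50):
    """Iterated principal axis factoring: returns a (items x k) loading matrix."""
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # start from squared multiple correlations
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)  # replace the unit diagonal with communalities
        vals, vecs = np.linalg.eigh(reduced)
        top = np.argsort(vals)[::-1][:k]
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        h2 = np.minimum((loadings ** 2).sum(axis=1), 1.0)
    return loadings


def varimax(L, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix."""
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return L @ rotation


# Step 1: correlation matrix of the items.
R = np.corrcoef(items, rowvar=False)
# Step 2: decide how many factors to retain, then extract them.
n_factors = parallel_analysis(items)
unrotated = principal_axis(R, n_factors)
# Step 3: rotate towards simple structure and inspect the loadings.
rotated = varimax(unrotated)

print("factors retained:", n_factors)
print(np.round(rotated, 2))
```

The sign and order of the rotated columns are arbitrary, so interpretation rests on which items cluster together rather than on which column they appear in. Because the simulated items are continuous, Pearson correlations are appropriate here; ordinal questionnaire responses would call for polychoric correlations, which this sketch does not compute.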

The Lexical Hypothesis: Grounding Psychometrics in Language

A fascinating aspect of psychometric instrument development, particularly in personality psychology, is its foundation in the lexical hypothesis. This hypothesis posits that the most important individual differences in human transactions will come to be encoded as single terms in some or all of the world's languages (Goldberg, 1990; John, Angleitner, & Ostendorf, 1988). Early proponents like Allport and Odbert (1936) painstakingly combed through dictionaries to extract thousands of terms describing personality traits. Later researchers, such as Cattell and subsequently Goldberg, used factor analysis on these trait terms (or ratings on these terms) to identify underlying personality dimensions (Goldberg, 1990). The widely known Big Five personality factors (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism) have strong roots in this lexical tradition (McCrae & Costa, 2008; Goldberg, 1990). Studies continue to explore nuances within this framework, such as the age of acquisition of personality terms, suggesting that terms related to broader, more fundamental traits might be learned earlier (Zentner & Gaskell, 2022).

This direct reliance on the existing lexicon means that many psychometric instruments are, in essence, systematically exploring the conceptual landscape already laid out by natural language.

Behaviorism, Cognitive Psychology, and Language

The prominence of language-derived constructs in psychometrics is interesting when contrasted with the historical stances of major psychological paradigms:

Behaviorism, particularly in its radical form articulated by B. F. Skinner, largely eschewed internal mental constructs. Skinner's (1957) analysis of language in Verbal Behavior treated it as operant behavior, shaped by reinforcement contingencies from the verbal community, rather than as an expression of internal cognitive states or linguistic rules (Dymond & Roche, 2009). The focus was on observable stimuli, responses, and reinforcement histories, not on the semantic content or structure of language as understood by linguists or cognitive psychologists.

The cognitive revolution, which began in the mid-20th century, marked a shift away from strict behaviorism. Influenced by figures like Noam Chomsky (1959), who famously critiqued Skinner's account of language, cognitive psychology re-centered the study of the mind and internal mental processes (Miller, 2003). Cognitive psychology views language as a core mental faculty, involving complex processes of perception, comprehension, memory, and production (Neisser, 1967). It investigates how humans acquire, represent, and use language, exploring concepts, semantic networks, and the rules governing linguistic structure (American Psychological Association, n.d.-a).

Reconciling Factor Analytic Approaches with Cognitive and Behavioral Psychology

The grounding of factor-analytically derived psychometric instruments in natural language presents a complex interface with both behavioral and cognitive psychology:

Behaviorism's focus on observable behavior and its de-emphasis on internal constructs meant it did not typically engage with trait concepts derived from everyday language in the same way psychometricians did. While verbal behavior was analyzed, the trait descriptors themselves were not central to its explanatory framework.
Cognitive psychology, with its sophisticated models of language processing, semantic memory, and concept formation, offers a different perspective. While it heavily utilizes language (e.g., in experimental stimuli, verbal reports), its primary aim is often to understand the mechanisms underlying language and thought, rather than to map the broad trait dimensions suggested by the lexicon.

There is ongoing discussion and, at times, tension regarding how trait structures derived from the lexical hypothesis fit with deeper cognitive or neurobiological models of personality and individual differences. Some scholars critique the lexical approach, arguing that it primarily captures the structure of everyday language or folk psychology rather than the causal mechanisms underlying behavior (Uher, 2013). Uher (2013) suggests a paradigm shift towards studying actual behavioral patterns and their more direct cognitive and physiological underpinnings.

However, others propose frameworks for integration. For instance, Matthews (2004) discusses a cognitive-adaptive theory of traits, attempting to link personality traits to information processing. Social-cognitive theories of personality also try to bridge the gap by focusing on cognitive and affective units (e.g., encodings, expectancies, affects, goals, self-regulatory plans) that interact with situations to produce behavior, which can be seen as the "doing" side of personality, complementing the "having" side represented by broad traits (Lapsley & Narvaez, 2004). Furthermore, theories like the Five-Factor Theory aim to offer a comprehensive model that includes biological bases and characteristic adaptations, moving beyond purely descriptive labels (McCrae & Costa, 2008).

In essence, while factor analysis provides powerful tools for structuring the concepts embedded in language, the reconciliation of these language-derived constructs with the process-oriented focus of cognitive psychology and the anti-mentalistic stance of traditional behaviorism remains an evolving area. The current trend appears to be towards developing multi-level theories that acknowledge the utility of trait descriptions while seeking to understand their grounding in more fundamental cognitive, affective, and biological processes.

References

Allport, G. W., & Odbert, H. S. (1936). Trait-names: A psycho-lexical study. Psychological Monographs, 47(1), i–171. https://doi.org/10.1037/h0093360

American Psychological Association. (n.d.-a). Cognitive psychology. In APA dictionary of psychology. Retrieved May 24, 2025, from https://dictionary.apa.org/cognitive-psychology

Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons.

Brown, T. A. (2015). Confirmatory factor analysis for applied research (2nd ed.). Guilford Press.

Chomsky, N. (1959). A review of B. F. Skinner's Verbal Behavior. Language, 35(1), 26–58. https://doi.org/10.2307/411334

Dymond, S., & Roche, B. (2009). A contemporary behavior analyst’s account of verbal behavior: A review of Verbal Behavior by B. F. Skinner. The Behavior Analyst, 32(1), 167–172. https://doi.org/10.1007/BF03392177

Fabrigar, L. R., & Wegener, D. T. (2012). Exploratory factor analysis. Oxford University Press.

Goldberg, L. R. (1990). An alternative "description of personality": The Big-Five factor structure. Journal of Personality and Social Psychology, 59(6), 1216–1229. https://doi.org/10.1037/0022-3514.59.6.1216

John, O. P., Angleitner, A., & Ostendorf, F. (1988). The lexical approach to personality: A historical review of trait taxonomic research. European Journal of Personality, 2(3), 171–203. https://doi.org/10.1002/per.2410020302

Lapsley, D. K., & Narvaez, D. (2004). A social-cognitive approach to the moral personality. In D. K. Lapsley & D. Narvaez (Eds.), Moral development, self, and identity (pp. 189–212). Lawrence Erlbaum Associates.

Lloret-Segura, S., Ferreres-Traver, A., Hernández-Baeza, A., & Tomás-Marco, I. (2014). El análisis factorial exploratorio de los ítems: una guía práctica, revisada y actualizada. Anales de Psicología, 30(3), 1151–1169. https://doi.org/10.6018/analesps.30.3.199361

Matthews, G. (2004). Personality and information processing: A cognitive-adaptive theory of traits. Applied Cognitive Psychology, 18(5), 501–528. https://doi.org/10.1002/acp.1033

McCrae, R. R., & Costa, P. T., Jr. (2008). The five-factor theory of personality. In O. P. John, R. W. Robins, & L. A. Pervin (Eds.), Handbook of personality: Theory and research (3rd ed., pp. 159–181). Guilford Press.

Miller, G. A. (2003). The cognitive revolution: A historical perspective. Trends in Cognitive Sciences, 7(3), 141–144. https://doi.org/10.1016/S1364-6613(03)00029-9

Neisser, U. (1967). Cognitive psychology. Appleton-Century-Crofts.

Skinner, B. F. (1957). Verbal behavior. Appleton-Century-Crofts.

Uher, J. (2013). Personality Psychology: Lexical Approaches, Assessment Methods, and Trait Concepts Reveal Only Half of the Story—Why it is Time for a Paradigm Shift. Integrative Psychological & Behavioral Science, 47(4), 479–513. https://doi.org/10.1007/s12124-013-9239-9

Williams, B., Onsman, A., & Brown, T. (2010). Exploratory factor analysis: A five-step guide for novices. Australasian Journal of Paramedicine, 8(3). https://doi.org/10.33151/ajp.8.3.93

Zentner, M., & Gaskell, M. G. (2022). Age of acquisition of personality terms: How an early bird catches a more fundamental worm. PLoS ONE, 17(11), e0277601. https://doi.org/10.1371/journal.pone.0277601

1nf0rmat10n1st