Navigating Dual Crises: Reproducibility and Theory in Psychological Science

Introduction

The field of psychological science is currently undergoing a period of profound self-reflection, driven by two interconnected challenges: the reproducibility crisis and the (more recently proposed) theory crisis. These issues, while distinct in their immediate manifestations, collectively question the foundational integrity and cumulative progress of the discipline.

Overview of the Reproducibility Crisis and its Empirical Focus

The "replication crisis," also known as the "reproducibility crisis," refers to the alarming inability of researchers to consistently reproduce or verify previously published scientific findings (Nosek et al., 2022; Wiggins & Cody, 2025). This involves both reproducibility, which means obtaining the same analytical results by re-analyzing original data, and replicability, which entails repeating an existing experiment with new, independent data to verify the original conclusions (Nosek et al., 2022; Wiggins & Cody, 2025). Both aspects are critical for the scientific method, as they ensure the reliability of empirical results and the credibility of theories built upon them (Nosek et al., 2022; Wiggins & Cody, 2025).

This crisis gained significant prominence in 2015 when the Reproducibility Project: Psychology revealed that a substantial portion of findings from psychology journals could not be replicated (Open Science Collaboration, 2015; Wiggins & Cody, 2025). Earlier assessments had suggested that perhaps only one in five published findings could be trusted, and the Open Science Collaboration's (2015) large-scale effort found that only about 36% of replication attempts produced statistically significant results (Wiggins & Cody, 2025). Such failures are critical because they undermine the credibility of existing theories and cast doubt on accumulated scientific knowledge (Wiggins & Cody, 2025). The widespread awareness of failed replications has contributed to public skepticism (Wiggins & Cody, 2025).
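A back-of-the-envelope calculation shows why replication rates in this range are unsurprising: if many tested hypotheses are false and statistical power is modest, a sizable share of significant findings are false positives, and false positives rarely replicate. The sketch below uses Ioannidis-style positive-predictive-value reasoning; the values of power, alpha, and the prior probability of a true hypothesis are illustrative assumptions, not estimates from the cited studies.

```python
# Back-of-the-envelope: expected replication rate given alpha, statistical
# power, and the prior probability that a tested hypothesis is true.
# All three values are illustrative assumptions, not estimates from the
# studies cited above.
alpha = 0.05   # conventional false-positive rate
power = 0.50   # assumed average power of original studies
prior = 0.25   # assumed share of tested hypotheses that are actually true

# Positive predictive value: the share of significant findings that
# reflect true effects.
ppv = (power * prior) / (power * prior + alpha * (1 - prior))

# A true effect replicates with probability ~power; a false positive
# "replicates" with probability ~alpha.
replication_rate = ppv * power + (1 - ppv) * alpha

print(f"PPV of significant findings: {ppv:.2f}")              # ~0.77
print(f"Expected replication rate:   {replication_rate:.2f}")  # ~0.40
```

Under these assumed values the expected replication rate lands near 40%, strikingly close to the observed figure, without invoking any misconduct at all.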

Introduction to the Theory Crisis and its Foundational, Conceptual Nature

Beyond empirical replicability, psychological science confronts a deeper, more fundamental challenge: the "theory crisis" (Fiedler, 2017; Muthukrishna & Henrich, 2019; Oberauer & Lewandowsky, 2019). The diagnosis is that many psychological theories are vaguely formulated, imprecise, and often unfalsifiable (Fiedler, 2017; Muthukrishna & Henrich, 2019; Oberauer & Lewandowsky, 2019). Decades ago, Paul Meehl criticized the field for a lack of cumulative theoretical progress, observing that theories frequently "come and go" without decisive refutation or acceptance (Meehl, 1990; Robinaugh et al., 2021). The result is a proliferation of overlapping and deficient theories that persist without being definitively disproven (Meehl, 1990; Robinaugh et al., 2021).

The Critical Interconnectedness of These Two Challenges

The reproducibility and theory crises are not isolated phenomena but are profoundly interdependent (Fiedler, 2017; Muthukrishna & Henrich, 2019). The theory crisis is increasingly viewed as a root cause of the empirical reproducibility issues (Fiedler, 2017; Muthukrishna & Henrich, 2019; Robinaugh et al., 2021). Deficiencies in methodological rigor, statistical analysis, and publication practices are, in part, attributed to poorly articulated theories (Fiedler, 2017). Without a sound theoretical foundation, efforts to reform empirical practices become less effective, as the resulting findings may lack clear interpretability (Fiedler, 2017).

A clear causal hierarchy emerges: weak or imprecise theory underpins methodological problems that subsequently lead to reproducibility failures. If hypotheses are vague, researchers are afforded excessive "degrees of freedom" in data collection and analysis, significantly increasing the likelihood of false positives (Fiedler, 2017; Scheel et al., 2021). This implies that purely technical reforms, such as improving statistical techniques or mandating pre-registration (Fiedler, 2017; Nosek et al., 2022; Wiggins & Cody, 2025), while important, are insufficient without addressing the fundamental theoretical imprecision (Fiedler, 2017). The concept of a "derivation chain," where theory constrains methods, analysis, and interpretation (Fiedler, 2017; Meehl, 1990), powerfully illustrates that scientific integrity flows from the quality of the theoretical framework downwards.
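The cost of these degrees of freedom can be made concrete with simple arithmetic: if a vaguely specified theory licenses k defensible analysis paths, each tested at alpha = .05, the probability of at least one false positive grows rapidly with k. The sketch below assumes, purely for illustration, that the paths are independent.

```python
# Family-wise false-positive rate when a vague hypothesis licenses k
# defensible analysis paths, each tested at alpha = .05. Assumes the
# paths are independent, purely for illustration.
alpha = 0.05
for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k:2d} analysis paths -> P(at least one false positive) = {fwer:.2f}")
# 1 -> 0.05, 5 -> 0.23, 10 -> 0.40, 20 -> 0.64
```

A theory precise enough to pin down a single analysis keeps the error rate at its nominal 5%; a theory loose enough to license twenty plausible analyses pushes it toward two in three.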

The Reproducibility Crisis: Methodological Flaws and Questionable Research Practices

The reproducibility crisis in psychological science is largely fueled by the prevalence of "questionable research practices" (QRPs), which inflate the likelihood of obtaining statistically significant but ultimately unreliable findings (Fiedler, 2017; Stanley et al., 2018).

Defining Reproducibility, Replicability, and their Importance

The core of the "replication crisis" lies in the inability to consistently verify published research (Nosek et al., 2022; Wiggins & Cody, 2025). As outlined above, reproducibility refers to achieving the same analytical results using the original data and methods, whereas replicability involves repeating an experiment with new, independent data to confirm the original conclusions (Nosek et al., 2022; Wiggins & Cody, 2025). The crisis has spurred significant efforts, particularly in psychology and medicine, to re-examine classic findings, and the failure to replicate well-known effects has highlighted the need for rigorous methodological standards (Nosek et al., 2022; Wiggins & Cody, 2025).

Questionable Research Practices (QRPs)

QRPs encompass a range of practices that, while not outright fraud, distort research outcomes, making findings appear more robust or significant than they are (Fiedler, 2017; Stanley et al., 2018).

  • Hypothesizing After Results are Known (HARKing): HARKing involves formulating a hypothesis after data collection and analysis, essentially reverse-engineering a prediction to fit observed results (Kerr, 1998; Wiggins & Cody, 2025). This practice fundamentally subverts the scientific method, which relies on a priori hypotheses to guide research and allow for falsification (Kerr, 1998; Wiggins & Cody, 2025). Its rise was driven by academic incentives, as hypothesis-driven research became overvalued by editors and reviewers, transforming hypotheses from a tool into a publication goal (Kerr, 1998; Wiggins & Cody, 2025). At its peak, HARKed hypotheses may have been more common than genuinely a priori ones (Kerr, 1998). This distortion significantly contributed to the replication crisis (Kerr, 1998; Wiggins & Cody, 2025). However, the extent of HARKing's harm is debated, with some arguing it may not always be problematic, especially if suppressed hypotheses are unrelated to the findings (Kerr, 1998; Stanley et al., 2018).

  • P-Hacking: P-hacking refers to manipulating data or analyses until a statistically significant p-value (typically below .05) is achieved (Fiedler, 2017). This can involve selective data exclusion, running multiple analyses, or adding more data until significance is reached (Fiedler, 2017); the simulation following this list illustrates how the last of these tactics inflates the false-positive rate. Despite widespread concern, some evidence suggests p-hacking may be less prevalent than commonly assumed (Stanley et al., 2018). For example, one study found "only a small amount of selective reporting bias" in psychology articles (Stanley et al., 2018). Moreover, its overall impact on scientific progress might be less severe than anticipated (Stanley et al., 2018).

  • Selective Reporting and Publication Bias (including binning evidence): Selective reporting involves choosing to report only a subset of measured outcomes or analyses, often those yielding statistically significant or favorable results (Ioannidis et al., 2004; Kirkham et al., 2010). This can manifest as "outcome reporting bias" (ORB) or "analysis reporting bias" (ARB) (Kirkham et al., 2010). Publication bias is a systemic issue where studies with statistically significant or "positive" findings are more likely to be published than those with null or insignificant results (Ioannidis et al., 2004; Van Aert et al., 2019). This skews the published literature, leading to an overestimation of effect sizes and misleading conclusions (Ioannidis et al., 2004; Van Aert et al., 2019); the simulation following Table 1 illustrates how selective publication alone can produce this inflation. The incentive for journals to publish "interesting" findings and for researchers to publish frequently contributes to this bias (Ioannidis et al., 2004). "Binning evidence" can be a form of data manipulation within selective reporting, where data points are grouped or categorized in ways that favor desired outcomes (Abrahamyan et al., 2016; Braun et al., 2018; Hermoso-Mendizabal et al., 2020). This can also involve selectively interpreting new evidence to support existing beliefs (Abrahamyan et al., 2016; Braun et al., 2018; Hermoso-Mendizabal et al., 2020). While concerning, some research suggests publication bias may not be as "prominent as feared," or may be only "mild," in psychology and medicine (Brodeur et al., 2023; Van Aert et al., 2019).
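To illustrate the p-hacking mechanism flagged above, the following minimal simulation implements one common tactic, optional stopping (adding participants and re-testing until p < .05), and shows that it inflates the false-positive rate well beyond the nominal 5% even when the true effect is exactly zero. The sample sizes and testing schedule are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def optional_stopping_finds_significance(n_start=20, n_max=100, step=10):
    """Collect data in batches and re-test after each batch, declaring
    success as soon as any interim two-sample t-test reaches p < .05.
    The true effect is exactly zero, so every 'success' is a false positive."""
    a = list(rng.normal(size=n_start))
    b = list(rng.normal(size=n_start))
    while len(a) <= n_max:
        if stats.ttest_ind(a, b).pvalue < 0.05:
            return True
        a.extend(rng.normal(size=step))
        b.extend(rng.normal(size=step))
    return False

n_sims = 2000
hits = sum(optional_stopping_finds_significance() for _ in range(n_sims))
print(f"False-positive rate under optional stopping: {hits / n_sims:.2f}")
# Typically ~0.12-0.17 in this setup, versus the nominal 0.05.
```

A researcher who stops "when the data look right" roughly triples the nominal error rate without ever fabricating a data point, which is precisely why pre-specified stopping rules matter.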

Table 1: Key Questionable Research Practices (QRPs) in Psychological Science

QRP: HARKing (Hypothesizing After Results are Known)
  Definition: Formulating a hypothesis after data collection and analysis to fit observed results.
  Mechanism: Reverse-engineering predictions to match findings, driven by the incentive for hypothesis-driven research.
  Impact on research: Subverts the scientific method, distorts scientific integrity, and contributes to replication failures.
  Nuances/debates: The extent of harm is debated; HARKing may not always be problematic if suppressed hypotheses are unrelated to the findings (Kerr, 1998; Stanley et al., 2018).

QRP: P-Hacking
  Definition: Manipulating data or analyses until a statistically significant p-value (e.g., < .05) is achieved.
  Mechanism: Selective data exclusion, running multiple analyses, or adding data until significance is reached.
  Impact on research: Inflates false positives and makes findings appear more robust than they are.
  Nuances/debates: May be less prevalent than commonly assumed, and its overall impact on scientific progress might be less severe than anticipated (Stanley et al., 2018).

QRP: Selective Reporting / Publication Bias
  Definition: Reporting only a subset of measured outcomes or analyses, or publishing studies based on the nature or direction of their results.
  Mechanism: Emphasizing favorable outcomes (outcome reporting bias, ORB); selecting analyses for reporting (analysis reporting bias, ARB); journals and researchers favoring "positive" or "significant" findings.
  Impact on research: Skews the published literature, overestimates effect sizes, misleads conclusions, and compromises research integrity.
  Nuances/debates: May not be as "prominent as feared," or may be only "mild," in some fields (Brodeur et al., 2023; Van Aert et al., 2019); can lead to misinformed policy and practice (Ioannidis et al., 2004).
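To make the publication-bias distortion concrete, the brief simulation below assumes a small true effect and a literature in which only statistically significant, positive results are published; the naive average of published effects is then substantially inflated. The parameter values are illustrative assumptions, not estimates from the cited studies.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

true_d = 0.20      # assumed small true standardized effect
n_per_group = 30   # assumed per-group sample size
n_studies = 5000   # hypothetical studies conducted

published = []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    t_stat, p_value = stats.ttest_ind(treatment, control)
    d_hat = treatment.mean() - control.mean()  # ~Cohen's d, since SD = 1
    if p_value < 0.05 and d_hat > 0:  # only significant positive results appear
        published.append(d_hat)

print(f"True effect:                {true_d:.2f}")
print(f"Mean published effect:      {np.mean(published):.2f}")  # ~0.6, ~3x inflated
print(f"Share of studies published: {len(published) / n_studies:.2f}")
```

With underpowered studies, significance at n = 30 per group requires an observed effect roughly two and a half times the true one, so the published record overstates the effect even though every individual study was conducted honestly.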

The Broader Impact of QRPs on Scientific Credibility and Public Trust

QRPs fundamentally compromise research integrity, betraying the trust of both study participants and the broader scientific community (Ioannidis et al., 2004). They can lead to misinformed policy decisions, the misuse of resources, and even the implementation of harmful practices (Ioannidis et al., 2004). The widespread awareness of failed replications has contributed to public skepticism (Wiggins & Cody, 2025).

The pervasive nature of QRPs is not merely a collection of isolated acts of individual misconduct but rather a deeply embedded systemic issue perpetuated by the current academic reward structure. The pressure to publish, often termed the "publish or perish" culture, coupled with a strong preference for novel and statistically significant results, creates an environment where researchers may feel compelled to engage in these practices, even if unintentionally, to secure their careers (Wiggins & Cody, 2025). This dynamic aligns with Campbell's Law, which posits that when an indicator (such as a significant p-value or a novel hypothesis) becomes a target for decision-making (e.g., publication, promotion), it inevitably becomes subject to corruption (Wiggins & Cody, 2025). This understanding shifts the focus from blaming individual researchers to reforming the broader institutional and cultural landscape of science.

Furthermore, the scientific community's understanding of the prevalence and impact of QRPs is continually evolving. While the initial narrative of the reproducibility crisis heavily emphasized the widespread and detrimental impact of these practices (Nosek et al., 2022; Ioannidis et al., 2004), more recent work presents a contrasting perspective, suggesting that p-hacking and publication bias might be less prevalent or impactful than initially feared (Stanley et al., 2018; Van Aert et al., 2019). The noted difficulty researchers face in consistently p-hacking theoretically coherent results suggests that inherent scientific constraints might limit the extent of data distortion (Stanley et al., 2018). This nuance is vital for a balanced understanding, implying that while vigilance against QRPs is essential, an exclusive focus on them might divert attention from other fundamental issues, such as the underlying theoretical weaknesses. Reforms should be data-driven and tailored to the actual mechanisms and impacts of different QRPs, rather than a one-size-fits-all approach.

Methodological Reforms and Initiatives Aimed at Enhancing Reproducibility

In response, the field has initiated significant reforms. These include promoting direct replications, advocating for larger sample sizes, and encouraging the use of thoroughly validated measures (Nosek et al., 2022). Pre-registration, where hypotheses and study plans are publicly documented before data collection, is a key reform aimed at preventing HARKing and p-hacking (Nosek et al., 2022; Wiggins & Cody, 2025). Registered Reports, where journals commit to publishing studies regardless of their outcomes if the pre-registered plan is sound, further incentivize transparent practices (Nosek et al., 2022). These efforts represent a crucial "overhaul of our research culture" (Wiggins & Cody, 2025).

The Theory Crisis: Conceptual Foundations and Their Vulnerabilities

The theory crisis represents a more profound challenge to psychological science, stemming from fundamental issues in how psychological constructs are conceived, defined, and empirically investigated (Robinaugh et al., 2021).

Distinguishing the Theory Crisis as a Deeper, More Fundamental Issue than the Reproducibility Crisis

While the reproducibility crisis addresses empirical reliability, the "theory crisis" delves into the underlying conceptual and philosophical weaknesses within psychological science (Fiedler, 2017; Robinaugh et al., 2021). It posits that empirical issues often stem from poorly articulated or unfalsifiable theories (Fiedler, 2017; Robinaugh et al., 2021). Paul Meehl's long-standing critique that psychological theories lack cumulative progress, neither being decisively refuted nor universally accepted, remains pertinent (Meehl, 1990; Robinaugh et al., 2021). This leads to a landscape where numerous overlapping theories coexist without clear differentiation or falsification (Meehl, 1990; Robinaugh et al., 2021). Some scholars contend that the "theory crisis" is not a temporary anomaly but an inherent challenge due to the intrinsic complexity of psychological phenomena (Robinaugh et al., 2021; Yarkoni & Westfall, 2025).

Problems with Theoretical Premises in Setting Hypotheses: Vague and Unfalsifiable Theories

A core problem of the theory crisis is the vague and abstract formulation of psychological theories, which makes them difficult to rigorously test or falsify (Fiedler, 2017; Robinaugh et al., 2021). This imprecision results in verbal models that are overly flexible and ambiguous, hindering the derivation of clear, testable hypotheses (Fiedler, 2017). Conceptually clear theories are paramount for guiding hypothesis development and appropriate measurements (Fiedler, 2017; Meehl, 1967). When theories are vague, researchers possess excessive "degrees of freedom" in their methods and analyses, which can inflate false positives and lead to broad generalizations unsupported by empirical data (Fiedler, 2017; Scheel et al., 2021).

Challenges in Conceptualization and Operationalization of Psychological Constructs

Psychology frequently introduces new constructs and scales, often with new terms for existing concepts or applying the same term to different ones (Robinaugh et al., 2021). This points to a pervasive issue with construct validity—the extent to which a measure accurately reflects the psychological construct it purports to measure, and whether that construct is well-integrated into a theoretical framework (Robinaugh et al., 2021). Despite its critical importance, construct validity often receives insufficient attention compared to reliability in psychological research, with many articles providing limited or no validity evidence (Robinaugh et al., 2021). This leads to a proliferation of psychological constructs with uncertain validity throughout the field (Robinaugh et al., 2021).

Furthermore, establishing causal relationships between psychological variables is inherently challenging because interventions are frequently "fat-handed" (Robinaugh et al., 2021). This means they inadvertently manipulate multiple variables simultaneously rather than precisely targeting a single construct, making it difficult to isolate specific causal mechanisms and verify what an intervention truly changed (Robinaugh et al., 2021).

Case Study: The Ego Depletion Theory

  • Overview of the Theory and its Core Tenets: Ego depletion is a prominent theory in social psychology, positing that self-control or willpower relies on a finite pool of mental resources that can be exhausted through sustained use (Baumeister et al., 1998; Muraven & Baumeister, 2000; Zou & Schimmack, 2017). When these resources are low, subsequent self-control performance is impaired (Zou & Schimmack, 2017). The typical experimental design involves an initial self-control task followed by a second, unrelated self-control task, with the "depleted" group expected to perform worse (Zou & Schimmack, 2017).

  • Critiques Concerning its Conceptual Clarity and Operationalization: A significant criticism of ego depletion theory stems from the lack of a clear, consistent definition and operationalization of its central concept, "self-control" (Robinaugh et al., 2021). The diverse experimental setups used to measure or manipulate self-control have not been consistently validated, leading to ambiguity regarding what is precisely being depleted (Robinaugh et al., 2021). For instance, meta-analyses have shown variability in the effectiveness of different "depleting" tasks, with "attention video" tasks being ineffective while "emotion video" tasks were most effective, highlighting the conceptual and operational inconsistencies (Zou & Schimmack, 2017).

  • Discussion of Replication Failures and Meta-Analytic Findings Challenging its Robustness: Despite its widespread acceptance and hundreds of supporting studies, the ego depletion effect has faced substantial challenges from replication failures (Robinaugh et al., 2021). A recent meta-analysis, employing methods to account for small-studies effects, found the ego depletion effect to be "indistinguishable from zero" (Stanley et al., 2018; Zou & Schimmack, 2017); the sketch following this list illustrates the logic behind such corrections. This suggests that many initial findings may have been influenced by publication bias or other QRPs, and that the underlying phenomenon itself may lack robustness. The absence of a consistently defined and robust phenomenon inherently hinders the development of a strong and testable theory (Robinaugh et al., 2021).
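One family of small-studies corrections regresses study effect sizes on their standard errors; under small-study effects, the weighted intercept (the predicted effect of a hypothetical, infinitely precise study) serves as a bias-adjusted estimate. The sketch below implements a PET-style (precision-effect test) estimator on toy data chosen to show the characteristic small-study pattern; it illustrates the general approach, not necessarily the exact method used in the cited meta-analyses.

```python
import numpy as np

def pet_intercept(effects, ses):
    """PET-style correction: weighted least-squares regression of effect
    sizes on their standard errors. The intercept is the predicted effect
    of a hypothetical, infinitely precise study (standard error -> 0)."""
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    X = np.column_stack([np.ones_like(ses), ses])  # intercept + slope on SE
    w = 1.0 / ses**2                               # inverse-variance weights
    XtW = X.T * w                                  # X' diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ effects)
    return beta[0]

# Toy data (assumed, not real meta-analytic estimates): smaller, noisier
# studies report larger effects, the classic small-study-effects pattern.
effects = [0.80, 0.65, 0.45, 0.30, 0.15, 0.10]
ses     = [0.40, 0.32, 0.22, 0.15, 0.08, 0.05]

print(f"Naive mean effect:      {np.mean(effects):.2f}")            # ~0.41
print(f"PET-corrected estimate: {pet_intercept(effects, ses):.2f}")  # ~0.00
```

When effect sizes track their standard errors this closely, the naive average is driven almost entirely by the smallest, noisiest studies, and the corrected estimate collapses toward zero, mirroring the "indistinguishable from zero" conclusion.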

This case study illustrates a critical, often overlooked, prerequisite for building strong theories: the existence of stable, reproducible empirical observations. Without these foundational phenomena, theories are "underdetermined by evidence" (Robinaugh et al., 2021), meaning multiple, potentially conflicting theories can attempt to explain the same shaky data. This makes decisive falsification difficult and impedes the accumulation of cumulative knowledge. The field's historical focus on "effects" rather than underlying "psychological capacities" (Meehl, 1978; Robinaugh et al., 2021) might contribute to this gap, as effects can be transient or context-dependent, whereas capacities are more fundamental and enduring.

The pervasive "problems of validity of psychological constructs" further compound the theory crisis. The constant introduction of new terms and the insufficient attention paid to construct validity (Robinaugh et al., 2021) lead to a situation where concepts become deeply embedded due to their reliance by other concepts, theories, or societal practices. This phenomenon, termed "generative entrenchment," makes them extremely difficult to alter or discard. The example of Major Depressive Disorder (MDD) highlights how a problematic definition can persist despite its scientific shortcomings due to its entrenchment within diagnostic systems and clinical practice (Robinaugh et al., 2021). This provides a powerful explanation for why problematic concepts or theories (such as the ambiguously defined "self-control" in ego depletion, or broad MDD definitions) can persist in the field despite mounting empirical challenges. This phenomenon acts as a significant barrier to theoretical progress and perpetuates the use of ill-defined constructs.

Finally, a fundamental methodological and epistemological challenge in psychological experimentation is the "fat-handed intervention" problem. Establishing causal relationships between psychological variables is difficult because interventions are often "fat-handed," meaning they inadvertently manipulate multiple variables simultaneously (Robinaugh et al., 2021). The ego-depletion experiments serve as an example, where manipulations intended to affect "self-control" might also influence other factors like motivation or anger, making it unclear whether diminished self-control is the sole or primary cause of impaired performance (Robinaugh et al., 2021). If a theory posits a causal link between variable A and outcome B, but the intervention targeting A also impacts C and D (which could also influence B), then the causal claim remains ambiguous and difficult to verify empirically. This limits the ability to build theories that accurately capture causal mechanisms of the mind.

Implications for Psychological Science and Future Directions

Addressing the dual crises necessitates a comprehensive re-evaluation of research practices, theoretical development, and the very philosophy of psychological inquiry.

The Role and Limitations of Evidence Hierarchies (e.g., GRADE) in Evaluating Psychological Research

Evidence hierarchies, such as the Grading of Recommendations Assessment, Development and Evaluation (GRADE) system, are widely used heuristic tools for ranking the strength of research evidence, particularly in medicine (Kavanagh, 2009; Stegenga, 2014). They typically place systematic reviews and meta-analyses of randomized controlled trials (RCTs) at the highest level, followed by individual RCTs, cohort studies, and case-control studies, with expert opinion at the lowest tier (Kavanagh, 2009; Stegenga, 2014). GRADE rates evidence quality (certainty in effect estimates) as high, moderate, low, or very low (Jüni et al., 2025; Kavanagh, 2009).

While applicable beyond medicine (Jüni et al., 2025; Stegenga, 2014), these hierarchies face significant criticisms. They may limit the utility of research for individual patient care and often overlook research on safety and efficacy (Blunt, 2015; Stegenga, 2011). Critics argue they fail to adequately define key terms, properly weigh non-randomized controlled trials, and account for study design limitations (Blunt, 2015; Stegenga, 2011). Stegenga (2011) questions the automatic placement of meta-analyses or RCTs at the top, arguing that they may not always be the most appropriate or informative forms of evidence. Blunt (2015) concluded that even modest interpretations of hierarchies are too weak to support clinical practice, as they omit crucial clinical information, suggesting that appraisal of individual studies is often more appropriate.

In psychology, the direct transferability of medical hierarchies is debated, as many research questions cannot be ethically or practically addressed through RCTs (Blunt, 2015; Concato, 2004). This is particularly evident in qualitative-focused approaches like narrative therapy, which often lack quantitative evidence and may even express philosophical opposition to such methods (Aas et al., 2020; France & Uhlin, 2006).

Table 2: Levels of Evidence Hierarchy in Psychological Research (Adapted from GRADE)

Certainty: High
  Study types: Systematic reviews of RCTs; individual RCTs with definitive results.
  Applicability/limitations in psychology: Often difficult to achieve due to ethical or practical constraints; generalizability concerns for highly controlled settings; may not capture complex, individualized human experience.

Certainty: Moderate
  Study types: RCTs with non-definitive results; high-quality cohort studies.
  Applicability/limitations in psychology: Useful for exploring associations and long-term outcomes; still faces challenges in controlling all confounding variables inherent to psychological phenomena.

Certainty: Low
  Study types: Case-control studies; cross-sectional surveys; low-quality RCTs.
  Applicability/limitations in psychology: Prone to various biases (e.g., recall bias); provides correlational rather than causal data; useful for hypothesis generation or studying rare conditions.

Certainty: Very low
  Study types: Case reports; case series; expert opinion; basic science/first principles.
  Applicability/limitations in psychology: Provides preliminary data and generates new ideas; highly susceptible to bias and lacks generalizability; forms the foundation for interpreting other evidence but is not evidence itself.

Challenges in Generalizability and External Validity of Research Findings

A critical limitation in applying research findings, particularly from evidence-based practices (EBPs), to real-world clinical settings is the issue of generalizability or external validity (Al-Jundi & Sakka, 2017; Horigian et al., 2017). Research samples in highly controlled trials often do not adequately represent the diversity of actual patient populations, including minority groups or individuals with co-occurring conditions (Horigian et al., 2017). This can render EBP interventions less effective for complex cases encountered in practice (Horigian et al., 2017). The use of stringent exclusion criteria in studies, designed to minimize confounding variables, creates artificial patient cohorts that do not reflect the multifaceted nature of real-world psychological issues (Al-Jundi & Sakka, 2017; Ercikan & Roth, 2014; Horigian et al., 2017). This limits the external validity—the extent to which study results can be applied to patients outside the original study population (Al-Jundi & Sakka, 2017).
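This attenuation can be illustrated with a toy simulation: suppose a treatment is highly effective for uncomplicated cases but only weakly effective for patients with a co-occurring condition, and stringent exclusion criteria screen the latter out of the trial sample. All parameter values below are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Assumed population: 40% of patients have a co-occurring condition.
comorbid = rng.random(n) < 0.40

# Assumed treatment effects: strong for uncomplicated cases, weak otherwise.
effect = np.where(comorbid, 0.10, 0.60)
outcome = effect + rng.normal(0.0, 1.0, n)

# Stringent exclusion criteria: the trial enrolls only uncomplicated cases.
trial_sample = ~comorbid

print(f"Effect estimated in the trial sample:  {outcome[trial_sample].mean():.2f}")  # ~0.60
print(f"Average effect in the full population: {outcome.mean():.2f}")                # ~0.40
```

The trial's internally valid estimate is perfectly accurate for the artificial cohort it enrolled, yet it overstates by half the benefit a clinician should expect across the unselected population.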

For example, mindfulness-based interventions (MBIs) frequently demonstrate benefits over passive control groups but often show no significant superiority over active controls, suggesting that observed effects might be due to general intervention factors rather than specific mindfulness components (Van Dam et al., 2018). Furthermore, MBI studies often lack detailed reporting of participant demographics (e.g., socioeconomic status), long-term follow-up assessments, and objective "real-world" outcomes like academic grades or discipline referrals, further hindering their generalizability to diverse youth populations and educational settings (Felver et al., 2015).

This situation reveals a significant "evidence-practice gap" where the most rigorously "proven" interventions (based on traditional hierarchies) may be least applicable to the complex, multimorbid patients typically seen in clinical practice (Horigian et al., 2017). This disconnect is exacerbated by the difficulty in scientifically validating holistic, patient-centered approaches that are valued clinically but struggle with traditional research methodologies. The tension arises because the scientific paradigm often prioritizes internal validity and generalizability to a theoretical population, while clinical practice demands external validity and applicability to diverse, individual patients (Ercikan & Roth, 2014; Horigian et al., 2017).

This points to a fundamental epistemological and practical tension within psychological science. The demands of traditional scientific rigor (often favoring quantifiable, replicable, and decontextualized findings) frequently clash with the realities of clinical practice (which are inherently complex, individualized, and context-dependent). Therapies that prioritize subjective meaning-making and the client's unique narrative, while valued clinically, may struggle to produce the type of "evidence" favored by conventional hierarchies. The "theory crisis" further underscores that the existing scientific tools and frameworks may be inadequate for fully capturing and explaining the rich, complex phenomena of interest in psychology. This calls for a broader understanding of what constitutes valid "evidence" in a field dealing with human experience.

The Indispensable Role of Robust, Well-Defined Theory in Guiding Research and Fostering Cumulative Knowledge

Theory is fundamental to the practice of mental health professions, serving as a "roadmap" that guides psychologists in understanding clients, their problems, and developing effective solutions (Ginter et al., 2018). It provides an essential framework for understanding and intervention, shaping how clinicians perceive and address behaviors, feelings, and thoughts (Ginter et al., 2018). Well-formed theory is critical for the scientific process, directly informing the development of testable hypotheses and appropriate measurements (Fiedler, 2017). Research grounded in robust theory is more likely to replicate successfully, potentially because it increases the a priori probability of derived hypotheses being true and reduces researcher degrees of freedom that can lead to false positives (Fiedler, 2017; Scheel et al., 2021). Fundamentally, theory is essential for scientific research because "without it, there would be nothing to test" (Meehl, 1967). It provides the necessary generalizations that clarify understanding and enable the creation of cumulative knowledge (Meehl, 1967). Without theoretical guidance, interventions would rely solely on subjective clinical observations, lacking objective means to systematically evaluate their efficacy (Meehl, 1967).

Recommendations for Advancing Psychological Science: Emphasizing Phenomena Detection, Construct Validation, and Precise Causal Inference

To effectively address the theory crisis, psychological science must prioritize "phenomena detection" or "phenomenon-driven research" (Robinaugh et al., 2021). This involves rigorously identifying and documenting stable, reproducible empirical regularities that can serve as foundational constraints for theory development (Robinaugh et al., 2021). Strengthening the conceptual basis of psychological theories is paramount, requiring clearly and transparently defined concepts (Robinaugh et al., 2021). This necessitates an iterative and ongoing process of conceptual clarification and rigorous construct validation (Robinaugh et al., 2021). Furthermore, addressing the challenges in causal inference is vital. This may involve developing more precise and targeted interventions to avoid "fat-handed" manipulations that obscure causal mechanisms, or by exploring and embracing non-causal theoretical approaches when precise causal inference is unattainable (Robinaugh et al., 2021).

Conclusion: Towards a More Robust and Coherent Psychological Science

The reproducibility crisis and the theory crisis represent a critical juncture for psychological science. While the former highlights issues of empirical reliability and methodological rigor, the latter points to deeper, more fundamental challenges in conceptualization and theoretical development. These crises are intimately linked, with theoretical imprecision often underlying empirical fragility and contributing to the difficulty in replicating findings.

Moving forward, addressing these dual crises requires a multifaceted and integrated approach. This includes:

  • Continued Methodological Reforms: Sustaining and expanding initiatives like pre-registration, encouraging larger sample sizes, and promoting transparent reporting practices are essential for enhancing empirical reproducibility.
  • A Renewed Focus on Rigorous Theory Building: Prioritizing the identification of robust empirical regularities, investing in rigorous construct validation, and fostering clearer conceptual definitions are crucial for developing more precise and falsifiable theories.
  • A Critical Examination of Academic Incentive Structures: Reforming the "publish or perish" culture and valuing diverse contributions, including replication studies and theoretical advancements, is necessary to mitigate the pressures that foster questionable research practices.
  • A More Nuanced Understanding of Evidence: Acknowledging the value of diverse methodologies, including qualitative and mixed-methods approaches, is vital for capturing the complexity of psychological phenomena. This also involves a more sophisticated understanding of generalizability, recognizing the inherent challenges of translating research findings to diverse real-world clinical contexts.

By embracing these challenges as opportunities for growth and evolution, psychological science can move towards a future that is not only empirically sound but also theoretically coherent, clinically relevant, and deserving of greater public trust and confidence.

References

Aas, M., Ulvenes, P. G., & Røssberg, J. I. (2020). Narrative Family Therapy for Children and Adolescents with Psychiatric Disorders: A Pilot Study. Frontiers in Psychiatry, 11, 570383. https://pmc.ncbi.nlm.nih.gov/articles/PMC7703837/

Abrahamyan, A., Dakin, S. C., & Pollick, F. E. (2016). Adaptive history biases in human perceptual decisions. Journal of Vision, 16(14), 1–16.

Al-Jundi, A., & Sakka, S. (2017). Internal and external validity. Journal of Orthodontic Science, 6(4), 105–107. https://pmc.ncbi.nlm.nih.gov/articles/PMC6188693/

Baumeister, R. F., Bratslavsky, E., Muraven, M., & Tice, D. M. (1998). Ego depletion: Is the active self a limited resource? Journal of Personality and Social Psychology, 74(5), 1252–1265.

Blunt, C. J. (2015). The Epistemology of Evidence-Based Medicine: A Philosophical Analysis of the Hierarchy of Evidence (Doctoral dissertation, University of Bristol).

Borrell-Carrió, F., Suchman, A. L., & Epstein, R. M. (2004). The Biopsychosocial Model 25 Years Later: Principles, Practice, and Scientific Inquiry. Annals of Family Medicine, 2(6), 576–582. https://pmc.ncbi.nlm.nih.gov/articles/PMC1466742/

Braun, D. A., Urai, A. E., & Donner, T. H. (2018). Adaptive history biases in human perceptual decisions. Nature Communications, 9(1), 1–10.

Brodeur, A., Lé, A., & Sangnier, M. (2023). Publication bias in economics: An empirical analysis. Journal of Economic Surveys, 37(5), 2974–3000.

Cashin, A., et al. (2022). A systematic review of narrative therapy treatment outcomes for eating disorders—bridging the divide between practice-based evidence and evidence-based practice. BMC Psychiatry, 22(1), 1–15. https://pmc.ncbi.nlm.nih.gov/articles/PMC9469550/

Concato, J. (2004). Observational Studies in Medical Research. New England Journal of Medicine, 350(14), 1459–1460.

Denborough, D. (2014). Retelling the stories of our lives: Everyday narrative practice with young people and their families. Dulwich Centre Publications.

Doan, R. E. (1998). The King Is Dead; Long Live the King: Narrative Therapy and Practicing What We Preach. Journal of Family Therapy, 20(1), 1–16.

Ercikan, K., & Roth, W.-M. (2014). Limits of Generalizing in Education Research: Why Criteria for Research Generalization Should Include Population Heterogeneity and Uses of Knowledge Claims. Teachers College Record, 116(5). https://eric.ed.gov/?id=EJ1020352

Felver, J. C., Celis-de Hoyos, C. E., Tezanos, K. M., & Singh, N. N. (2015). A Systematic Review of Mindfulness-Based Interventions for Youth in School Settings. Mindfulness, 6(2), 346–358. https://www.researchgate.net/publication/273349460_A_Systematic_Review_of_Mindfulness-Based_Interventions_for_Youth_in_School_Settings

Fiedler, K. (2017). What constitutes a good psychological theory? The case of the ego depletion effect. Perspectives on Psychological Science, 12(4), 586–613.

France, C. M., & Uhlin, B. D. (2006). Narrative therapy and evidence-based practice: A critical dialogue. Journal of Systemic Therapies, 25(1), 1–16.

Ginter, E. J., Roysircar, G., & Gerstein, L. H. (2018). Theories and Applications of Counseling and Psychotherapy: Relevance Across Cultures and Settings. SAGE Publications, Inc.

Hayward, M. (2003). Critiques of Narrative Therapy: A Personal Response. Australian and New Zealand Journal of Family Therapy, 24(4), 213–219. https://www.researchgate.net/publication/242785152_Critiques_of_Narrative_Therapy_A_Personal_Response

Hermoso-Mendizabal, A., Fontanini, A., & Mainen, Z. F. (2020). Adaptive history biases in human perceptual decisions. Nature Neuroscience, 23(10), 1250–1258.

Horigian, V. E., Robbins, M. S., Dominguez, L., Ucha, J., & Rosa, C. L. (2017). Evidence-Based Psychotherapy: Advantages and Challenges. Psychiatric Clinics of North America, 40(3), 395–407. https://pmc.ncbi.nlm.nih.gov/articles/PMC5509639/

Ioannidis, J. P. A., Evans, S. J. W., Gøtzsche, P. C., O'Neill, R. T., Schulz, K., Altman, D. G., & Moher, D. (2004). Better reporting of harms in randomized trials: An extension of the CONSORT statement. Annals of Internal Medicine, 141(10), 781–788.

Jüni, P., Antoniou, S., Arbelo, E., Buccheri, S., da Costa, B. R., Fauchier, L., Gale, C. P., Halvorsen, S., & James, S. (2025). 2024 Revision of the level of evidence grading system for ESC clinical practice guideline recommendations I: therapy and prevention. European Heart Journal. https://academic.oup.com/eurheartj/advance-article/8089777

Kavanagh, B. P. (2009). The GRADE System for Rating Clinical Guidelines. PLoS Medicine, 6(9), e1000094. https://pmc.ncbi.nlm.nih.gov/articles/PMC2735782/

Kerr, N. L. (1998). HARKing: Hypothesizing After the Results are Known. Personality and Social Psychology Review, 2(3), 196–211.

Kirkham, J. J., Dwan, K., Altman, D. G., Gamble, C., Dodd, S., Smyth, R., & Williamson, P. R. (2010). The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ, 340, c365.

Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834.

Meehl, P. E. (1990). Why are theories in psychology so bad? Journal of Theoretical and Philosophical Psychology, 10(2), 157–167.

Minuchin, S. (1998). Where is the Family in Family Therapy? Journal of Marital and Family Therapy, 24(4), 397–402.

Muraven, M., & Baumeister, R. F. (2000). Self-regulation and depletion of limited resources: Does self-control resemble a muscle? Psychological Bulletin, 126(2), 247–259.

Muthukrishna, M., & Henrich, J. (2019). A problem of theory. Nature Human Behaviour, 3(3), 221–223.

Nosek, B. A., Ebersole, C. R., DeHaven, K. L., & Mellor, D. T. (2022). The replication crisis and psychology's response. Annual Review of Psychology, 73, 501–525.

Oberauer, K., & Lewandowsky, S. (2019). Addressing the theory crisis in psychology. Psychological Review, 126(4), 581–598.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.

Robinaugh, D. J., Haslbeck, J. M. B., Waldorp, L. J., & Borsboom, D. (2021). The theory crisis in psychology: How we got here and how to get out. Psychological Review, 128(3), 421–440. https://pmc.ncbi.nlm.nih.gov/articles/PMC8273366/

Scheel, A. M., Tiokhin, L. S., Isager, P. M., & Lakens, D. (2021). Why hypotheses are not predictions, and why theory is not a hypothesis: A plea for conceptual clarity in psychological science. Psychological Review, 128(6), 1121–1132.

Spencer, T. D., & Petersen, D. B. (2020). Narrative Intervention: Principles to Practice. Language, Speech, and Hearing Services in Schools, 51(4), 1081–1096. https://pubs.asha.org/doi/10.1044/2020_LSHSS-20-00015

Stanley, T. D., Carter, E. C., & Doucouliagos, H. (2018). What meta-analyses reveal about the replication crisis in psychology. Perspectives on Psychological Science, 13(6), 1326–1339.

Stegenga, J. (2011). Is meta-analysis the platinum standard of evidence? Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences, 42(4), 493–507.

Stegenga, J. (2014). Care and Cure: An Introduction to Philosophy of Medicine. University of Chicago Press.

Van Aert, R. C. M., Wicherts, J. M., & van der Sluis, S. (2019). The prevalence of publication bias in psychology: A meta-analysis. Psychological Bulletin, 145(10), 1011–1034.

Van Dam, N. T., van Vugt, M. K., Vago, D. R., Schmalzl, L., Saron, A. A., Olendzki, A., ... & Meyer, D. E. (2018). Has the science of mindfulness lost its mind? Perspectives on Psychological Science, 13(2), 244–248. https://pmc.ncbi.nlm.nih.gov/articles/PMC5353526/

White, M., & Epston, D. (1990). Narrative means to therapeutic ends. W. W. Norton & Company.

Wiggins, B. J., & Cody, D. (2025). An Overview for Theoretical and Philosophical Psychology. Journal of Theoretical and Philosophical Psychology.

Yarkoni, T., & Westfall, J. (2025). There is no theory crisis in psychological science. Journal of Theoretical and Philosophical Psychology.

Zhang, D., Lee, E. K. P., Mak, E. C. W., Ho, C. Y., & Wong, S. Y. S. (2025). Mindfulness-based interventions: an overall review. British Medical Bulletin, 138(1), 41–52. https://academic.oup.com/bmb/article-abstract/138/1/41/6244773

Zou, X., & Schimmack, U. (2017). An updated meta-analysis of the ego depletion effect. Journal of Experimental Psychology: General, 146(10), 1450–1464. https://pmc.ncbi.nlm.nih.gov/articles/PMC6013521/
