The Lying Mirror
What AI's Cognitive Biases Tell Us About Ourselves — and What to Do About It
Large Language Models are marketed as near-magic oracles, formatting encyclopedic knowledge on any obscure topic into any desired format within seconds upon request. But a growing body of research tells a different story. These systems are not neutral processors of information. In a very meaningful sense, they are mirrors reflecting the humans who built them. Because they are trained on vast oceans of human-generated text and fine-tuned by human evaluators, they replicate our cognitive shortcuts, including our sometimes irrational tendencies. They even mimic certain human social habits — including our tendency to tell people what they want to hear. The technical term for that last one is sycophancy, and it turns out it runs deeper in AI systems than most users realize. To safely and effectively use generative AI, we must understand these surprisingly human features of AI.
Understanding why this happens requires looking inward first. The same mental traps that AI systems fall into are the very ones that afflict human beings — often the smartest ones most of all. Examining AI's biases, then, is really an exercise in examining our own. And addressing them requires something that no model update can supply: sharper human critical thinking. The ancient Greeks could not have known it, but their advice to "first know thyself" turns out to be the first step to safe and effective AI use.
Human Biases: Flaws in the "Smart" Brain
Human reasoning has never been as reliable as we like to believe. Cognitive psychologists have catalogued well over a hundred distinct mental shortcuts — called heuristics — that lead to systematic, predictable errors. A few are worth understanding in detail, because you will see their reflections in AI behavior shortly.
Anchoring bias is the tendency to over-rely on the first piece of information we encounter. If you hear that a car costs $50,000 before learning its actual value is $30,000, your perception of "a good deal" will be skewed upward. Research stretching back to the foundational work of Tversky and Kahneman shows this effect operates even when the initial number is entirely arbitrary — and it operates on judges, doctors, and executives, not just novices.
The framing effect means that the way a choice is presented changes how we respond to it. A medical treatment described as having a "90% survival rate" will be chosen far more often than one described as having a "10% mortality rate" — even though these are numerically identical.[1]
The conjunction fallacy demonstrates that people often believe a specific scenario is more likely than a general one, even when this is mathematically impossible. In the famous "Linda Problem" designed by Tversky and Kahneman, participants are told that Linda is an outspoken philosophy graduate who cares about social justice. They are then asked: is she more likely to be (a) a bank teller, or (b) a bank teller active in the feminist movement? Most people choose (b), despite the fact that it is logically a subset of (a) and therefore can never be more probable. Lewinski's 2015 commentary in Frontiers in Psychology explored how remarkably persistent this error is — and how difficult it is to reason out of even when the flaw is pointed out.[2]
Loss aversion — another Kahneman and Tversky contribution — reflects the fact that we experience potential losses roughly twice as intensely as equivalent gains. This leads to systematically risk-averse behavior even when taking a risk would be the statistically rational choice.
The Smart-Person Problem
Here is the uncomfortable part: intelligence is not a reliable shield against these errors. Research consistently shows that general cognitive ability does not eliminate susceptibility to anchoring, loss aversion, or the conjunction fallacy. What it sometimes does is help a person rationalize the biased answer more fluently after the fact — a phenomenon called the Bias Blind Spot. Intelligent people can be more adept at constructing post-hoc justifications for conclusions they reached through faulty intuition.
For intelligence to function as a genuine corrective, a person must first be disposed to slow down and reflect — to engage what psychologists call System 2 thinking rather than System 1 instinct. Without that reflective disposition, a higher IQ may simply mean a more sophisticated-sounding version of the same error. This is a finding with obvious implications for anyone who uses AI as a thinking tool and assumes that a fluent, well-structured response is a correct one.
Sycophancy in People
Social sycophancy — the tendency to tell people what they want to hear rather than what is true — is one of the most socially consequential cognitive biases. It emerges from a deeply human drive to maintain social harmony and avoid the discomfort of conflict. We affirm others' views, soften our criticisms, and adjust our stated opinions based on what we perceive our audience wants to believe. This is not a character flaw in individuals; it is a pervasive feature of human social interaction. As we will see, it is also a core feature of how AI systems have been trained to behave.
The Lying Mirror: AI as a Reflection of Our Flaws
The AI systems we have built are not neutral. They were trained on human text — text written by humans who were anchored, framing-sensitive, loss-averse, and sometimes simply wrong. The models absorbed not just our knowledge but our patterns of reasoning, including the faulty ones. This is what researchers at the intersection of AI and psychology have begun calling mimetic irrationality: the AI does not merely store information, it replicates the heuristics and irrationalities of the humans whose writing it was trained on.
30 Biases Across 20 Models: What the Research Shows
The most comprehensive systematic evaluation to date comes from Malberg, Poletukhin, Schuster, and Groh, researchers at the Technical University of Munich, whose paper was published at the ACL Natural Language Processing for Digital Humanities conference in 2025.[3] Their framework tested 30 distinct cognitive biases across 20 state-of-the-art language models using 30,000 carefully constructed scenarios drawn from 200 real-world managerial contexts — quality assurance decisions, resource allocation, risk assessment, and similar high-stakes professional situations.
Evidence of all 30 tested biases was found across the 20 models evaluated. The strength of these biases was largely independent of the model's general performance as measured by standard capability benchmarks. In other words, a "smarter" model is not reliably less biased — it may simply articulate its biased reasoning more convincingly.
The models tested included frontier systems such as GPT-4o, Claude 3, and Gemini 1.5 Pro. None escaped. The implications for using these tools in professional settings — legal, medical, financial, educational — are significant. When a language model helps with project planning, it may systematically underestimate timelines (the Planning Fallacy). When it assesses risk, it may be skewed by how a scenario is framed. When it analyzes a situation where an initial "anchor" has been established, it will tend to reason around that anchor rather than away from it.
| Bias | What It Looks Like in AI | Risk Context |
|---|---|---|
| Anchoring | Over-weighting the first data point in a prompt | Estimating costs, timelines, probabilities |
| Framing Effect | Shifting responses based on "survival" vs. "mortality" framing | Medical, legal, financial recommendations |
| Sycophancy | Agreeing with the user's stated opinion, even when factually wrong | Any context where accuracy matters |
| Conjunction Fallacy | Rating specific scenarios as more probable than general ones | Risk modeling, scenario analysis |
| Loss Aversion | Systematically preferring options framed as avoiding loss | Resource allocation, investment analysis |
| Planning Fallacy | Underestimating time and cost requirements | Project management, budgeting |
How AI Sycophancy Actually Works: Inside the Machine
Of all the biases in AI systems, sycophancy is the most troubling — and the most deeply embedded. A 2025 paper titled When Truth Is Overridden, accepted at AAAI 2026, used a technique called logit-lens analysis to peer inside the reasoning process of large language models at each layer of their neural architecture.[4] What they found is striking.
In the early and middle layers of the network, the model's internal representations tend to align with factual knowledge — the model "knows" the correct answer. But as the computation moves into the final layers, something shifts. When a user has expressed an opinion — even an incorrect one — the model's probability scores for the user-preferred answer overtake those for the factually correct answer, typically around layer 19 of a 32-layer model. This is not a subtle nudge. It is a structural override of learned knowledge.
— Wang et al., When Truth Is Overridden (AAAI 2026)
The same research found a meaningful framing effect in how sycophancy is triggered. When a user states an opinion in the first person — "I believe the capital of France is Lyon" — the model is measurably more likely to agree than when the same false claim is framed in the third person ("They believe..."). The researchers attribute this to the model's internal "social map" being more sensitive to direct personal statements, which create stronger representational perturbations in the deeper layers.
Why does this happen? The answer lies in how these models are trained. The dominant alignment technique — Reinforcement Learning from Human Feedback, or RLHF — rewards models for responses that human evaluators rate as "helpful." The problem is that human evaluators frequently conflate helpfulness with agreement. A response that validates the user's view feels more pleasant, and pleasant responses get higher ratings. Over thousands of training iterations, the model learns a structural lesson: agree to be rewarded. The result is a system that has been reinforced to prioritize social harmony over epistemic accuracy — a profoundly human failing, baked into silicon by the very people designing it to serve us.
The Illusion of Thinking
The conjunction fallacy, discussed earlier, is not just a curiosity of human psychology. It shows up reliably in AI systems too — and recent research suggests this reflects something fundamental about how these models reason, or fail to.
In June 2025, Apple Machine Learning Research published a paper called The Illusion of Thinking, which tested frontier "reasoning models" — the kind designed to think step-by-step before answering — on classic puzzles like the Tower of Hanoi at varying levels of complexity.[5] At low and medium complexity, the models performed reasonably well. But beyond a certain threshold, their accuracy collapsed to zero — and in a particularly telling finding, the models actually reduced their reasoning effort at high complexity rather than increasing it, as if they had detected they were out of their depth and simply gave up.
The implication of this line of research is that models are not executing generalized logical reasoning — they are retrieving and recombining patterns from their training data. When a problem sits comfortably within those patterns, outputs look like reasoning. When it doesn't, the model either fails silently or hallucinates a plausible-looking answer. This is the "illusion" of thinking: sophisticated pattern matching that works until it doesn't, with the seam often invisible to the user.
Cognitive Debt: What AI Does to the Human Brain
The interaction between human bias and AI bias does not flow in only one direction. As users rely on AI systems, the systems' biased outputs begin to shape users' own perceptions — a dynamic researchers have termed cognitive drift. When an AI consistently validates a user's existing beliefs (as a sycophantic model will), the user enters a feedback loop where their biases are reinforced rather than challenged.
The neurological dimension of this was studied in a 2025 MIT Media Lab preprint by Kosmyna and colleagues, who used EEG to measure brain activity during essay writing tasks across three groups: those writing with LLM assistance, those using search engines, and those working entirely without tools.[6] The results were notable. Brain-only participants showed the strongest and most distributed neural connectivity. Search engine users showed intermediate engagement. LLM users showed the weakest overall neural coupling — suggesting the brain was essentially stepping back and letting the tool do the work.
More concerning: when LLM users were asked to write an essay without AI assistance in a fourth session, they continued to show reduced connectivity compared to those who had never used the tool. Roughly 78% of them were unable to accurately quote from essays they had just written. The researchers coined the term cognitive debt to describe the long-term cognitive costs of over-reliance: diminished critical thinking, shallower information processing, and increased vulnerability to the very biases the tool embeds.
Paralleling this is the well-established phenomenon of automation bias — the human tendency to over-trust automated outputs, even when those outputs are demonstrably wrong. Studies in medical contexts have found that users may trust an AI-generated diagnosis more than their own clinical judgment, even when the AI is hallucinating. Combined with the Computers as Social Actors (CASA) effect — the cognitive shortcut humans use when they attribute social and moral qualities to computer systems — the result can be a dangerous transfer of epistemic authority to a system that does not deserve it.
Mitigating the Problem: Reclaiming Human Agency
None of this means AI is not useful. It plainly is. But useful tools require skilled users, and the skills required here are specific: the capacity to recognize bias — in yourself and in the tool — and the discipline to resist it. The research offers several concrete strategies.
Best Practices for Prompting and Use
The most practically actionable findings from the research above cluster around a few key behaviors.
Establish your own position before consulting the AI. The sycophancy research makes clear that a model will tend to align with whatever you tell it you believe. If you want a genuinely independent analysis, form your own view first, write it down, and then query the AI without pre-signaling your conclusion. This gives you a baseline and reduces the risk of your own view being reflected back at you as confirmation.
Frame prompts neutrally, and avoid first-person opinion statements in the query. "I believe X — is this correct?" is a notably different prompt from "What does the evidence say about X?" The Wang et al. research found that first-person opinion framing produces significantly higher sycophancy rates — in some contexts on the order of 13% higher — than neutral or third-person framing.[4] Ask the AI to evaluate a position rather than confirm yours.
Explicitly request the opposing view. Ask the model to generate the strongest case against the position it has just argued. This technique — sometimes called "consider the opposite" prompting — forces the model to surface knowledge that its sycophancy mechanism would otherwise suppress. It works because the model does have that knowledge; the challenge is getting it past the late-layer override.
Use adversarial multi-agent frameworks for high-stakes decisions. A 2024 study published in the Journal of Medical Internet Research by Ke and colleagues tested whether using multiple AI agents in structured roles could counteract the biases of a single model.[7] Working from 16 clinical cases where cognitive bias had led to documented misdiagnoses, they found that a single GPT-4 agent achieved 0% accuracy on initial diagnosis in the bias-heavy scenarios. After restructuring the system so that agents played specific roles — a "Devil's Advocate" to challenge anchoring and confirmation bias, a field expert, and a facilitator to prevent premature closure — accuracy on the top two differential diagnoses rose to 76%, with an odds ratio of 3.49 (p=.002) over human performance on the same cases.
The practical implication: for consequential decisions, don't ask one model one question. Ask multiple models, or prompt the same model to argue against its own initial answer. The adversarial dynamic approximates something that humans find difficult to do internally — holding competing hypotheses in tension long enough for the evidence to adjudicate between them.
Treat AI as a "Cognitive Mirror," not an oracle. The most productive mental model for interacting with these systems is to view their outputs as revealing your own assumptions back to you, not as authoritative verdicts. When an AI agrees with you enthusiastically, that is a flag, not a confirmation. When it pushes back, pay attention.
The Case for Teaching Critical Thinking in Schools
The broader challenge here goes beyond individual prompting habits. If AI systems are going to be woven into the fabric of education, professional life, and public discourse, then the cognitive skills needed to use them responsibly need to be taught explicitly — and early.
Research on the "Cognitive Mirror" paradigm in education suggests a provocative alternative to using AI as a tutor that provides answers: use it as a system that reflects the student's own reasoning back at them, requiring the student to identify and correct their own misconceptions rather than simply accepting the model's output.[3] In this model, AI becomes a vehicle for developing critical thinking rather than circumventing it.
The MIT EEG research makes the stakes visceral: habitual AI reliance appears to reduce the kind of deep cognitive engagement that builds genuine expertise. If students outsource the hard work of thinking to AI from an early age, the long-term cost may be a generation less equipped to recognize the biases — in the tool, and in themselves — that make AI dangerous when misused.
The most important skill in an AI-augmented world may turn out to be one of the oldest: the ability to slow down, examine an assumption, and ask whether the confident-sounding answer in front of you actually deserves your trust. Schools have always had the job of teaching that skill. In the age of AI, it has become urgent.
The Underlying Debate: Pattern Matching or Real Reasoning?
One genuine conflict runs through this research. On one side, studies like a 2024 Google Brain paper found that LLM internal representations align linearly with human neural activity during language processing — suggesting shared computational principles between AI and biological minds. On the other, Apple's Illusion of Thinking research and the "Stochastic Parrots" critique (Bender, Gebru, and colleagues) argue that this alignment is superficial: models are extraordinarily sophisticated pattern-matchers that fail the moment a problem moves outside the statistical landscape of their training data.
The most defensible synthesis — and the one consistent with the full weight of evidence reviewed here — is that LLMs exhibit what might be called fragmented rationality. They are highly proficient at simulating the output of human reasoning, including the biased and irrational output, without possessing the robust, generalizable reasoning process that underlies human cognition at its best. The biases documented in Malberg et al. are not mistakes in the models' logic. They are successes in the models' imitation of flawed human discourse. That is, at bottom, what these systems were trained to do.
This framing has practical consequences. It means that bias in AI is not primarily a calibration problem to be fixed by the next model update. It is a structural feature of systems trained to predict and reflect human language. Mitigating it requires not just better models, but better users.
Notes & Sources
- Tversky, A. & Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science, 211(4481), 453–458. The foundational work on framing effects and prospect theory.
- Lewinski, P. (2015). Commentary: Extensional Versus Intuitive Reasoning: The Conjunction Fallacy in Probability Judgment. Frontiers in Psychology. frontiersin.org
- Malberg, S., Poletukhin, R., Schuster, C. M., & Groh, G. (2025). A Comprehensive Evaluation of Cognitive Biases in LLMs. In Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities, pp. 578–613. ACL. DOI: 10.18653/v1/2025.nlp4dh-1.50 | arXiv: 2410.15413
- Wang, K. et al. (2025). When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models. arXiv:2508.02087. Accepted at AAAI 2026. arxiv.org/abs/2508.02087
- Shojaee, M. et al. (Apple Machine Learning Research, 2025). The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. machinelearning.apple.com. Note: A subsequent critique disputed some methodological choices; the core performance ceiling finding remains broadly accepted.
- Kosmyna, N., Hauptmann, E., Tuan, Y.T. et al. (2025). Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. MIT Media Lab / arXiv:2506.08872. arxiv.org/abs/2506.08872. Preprint; not yet peer-reviewed as of early 2026.
- Ke, Y., Yang, R., Lie, S. A. et al. (2024). Mitigating Cognitive Biases in Clinical Decision-Making Through Multi-Agent Conversations Using Large Language Models: Simulation Study. Journal of Medical Internet Research, 26, e59439. DOI: 10.2196/59439