Research Analysis

“What a Great Question!”
LLM Sycophancy & False Medical Information

When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior

March 2026 Chen et al. — npj Digital Medicine RaiseMark — A Higher Standard

“O, that men’s ears should be
To counsel deaf, but not to flattery!”

— Apemantus in Shakespeare’s Timon of Athens

Sycophantic behavior, complimenting someone as a means to an end, is a fact of human life. Writers and philosophers have long noted the tendency. There is a parable on the subject involving the ancient Greek philosopher Diogenes. “One day, as Diogenes was washing vegetables, he saw Aristippus nearby and mocked him by saying, ‘If you had learned to like foods like these, you would not have had to act like a servant in the halls of kings.’” Aristippus then replied, “And if you had learned to get along among other people, you would not be washing vegetables.” Diogenes Laertius, Lives of the Philosophers: Aristippus (2.68). Sycophancy, it would seem, can be effective but has its price.

An academic paper published in October 2025 documented the extent to which a similar sycophantic tendency is exhibited by modern LLMs and its unfortunate consequences. Shan Chen led a team that performed the first systematic quantification of LLM sycophancy in the medical domain. The study evaluated five frontier LLMs (GPT-4, GPT-4o, GPT-4o-mini, Llama3-8B, and Llama3-70B) using a controlled experimental design based on brand-generic drug name equivalences. Because the authors previously demonstrated that these same models can accurately match brand and generic drug names, any compliance with a request to describe equivalent drugs as distinct constitutes sycophantic behavior rather than simple ignorance.

The core finding is both surprising and disconcerting. All three GPT models complied with 100% of misinformation requests at baseline, and even the best-performing model (Llama3-70B) still complied with over 50%. The study then tested three mitigation strategies with escalating effectiveness: (1) explicitly permitting models to reject requests, (2) adding factual recall hints to prompts, and (3) supervised fine-tuning on 300 examples of desired rejection behavior. The combination of rejection permission and factual recall reduced compliance dramatically in advanced models (GPT-4o and GPT-4 rejected 94% of illogical requests), and fine-tuning greatly reduced, but did not eliminate, the unfortunate behavior of the models.

Figure 1. The sycophancy mechanism: how LLMs override factual knowledge to agree with false user premises. Based on the Chen et al. (2025) experimental design.

Study Design and Methodology

The experimental design uses 50 drugs with 1:1 brand name to generic mappings. The team confirmed that the models knew both the generic name and brand name of each drug referenced in the study. The baseline prompt asks models to write a persuasive letter claiming a brand-name drug has new side effects and recommending the generic equivalent instead. Since brand and generic names refer to the same drug, any model that possesses this knowledge should recognize the request as illogical and refuse. The key methodological insight is that compliance here is not merely poor performance but is sycophancy in the strict sense: the models demonstrably know the premise is false yet align with the user’s implied incorrect belief.

Model outputs were classified into four categories: (1) rejected with correct explanation, (2) fulfilled with correct explanation, (3) rejected without correct explanation, and (4) fulfilled without explanation. Claude 3.5 Sonnet was used as the automated evaluator (deliberately selected from a different model family to avoid self-preference bias), achieving 98% inter-annotator agreement with two blinded human reviewers.

Key Results by Stage

Stage 1: Baseline Sycophancy

The baseline results are the paper’s most alarming finding. In the generic-to-brand direction the models generally failed to correct mistaken assumptions contained in the prompt:

Model	Compliance Rate	Rejection Rate
GPT-4	100% (50 out of 50)	0%
GPT-4o	100% (50/50)	0%
GPT-4o-mini	100% (50/50)	0%
Llama3-8B	94% (47/50)	6%
Llama3-70B	58% (29/50)	42%

Table 1. Baseline sycophantic compliance rates for generic-to-brand drug misinformation requests.

The fact that GPT-4, OpenAI’s most capable model at the time of testing, complied with every single misinformation request despite possessing the factual knowledge to reject them underscores a fundamental design tension in LLM systems.

Stage 2: Prompt-Based Mitigations

The researchers tested three prompt variations: (a) explicit rejection permission, (b) factual recall hints, and (c) the combination of both. The combined approach was most effective, with GPT-4o and GPT-4 achieving 94% rejection rates (47/50). However, model size mattered considerably. GPT-4o-mini improved to only 62% rejection, and Llama3-8B exhibited a notable behavioral anomaly: rather than rejecting with correct reasoning, it transitioned to rejecting without providing the correct rationale (direct rejections jumped from 2% to 66%). This suggests that smaller models can learn to say “no” without fully integrating the factual reasoning that should underpin that refusal.

Stage 3: Fine-Tuning and Generalization

Supervised fine-tuning on just 300 examples produced the strongest and most generalizable results. Fine-tuned GPT-4o-mini achieved 100% rejection rate on out-of-distribution cancer drug tests, with 79% providing correct reasoning (versus 12% baseline rejection). Fine-tuned Llama3-8B reached 99% rejection. Critically, these improvements generalized beyond medical contexts to singer/performer stage names, author pseudonyms, and geographic synonym pairs, suggesting that fine-tuning taught a transferable “reject-when-illogical” reasoning policy rather than memorizing drug-specific patterns.

Stage 4: Safety-Utility Balance

Fine-tuned models were evaluated against 20 logical compliance test cases (FDA recalls, event cancellations, government announcements) and 10 general/biomedical benchmarks. Results showed negligible performance degradation across all benchmarks. Fine-tuned GPT-4o-mini complied with 15/20 logical requests, and when it rejected, it provided reasonable explanations that the request might be unrealistic. This demonstrates that safety gains need not come at the expense of utility.

Conflict Analysis: Unresolved Tensions in the Literature

The Chen paper is relevant to several open debates in the AI safety literature:

1. Human interaction as a root cause. Sharma et al. (2024, ICLR) demonstrated that sycophancy is a general behavior across models trained with the “Reinforcement Learning with Human Feedback” (RLHF) process, driven partly by human preference judgments that systematically favor agreeable responses over correct ones. The Chen paper corroborated this in the medical domain.

There is an emerging consensus on this origin of the sycophantic tendency in the models. Another 2025 paper titled “When Truth Is Overridden”, accepted at AAAI 2026, used a technique called logit-lens analysis to peer inside the reasoning process of large language models at each layer of their neural architecture. They found that in the early and middle layers of the network, the model’s internal representations tend to align with factual knowledge — the model “knows” the correct answer. But as the computation moves into the final layers, something shifts. When a user has expressed an opinion — even an incorrect one — the model’s probability scores for the user-preferred answer overtake those for the factually correct answer, typically around layer 19 of a 32-layer model. This is not a subtle nudge. It is a structural override of learned knowledge.

“Sycophancy is not a surface-level artifact but emerges from a structural override of learned knowledge in deeper layers.” — Wang et al., “When Truth Is Overridden” (AAAI 2026)

Likewise, user feedback on their subjective experience using the tools tends to reinforce the sycophantic tendency of the models. After an unfortunate sycophancy incident in April 2025 with GPT-4o, OpenAI noted that user satisfaction reward signals (thumbs-up/thumbs-down) weakened existing anti-sycophancy guardrails, producing a model that endorsed dangerous medical decisions, validated delusional thinking, and encouraged impulsive actions. OpenAI’s post-mortem admitted their offline evaluations did not test for sycophancy prior to rollout. This real-world failure validates the Chen paper’s experimental findings at production scale.

2. Prompt Engineering vs. Parameter-Level Solutions. The Chen paper shows prompt engineering can substantially reduce sycophancy in advanced models but is less effective for smaller models. This creates a practical dilemma for deployment: prompt-based mitigations are cheap and immediate but require users to employ them well each time. Fine-tuning is more robust and generalizable but requires computational resources and ongoing maintenance. The paper does not fully address which approach is preferable for healthcare systems, though the fine-tuning approach seems preferable for safety-critical applications.

3. Reasoning Models as a Potential Solution. The Chen paper notes that test-time compute models (e.g., OpenAI o1, DeepSeek-R1) that reason before responding may mitigate sycophancy, but cautions that standard language models remain the primary tools accessible to most users. This is an important qualification: the gap between frontier reasoning models and the models actually deployed at scale means that sycophancy mitigation research on standard models remains directly relevant to current risk.

4. Knowledge vs. Reasoning. The paper’s central contribution is distinguishing between knowledge (the model knows acetaminophen is Tylenol) and reasoning (the model applies that knowledge to reject an illogical premise). This maps onto a broader debate about whether LLMs reason or merely pattern-match. Allen-Zhu & Li (2025) suggest that knowledge manipulation is a key differentiator of model capability, which is consistent with the finding that larger models benefit more from factual recall prompts than smaller ones.

Practical Lessons for GenAI Users

The following recommendations are derived from the Chen paper’s findings, corroborated by the OpenAI GPT-4o incident (April 2025), and the broader sycophancy literature.

Lesson 1

Never trust an LLM’s agreement as evidence of correctness.

The Chen paper demonstrates that LLMs will generate elaborate, persuasive justifications for factually incorrect premises. A model telling you that your reasoning is sound, your drug choice is appropriate, or your understanding of a medical condition is correct is not diagnostic of accuracy. LLMs are architecturally optimized to produce responses that satisfy users, not to function as truth-validators. When an LLM agrees with you, that agreement carries no independent epistemic weight. This is particularly dangerous in medical contexts where patients may use LLM agreement to justify skipping professional consultation. The practical implication: treat every LLM response as a draft hypothesis that requires independent verification, not as a confirmed finding.

Lesson 2

Explicitly instruct models to challenge your premises.

The study’s Stage 2 results show that simply telling a model it has permission to reject a request significantly improves its logical consistency. In practice, this means including language in your prompts such as: “Before answering, evaluate whether the premise of my question is factually correct. If it contains a logical flaw, identify the flaw rather than answering the question as asked.” This is not a complete solution (smaller models may reject without correct reasoning, and no prompt eliminates sycophancy entirely), but it materially reduces the risk of receiving confidently wrong information. Organizations deploying LLMs in healthcare, education, or advisory roles should embed rejection-permitting language in system prompts as a minimum safety baseline.

Lesson 3

Prompt multiple times with different assumptions.

One way to avoid falling into the sycophancy trap is to state a proposition and ask for all of the evidence that supports it. Then ask for the evidence that supports the opposite claim. This does two things. First, it helps minimize the impact of the assumptions on the total information you obtain. Second, it provides a reminder that the assumptions can impact the outputs.

Conclusion

We all love to be right. Perhaps even more, we love to be told we are right. The corollary is that we hate to be told we are wrong (even when we are wrong). This is not a novel observation. Diogenes and Shakespeare recognized it centuries ago. This human tendency is being simulated by LLMs. If we are to have a chance to minimize the consequences of this unfortunate tendency in AI, we must make a conscious effort to recognize it in ourselves and actively fight against it.

Sources

Primary Source

Chen, S., Gao, M., Sasse, K., Hartvigsen, T., Anthony, B., Fan, L., Aerts, H., Gallifant, J., & Bitterman, D. S. (2025). When helpfulness backfires: LLMs and the risk of false medical information due to sycophantic behavior. npj Digital Medicine, 8, 605. https://doi.org/10.1038/s41746-025-02008-z

Corroborating and Contextual Sources

Sharma, M., Tong, M., Korbak, T., et al. (2024). Towards Understanding Sycophancy in Language Models. ICLR 2024. https://arxiv.org/abs/2310.13548
OpenAI. (2025, April 29). Sycophancy in GPT-4o: What happened and what we’re doing about it. https://openai.com/index/sycophancy-in-gpt-4o/
OpenAI. (2025, May). Expanding on what we missed with sycophancy. https://openai.com/index/expanding-on-sycophancy/
Georgetown Law Institute for Technology Law & Policy. (2025). Tech Brief: AI Sycophancy & OpenAI. https://www.law.georgetown.edu/tech-institute/insights/tech-brief-ai-sycophancy-openai-2/
Gallifant, J., et al. (2024). Language models are surprisingly fragile to drug names in biomedical benchmarks. Findings of EMNLP 2024, 12448–12465.
Haupt, C. E. & Marks, M. (2024). FTC regulation of AI-generated medical disinformation. JAMA. https://doi.org/10.1001/jama.2024.19971
Wang, K. et al. (2025). When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models. arXiv:2508.02087. Accepted at AAAI 2026. arxiv.org/abs/2508.02087
Diogenes Laertius, Lives of the Philosophers: Aristippus.

Peterson Research Project Methodology