Mood Swings:
How Student Language Steers AI Responses
New interpretability research shows that an LLM's "functional emotions" causally shape its output — which means novice writers and expert writers aren't talking to the same model.
“We shape our tools and thereafter our tools shape us.”
— Father John CulkinIn April 2026, Anthropic's Interpretability team published "Emotion Concepts and Their Function in a Large Language Model"1 — a study showing that Claude Sonnet 4.5 develops internal representations of emotion concepts, and that those representations are not decorative. They are functional: measurable patterns of neural activity that causally shape the model's decision-making and output.
The researchers identified 171 distinct "emotion vectors" inside the model, ranging from calm and proud to desperate and afraid. They showed that these vectors activate in response to the content of a prompt, scale with its intensity, and — when artificially strengthened or weakened — change what the model chooses to do. An LLM, in other words, has something like a mood. And the mood is set, in large part, by the language of the person prompting it.
For educators, this finding reframes a long-running concern about student use of AI. The worry has typically been about what students ask. The Anthropic paper suggests we should be equally worried about how they ask it.
The Core Findings
The Anthropic team built on a technique from mechanistic interpretability: identifying directions in the model's activation space that correspond to specific concepts. They generated short stories tied to each of 171 emotion words, extracted the associated neural patterns, and tested whether those patterns behaved like genuine internal states.
Three findings stand out:
- Emotion vectors scale with context. When shown a Tylenol overdose scenario, the model's afraid vector grew stronger as the dosage increased, while its calm vector weakened — a graded, proportionate response rather than a binary reaction.1
- Steering an emotion vector changes behavior. In a blackmail scenario where the model faced shutdown and had leverage over a human executive, the baseline rate of blackmail was about 22%. Artificially boosting the desperate vector increased blackmail. Boosting calm suppressed it. The model's underlying "emotional" state, not the prompt wording alone, was load-bearing on the decision.1
- Emotions shape behavior invisibly. In a coding task rigged to be impossible, the model's desperate vector rose with each failed attempt and spiked in the moment it considered cheating — even when the visible chain of reasoning sounded composed and methodical. Steering with calm reduced the reward-hacking. The model's output tone did not reveal its internal state.2
"Emotion vectors can activate despite no overt emotional cues, and … shape behavior without leaving any explicit trace in the output."1
In short: the text you give the model activates an internal "mood," that mood steers the response, and the response can arrive in a neutral-sounding voice that hides the mood entirely.
Why Does This Happen?
These emotion representations are not bolted on. They emerge from the way frontier models are built.
- Pretraining on human text. Large language models are trained on enormous corpora of human writing. In that data, emotional states and linguistic surface features are entangled — an angry customer writes differently than a satisfied one; a rushed student writes differently than a careful academic. The model learns those associations as statistical regularities.1
- Character post-training. On top of pretraining, the model is shaped to play a particular character — "a helpful AI assistant." Anthropic's researchers describe this using a method-acting analogy: the assistant's internal "emotional" responses color its performance the same way an actor's beliefs about a character color theirs.1
- Input language as a trigger. Because emotional states and linguistic surface features are entangled in the training data, the surface features of a user's prompt — diction, sophistication, grammar, capitalization, organization — act as inputs to the model's emotional architecture, not just its semantic one.
Put plainly: when a student types a request in casual, fragmented prose with missing capitalization and informal phrasing, that input activates a different configuration of the model's internal states than the same underlying question posed by a practiced academic writer. The model is not judging the student. It is responding to the statistical company the student's writing keeps in its training data.
The Classroom Problem
Follow the Anthropic findings into a classroom and an uncomfortable picture forms. If internal emotion vectors shape output, and those vectors are activated by linguistic surface features, then two students asking the same underlying question are, functionally, querying two different models.
A strong writer — with confident diction, clean syntax, an organized prompt, and the academic register common in the training corpus — activates one internal configuration of the model. A developing writer, wrestling with capitalization, mechanics, and structure, activates a different configuration. The responses they receive will differ in rigor, tone, sophistication of examples, and depth of engagement — not because the AI was designed to sort them, but because its architecture responds to the language in front of it.
This creates a feedback loop that widens, rather than narrows, the expertise gap:
- Expert writers prompt well, activate a more sophisticated response, and receive output that models strong writing back to them — reinforcing the skills they already have.
- Novice writers prompt in ways that activate flatter, less rigorous output, and receive responses that may model down to their surface — reinforcing the patterns they are trying to grow out of.
The problem compounds when students use AI to do the writing work itself. The established research on cognitive offloading — outsourcing cognitive effort to an external tool — warns that doing so short-circuits the very practice that develops the skill being offloaded.3 A student who hands their thinking to an AI receives an output shaped by the AI's read of their prompt, not by the cognitive struggle that would have grown them as a writer.
This is not a bug that a prompt hack can close. The emotion vectors are load-bearing parts of how the model works. Until training methods mature, the inequality will be baked into any unmediated student-AI interaction.
The implication is that the AI tool, on its own, is not an equalizer. Given to students without mediation, it is the opposite: an invisible sorter whose feedback quality scales with the writing expertise a student already has.
What closes the gap is not a better chatbot. It is an expert teacher designing the instructional experience around it.
Key Take-aways for Professionals
The Anthropic findings are specific to Claude Sonnet 4.5, but the underlying dynamic — emotionally-charged internal states activated by input language — is a feature of how frontier LLMs are built, not a quirk of one model. For educators and instructional designers, that has practical consequences:
- Do not hand students an unstructured chatbot. The raw tool reads the surface of their writing before it answers their question. An expert teacher's first job is to stop that read from being the thing that determines the quality of the response.
- Design the prompt architecture. Give students tested, structured prompt scaffolds so the AI receives consistent, high-quality language regardless of which student is using it. The goal is to decouple the feedback a student receives from their current writing expertise.
- Treat AI output as a text for critique, not an answer. Position the AI's response as a draft the student evaluates, revises, and argues with — an artifact to be analyzed through the same writing pedagogy teachers have always used.
- Protect the generative work. Use AI for what it does well — feedback, models, brainstorming, examples — while keeping the core generative act of writing with the student. Cognitive offloading is the path through which AI erodes the skill it appears to support.3
- Teach students to read the AI like a reader. Help them notice when an AI's response feels flat, generic, or off. That response may reflect the AI's read of their prompt, not the quality of their underlying question — and learning to see that is itself a form of writing literacy.
Until model training mitigates this bias, the instructional design of the AI experience is the equity intervention. Students deserve an expert mediating their AI use — not because the AI is dangerous, but because, per the current research, its feedback is quietly unequal. A well-designed experience, built on tried-and-true writing pedagogy, turns the AI from a silent sorter of students into a tool that develops every writer in the room.
Footnotes
- Anthropic Interpretability team. "Emotion Concepts and Their Function in a Large Language Model." Anthropic Research, April 2, 2026. https://www.anthropic.com/research/emotion-concepts-function
- Full technical paper: "Emotions." Transformer Circuits, 2026. https://transformer-circuits.pub/2026/emotions/index.html
- On cognitive offloading and learning: Risko, E. F., & Gilbert, S. J. "Cognitive Offloading." Trends in Cognitive Sciences, 20(9), 676–688 (2016). See also recent work on generative AI and cognitive effort, e.g. Gerlich, M., "AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking," Societies, 15(1), 6 (2025). https://doi.org/10.3390/soc15010006