Taxonomy of LLM failures

TL;DR

  • 4 types of LLM failure: interpretive ambiguity, pure calculation, conceptual error, external knowledge
  • Each has a different fix, in order: the v17b prompt, extended thinking, a more capable model, web search
  • Telltale signs: ambiguous problems contain “always” or “given that”; calculation errors vary across attempts; conceptual errors come with false confidence
  • Decision tree: identify the type first, then apply the right technique

The four types of failure

| Type | Example | Root cause |
| --- | --- | --- |
| Interpretive ambiguity | Coins: 0 vs 1/13 | Bias toward the “standard” reading |
| Pure calculation | Complex arithmetic | Capacity limit |
| Conceptual error | Confusing marginal with independence | Doesn’t know that it doesn’t know |
| External knowledge | Data from specific papers | Doesn’t have the information |

Solutions for each type

1. Interpretive ambiguity

  • v17b prompt
  • ✅ “Permission to discard” (see the sketch below)
  • ❌ Roleplay / buffer (they don’t help)
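
What “permission to discard” can look like in practice: a minimal sketch that appends an explicit clause to the problem before sending it to the model. The wording here is illustrative, not the actual v17b prompt.

```python
# Illustrative only: an explicit "permission to discard" clause, appended
# to the problem statement. This is NOT the v17b prompt, just the idea.
DISCARD_CLAUSE = (
    "If the statement is ambiguous, list the plausible interpretations, "
    "answer under the most literal one, and explicitly discard the "
    "'standard textbook' reading if it conflicts with the literal text."
)

def with_discard_permission(problem: str) -> str:
    """Return the problem with the discard clause attached."""
    return f"{problem}\n\n{DISCARD_CLAUSE}"
```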

2. Pure calculation

  • ✅ Models with extended thinking (Opus, o1)
  • ✅ Code tools (see the sketch below)
  • ❌ Elaborate prompts (they get in the way)
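
When the root cause is capacity, move the arithmetic out of the model entirely. A minimal sketch of what a code tool buys you: exact arithmetic that cannot drift mid-derivation (the values are placeholders, not the coins example from the table above).

```python
# A minimal sketch of delegating arithmetic to code: exact rational
# arithmetic instead of the model's in-context calculation. The values
# are placeholders, not the coins example from the table above.
from fractions import Fraction

p_a_and_b = Fraction(3, 52)   # P(A and B), exact
p_b = Fraction(13, 52)        # P(B), exact
print(p_a_and_b / p_b)        # -> 3/13, no floating-point drift
```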

3. Conceptual error

  • ❌ No prompt solves it
  • ⚠️ Specific hints can help
  • ✅ More capable model

4. External knowledge

  • ✅ Web search
  • ⚠️ Verify extracted data (see the sketch below)
  • ❌ Expecting it to “reason” its way to the answer
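
One cheap verification step, as a rough sketch: before trusting a figure the model claims to have extracted, confirm the figure actually appears in the source text. The `source_text` and values below are hypothetical placeholders.

```python
# A rough sketch of "verify extracted data": check that a number the
# model reports actually appears verbatim in the source it was given.
import re

def value_in_source(extracted: str, source_text: str) -> bool:
    """True if the extracted figure appears in the source text."""
    # Strip whitespace on both sides so line-break artifacts don't matter.
    return extracted.replace(" ", "") in re.sub(r"\s+", "", source_text)

source_text = "The reported accuracy was 87.3% on the held-out set."
assert value_in_source("87.3%", source_text)
assert not value_in_source("92.1%", source_text)
```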

How to identify the type

Signs of ambiguity:

  • The problem statement contains a phrase like “always,” “given that,” or “it’s known that”
  • There are multiple ways to model a condition

Signs of calculation:

  • The model starts well but gets lost in the numbers
  • Different attempts give different numerical results

Signs of conceptual error:

  • The model says “this is impossible” or “there’s a contradiction”
  • Confuses technical terms (marginal vs conditional, correlation vs causation)

Signs of external knowledge:

  • The model invents formulas or cites papers that don’t exist
  • Different models give completely different answers
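
Two of these signs are cheap to check mechanically. A rough sketch; the marker list and the variance heuristic are illustrative, not tuned.

```python
# Rough heuristics for two machine-checkable signs. The marker list and
# the "last number is the final answer" assumption are illustrative only.
import re

AMBIGUITY_MARKERS = ("always", "given that", "it's known that", "it is known that")

def looks_ambiguous(problem: str) -> bool:
    """Sign of ambiguity: conditioning language in the problem statement."""
    text = problem.lower()
    return any(marker in text for marker in AMBIGUITY_MARKERS)

def looks_like_calculation_failure(attempts: list[str]) -> bool:
    """Sign of calculation: repeated attempts disagree on the final number."""
    finals = set()
    for answer in attempts:
        numbers = re.findall(r"-?\d+(?:\.\d+)?", answer)
        if numbers:
            finals.add(numbers[-1])  # treat the last number as the final result
    return len(finals) > 1
```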

Quick decision tree

Does the problem have ambiguity?
  → Yes → v17b prompt
  → No ↓

Is it complex calculation?
  → Yes → Extended thinking / code
  → No ↓

Is the model saying something clearly wrong but with confidence?
  → Yes → Conceptual error. Specific hint or better model.
  → No ↓

Does it need data not in the prompt?
  → Yes → Web search + verification
  → No → It should work. If it fails, review the prompt.
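
The same chain as a triage function. The four booleans are judgment calls you make after reading the problem and the model’s output; the returned strings mirror the branches above.

```python
# The decision tree above, as code. Each flag is a human judgment call.
def triage(has_ambiguity: bool,
           is_complex_calculation: bool,
           confidently_wrong: bool,
           needs_external_data: bool) -> str:
    if has_ambiguity:
        return "v17b prompt"
    if is_complex_calculation:
        return "extended thinking / code tools"
    if confidently_wrong:
        return "conceptual error: specific hint or a more capable model"
    if needs_external_data:
        return "web search + verification"
    return "should work; if it fails, review the prompt"
```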

This taxonomy is part of my broader prompt engineering guide, which covers how to communicate effectively with LLMs. Understanding these failure types connects to the AI trends for 2026 and helps you navigate the evolving landscape of AI capabilities.
