It got to 0 and called it a contradiction
TL;DR
- “Two-Box” system: separate contexts so the model reviews without bias
- Problem: it reached the correct answer (0) and called it a “contradiction”
- Separating contexts isn’t enough: the model won’t accept counterintuitive results
- Solution: combine Two-Box with “permission to accept the unexpected”
The experiment
I designed a “two-box” system to verify LLM responses:
BOX 1: Generate response → "1/13"
(context gets discarded)
BOX 2: [Only sees the problem + proposed answer]
"Verify from scratch if 1/13 is correct"
The idea: if the model doesn’t see its own reasoning, it can evaluate the answer without bias.
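For concreteness, here is a minimal sketch of the two-box flow, assuming an OpenAI-style chat API. The model name, function names, and prompt wording are illustrative placeholders, not the exact setup from the experiment.

```python
# Minimal sketch of the two-box flow (illustrative, not the exact experiment setup).
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder; any chat model would do


def box1_generate(problem: str) -> str:
    """Box 1: generate an answer. Its reasoning context is discarded afterwards."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": problem}],
    )
    return resp.choices[0].message.content  # e.g. "1/13"


def box2_verify(problem: str, proposed_answer: str) -> str:
    """Box 2: sees only the problem and the proposed answer, never Box 1's reasoning."""
    prompt = (
        f"{problem}\n\n"
        f"Proposed answer: {proposed_answer}\n"
        "Verify from scratch whether this answer is correct."
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


problem_text = "State the probability problem here."  # placeholder
answer = box1_generate(problem_text)           # fresh context
verdict = box2_verify(problem_text, answer)    # independent review, no shared reasoning
```

The key design point is that `box2_verify` never receives Box 1's chain of reasoning, only the problem and the bare answer.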
What happened
The model in Box 2:
- ✅ Identified that the standard interpretation was incorrect
- ✅ Set up the correct equations for dependent coins
- ✅ Calculated p₀ = 0
- ❌ Wrote: “I find a contradiction…”
- ❌ Final answer: 1/13
It reached the correct answer and rejected it.
Why this happens
Separating contexts solves the “tokens conditioned on previous answer” problem. But there’s another issue: the model won’t accept counterintuitive results.
For the model, “probability = 0” feels like an error. It’s seen thousands of problems where the answer is a nice fraction. So it rationalizes: “there must be a contradiction in my setup.”
The fix
Two-Box needs to be combined with explicit permission to accept the unexpected:
IMPORTANT: If your calculation reaches a result that seems
counterintuitive (like probability = 0), THAT is the answer.
Don't call it a "contradiction." Accept it if the math says so.
Conclusion
The self-correction problem in LLMs has two layers:
- Architectural: Review tokens are conditioned on context (Two-Box solves this)
- Confidence: The model won’t accept the counterintuitive (requires explicit permission)
This post continues from The model knows how to reason. It just won’t commit, where I documented the 17 iterations that led to the initial discovery.
In the next experiment, I tested whether more reasoning tokens helped. Spoiler: they didn’t.
This post is part of my series on the limits of prompting. For a complete view, read my prompt engineering guide.
Keep exploring
- 50+ ChatGPT prompts that actually work - Practical prompts you can use today
- Best free AI tools in 2026 - Where to apply these techniques
- What are AI agents? - When prompts aren’t enough