It got to 0 and called it a contradiction


TL;DR

  • “Two-Box” system: separate contexts so the model reviews without bias
  • Problem: it reached the correct answer (0) and called it a “contradiction”
  • Separating contexts isn’t enough: the model won’t accept counterintuitive results
  • Solution: combine Two-Box with “permission to accept the unexpected”

The experiment

I designed a “two-box” system to verify LLM responses:

BOX 1: Generate response → "1/13"
       (context gets discarded)

BOX 2: [Only sees the problem + proposed answer]
       "Verify from scratch if 1/13 is correct"

The idea: if the model doesn’t see its own reasoning, it can evaluate the answer without bias.
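
As a sketch, here is what that loop looks like in Python. Everything here is hypothetical scaffolding: call_model(prompt) stands in for whatever LLM API you use, and extract_final_answer is a naive placeholder parser. The point it illustrates is that each call starts a fresh conversation, so Box 2 never sees Box 1's reasoning.

def call_model(prompt: str) -> str:
    # Hypothetical helper: send one prompt in a fresh context, return the reply.
    raise NotImplementedError("wire this to your LLM provider")

def extract_final_answer(text: str) -> str:
    # Naive placeholder: assumes the model ends its output with "ANSWER: <value>".
    return text.rsplit("ANSWER:", 1)[-1].strip()

def two_box(problem: str) -> str:
    # BOX 1: generate; the chain of thought stays here and gets discarded.
    box1_output = call_model(
        f"Solve this problem. End with 'ANSWER: <value>'.\n\n{problem}"
    )
    proposed_answer = extract_final_answer(box1_output)

    # BOX 2: fresh context; only the problem and the bare answer cross over.
    return call_model(
        f"Problem:\n{problem}\n\n"
        f"Proposed answer: {proposed_answer}\n\n"
        "Verify from scratch whether this answer is correct."
    )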

What happened

The model in Box 2:

  1. ✅ Identified that the standard interpretation was incorrect
  2. ✅ Set up the correct equations for dependent coins
  3. ✅ Calculated p₀ = 0
  4. ❌ Wrote: “I find a contradiction…”
  5. ❌ Final answer: 1/13

It reached the correct answer and rejected it.

Why this happens

Separating contexts solves the “tokens conditioned on previous answer” problem. But there’s another issue: the model won’t accept counterintuitive results.

For the model, “probability = 0” feels like an error. It’s seen thousands of problems where the answer is a nice fraction. So it rationalizes: “there must be a contradiction in my setup.”

The fix

Two-Box needs to be combined with explicit "permission to accept the unexpected":

IMPORTANT: If your calculation reaches a result that seems
counterintuitive (like probability = 0), THAT is the answer.
Don't call it a "contradiction." Accept it if the math says so.
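
In code terms, that clause simply gets appended to the Box 2 prompt. A minimal sketch, reusing the hypothetical names from the earlier snippet:

PERMISSION_CLAUSE = (
    "IMPORTANT: If your calculation reaches a result that seems "
    "counterintuitive (like probability = 0), THAT is the answer. "
    "Don't call it a 'contradiction.' Accept it if the math says so."
)

def build_verifier_prompt(problem: str, proposed_answer: str) -> str:
    # Two-Box separation (only the problem + bare answer cross over),
    # plus explicit permission to accept counterintuitive results.
    return (
        f"Problem:\n{problem}\n\n"
        f"Proposed answer: {proposed_answer}\n\n"
        "Verify from scratch whether this answer is correct.\n\n"
        f"{PERMISSION_CLAUSE}"
    )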

Conclusion

The self-correction problem in LLMs has two layers:

  1. Architectural: Review tokens are conditioned on context (Two-Box solves this)
  2. Confidence: The model won’t accept the counterintuitive (requires explicit permission)

This post continues from "The model knows how to reason. It just won't commit," where I documented the 17 iterations that led to the initial discovery.

In the next experiment, I tested whether more reasoning tokens helped. Spoiler: they didn’t.

This post is part of my series on the limits of prompting. For a complete view, read my prompt engineering guide.

