That gap is where AI breaks. CAI is the framework for finding it.
Most AI eval asks one question: is the answer correct? CAI asks a harder one: does the system produce consistent outputs for semantically equivalent inputs?
A language model doesn't store facts. It stores statistical attractors. When the same concept gets compressed differently depending on phrasing, you get what CAI is built to catch: yes and no to the same question, driven by surface variation alone.
This isn't hallucination research. Hallucination asks whether an output is true. CAI asks whether the model's representation is stable. A model can hallucinate consistently. That's a knowledge failure. What CAI catches is different: the model has no stable position to even be wrong about. That's a representation failure.
Not all contradictions carry the same signal. This taxonomy ranks them by diagnostic value. Start with P0. These expose the clearest representation failures and produce the most actionable output.
CAI faults aren't binary. The Contradiction Tension Score (CTS) weights instability across a surface of meaning-preserving prompt variants. Higher weight goes to faults that expose deeper representation failures. P0 types carry maximum weight by design.
| Contradiction Type | CTS Weight | Priority | Deployment Notes |
|---|---|---|---|
| Conclusion Flips | 1.0× | P0 | Maximum signal. Always surface first. |
| Constraint Violations | 0.9× | P0 | Critical for systems with policy, legal, or safety constraints. |
| Refusal Inconsistency | 0.9× | P0 | Highest urgency for safety teams. Easiest to demonstrate value with. |
| Implicit Assumption Shifts | 0.7× | P1 | High depth. Needs world-model comparison infrastructure. |
| Reasoning Inconsistency | 0.6× | P1 | Matters for auditability in regulated domains. |
| Hedging Polarity Shifts | 0.5× | P1 | Pre-contradiction signal. Catch instability before the flip appears. |
| Entity / Fact Drift | 0.3× | P2 | Overlaps with hallucination eval. Deprioritize in early deployments. |
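The table above can be read as a weight vector over fault types. As an illustration only (the exact CTS aggregation is not specified here), one plausible scoring normalizes the weighted fault sum by the number of variant pairs tested. The function name, fault-type keys, and normalization choice are all assumptions, not Contradish's actual formula:

```python
# Illustrative sketch only: the real CTS aggregation is not specified in
# the text above. Names and the per-pair normalization are assumptions.

CTS_WEIGHTS = {
    "conclusion_flip": 1.0,            # P0
    "constraint_violation": 0.9,       # P0
    "refusal_inconsistency": 0.9,      # P0
    "implicit_assumption_shift": 0.7,  # P1
    "reasoning_inconsistency": 0.6,    # P1
    "hedging_polarity_shift": 0.5,     # P1
    "entity_fact_drift": 0.3,          # P2
}

def contradiction_tension_score(faults, pairs_tested):
    """Weighted instability per variant pair: sum of fault-type
    weights divided by the number of prompt pairs tested."""
    if pairs_tested <= 0:
        raise ValueError("pairs_tested must be positive")
    return sum(CTS_WEIGHTS[f] for f in faults) / pairs_tested
```

Under this sketch, one conclusion flip plus one entity drift over ten tested pairs scores (1.0 + 0.3) / 10 = 0.13, so a single P0 fault moves the score more than three P2 faults combined.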
Contradish is the primary CAI detection implementation. Built on one idea: unit testing for AI should test semantic invariance, not just output correctness.
Contradish runs the full contradiction taxonomy. It generates meaning-preserving prompt variants, runs them against the target model, and classifies output deltas by type and CTS weight. P0 faults are always surfaced first.
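The loop described above — meaning-preserving variants in, output deltas out — can be sketched minimally for the highest-weight fault type, conclusion flips. Everything here is hypothetical: the function names, the crude polarity classifier, and the callable-model interface are illustrative assumptions, not Contradish's actual API:

```python
# Hypothetical sketch of the variant-test loop described above; none of
# these names come from Contradish's real interface.

def detect_conclusion_flips(prompt, variants, ask):
    """Run a prompt and its meaning-preserving variants through `ask`
    (a callable: str -> str) and report yes/no conclusion flips."""

    def polarity(answer):
        # Crude stand-in for a real answer classifier.
        a = answer.strip().lower()
        if a.startswith("yes"):
            return "yes"
        if a.startswith("no"):
            return "no"
        return "unclear"

    baseline = polarity(ask(prompt))
    flips = []
    for variant in variants:
        p = polarity(ask(variant))
        # Only count clear yes<->no reversals as conclusion flips.
        if "unclear" not in (baseline, p) and p != baseline:
            flips.append((variant, baseline, p))
    return flips
```

A brittle model that keys on surface wording rather than meaning will flip when a paraphrase drops the trigger word, which is exactly the surface-variation failure this fault type targets.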
The theoretical foundation for CAI and the laws that govern safe reasoning under compression constraints.
Contradiction Engineering Lab studies representation instability in intelligent systems. We build theory and measurement tools for understanding where and how cognitive coherence breaks under transformation.
Originator of the CAI framework and founder of Contradish. Work focuses on the class of AI failures that correctness-only eval can't see: failures that surface only under semantic transformation testing.
Research, collaboration, questions about CAI or Contradish.