Verification-Gated Agentic AI System
An agentic system where an AI proposes, and a verifier and a human decide. It answers messy B2B technical inquiries from cited, versioned facts, blocks anything ungrounded, and escalates when unsure. Every run leaves a replayable audit trail.
The challenge
The knowledge that answers these inquiries lives in the heads of senior engineers, not in any system, and naive retrieval over the source documents gives confident but wrong answers to most of them. In a high-stakes domain, the only real failure is being confidently wrong, so the product had to promise traceability and calibrated abstention, not raw automation: escalate when unsure rather than guess.
The solution
Built an agent-proposes, verifier-decides pipeline. An orchestrator retrieves cited claims and calls typed tools to emit a structured proposal. A separate verifier returns a structured verdict in which the factual signals (do the citations resolve, was a value computed without support) are recomputed in code and override the model's opinion. A confidence gate routes each inquiry to answer, redirect, or escalate based on three separate confidence scores. Knowledge lives in a versioned, cited claim store rather than in model weights, and a correction forks a new claim version. Every step is written to an append-only event log that can reconstruct any inquiry with zero model calls.
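To make the verifier override and the gate concrete, here is a minimal Python sketch, not the production code. Every name in it is hypothetical: the `Proposal` and `Verdict` shapes, the three confidence names (`retrieval_conf`, `grounding_conf`, `scope_conf`), the thresholds, and the rule of routing on the lowest score are all assumptions made for illustration, since the description above does not specify them.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Route(Enum):
    ANSWER = "answer"
    REDIRECT = "redirect"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Proposal:
    answer: str
    cited_claim_ids: tuple[str, ...]
    # Computed value name -> ids of the claims that support it (hypothetical shape).
    computed_values: dict[str, tuple[str, ...]]


@dataclass(frozen=True)
class Verdict:
    citations_resolve: bool
    values_supported: bool
    # Hypothetical names for the three confidences; the real ones are not stated.
    retrieval_conf: float
    grounding_conf: float
    scope_conf: float


def enforce_factual_signals(
    model_verdict: Verdict, proposal: Proposal, known_claim_ids: set[str]
) -> Verdict:
    """Recompute the factual signals in code and overwrite the model's values."""
    citations_resolve = bool(proposal.cited_claim_ids) and all(
        cid in known_claim_ids for cid in proposal.cited_claim_ids
    )
    values_supported = all(
        support and all(cid in known_claim_ids for cid in support)
        for support in proposal.computed_values.values()
    )
    return replace(
        model_verdict,
        citations_resolve=citations_resolve,
        values_supported=values_supported,
    )


def gate(verdict: Verdict, answer_min: float = 0.8, redirect_min: float = 0.5) -> Route:
    """Route on the weakest confidence; thresholds are illustrative assumptions."""
    if not (verdict.citations_resolve and verdict.values_supported):
        return Route.ESCALATE  # anything ungrounded is blocked outright
    lowest = min(verdict.retrieval_conf, verdict.grounding_conf, verdict.scope_conf)
    if lowest >= answer_min:
        return Route.ANSWER
    if lowest >= redirect_min:
        return Route.REDIRECT
    return Route.ESCALATE
```

The point of the design is that the model can claim its citations resolve, but the booleans the gate reads are always the ones recomputed against the claim store.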
Results
- Deterministic verifier: citation and computed-value checks decided in code, not delegated to the model
- Confidence gate that routes each inquiry to answer, redirect, or escalate-to-human, with calibrated abstention
- Versioned cited-claim knowledge store with no fine-tuning; corrections fork new claim versions
- Append-only event log that replays any decision with zero model calls
- Human-in-the-loop ingestion with durable suspend and resume
- Provider-swappable (Claude via gateway, Cohere embeddings) with a documented fully-offline mode
- Eval harness scoring correct-deflection and escalation precision
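
As an illustration of two of the items above, the versioned claim store and the replayable log, here is a small in-memory Python sketch. It is a simplified stand-in under stated assumptions, not the actual system: `Claim`, `ClaimStore`, `EventLog`, and their fields are hypothetical names, and a real deployment would persist both structures durably.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    claim_id: str
    version: int
    text: str
    source: str  # the citation backing this claim


class ClaimStore:
    """Hypothetical store: corrections append a new version, never overwrite."""

    def __init__(self) -> None:
        self._versions: dict[str, list[Claim]] = {}

    def add(self, claim_id: str, text: str, source: str) -> Claim:
        if claim_id in self._versions:
            raise ValueError(f"claim {claim_id!r} exists; use correct() to fork it")
        claim = Claim(claim_id, 1, text, source)
        self._versions[claim_id] = [claim]
        return claim

    def correct(self, claim_id: str, text: str, source: str) -> Claim:
        history = self._versions[claim_id]
        claim = Claim(claim_id, history[-1].version + 1, text, source)
        history.append(claim)  # earlier versions stay readable for audits
        return claim

    def get(self, claim_id: str, version: Optional[int] = None) -> Claim:
        history = self._versions[claim_id]
        return history[-1] if version is None else history[version - 1]


class EventLog:
    """Hypothetical append-only log; replay reads stored events only."""

    def __init__(self) -> None:
        self._lines: list[str] = []

    def append(self, inquiry_id: str, kind: str, payload: dict) -> None:
        event = {"inquiry_id": inquiry_id, "kind": kind, "payload": payload}
        self._lines.append(json.dumps(event, sort_keys=True))

    def replay(self, inquiry_id: str) -> list[dict]:
        # No model is called here: the inquiry is rebuilt from recorded events.
        events = (json.loads(line) for line in self._lines)
        return [e for e in events if e["inquiry_id"] == inquiry_id]
```

Because each logged event can record the claim id and version it relied on, a replayed decision can still point at the exact claim text that was current at the time, even after a later correction.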