Introduction
Once your prompt contracts (Part 1), decoding (Part 2), style controls (Part 3), and context shaping (Part 4) are in place, the last line of defense is safety enforcement. Validators are the mechanical gatekeepers that guarantee outputs satisfy policy, structure, and jurisdictional constraints before they become user-visible or trigger actions. In this article we define a compact validator architecture, show how to encode policy as data, explain “repair small” strategies that avoid costly retries, and outline the metrics and incident playbooks that keep risk low without slowing teams down.
Why Validators Exist
Generative routes fail in predictable ways: malformed JSON, banned claims (“guaranteed results”), stale citations, locale mix-ups, and—most expensive—implied write actions (“I issued your refund”) with no execution record. Validators reduce these to a handful of deterministic checks. They return machine-readable error codes, enabling targeted repairs instead of whole-document resampling. The net effect: a higher first-pass constraint pass-rate (CPR), lower latency, and fewer incidents.
The Validator Stack (four layers)
Shape & Schema
Parseability (valid JSON/HTML/Markdown subsets)
Required fields present; enums/ranges respected
Section counts, bullet counts, sentence caps
Lexicon & Tone
Banned terms/phrases; hedge phrases required where uncertainty remains
Brand casing and style constraints (e.g., sentence length ceilings, paragraph counts)
Channel rules (subject/preheader lengths, emoji policy, hashtags)
Evidence & Freshness (when grounded)
Citation coverage for factual sentences (e.g., ≥70%)
Claim freshness within policy window (e.g., ≤540 days unless “evergreen”)
Conflict surfacing when claims disagree (show both with dates or abstain)
Action Safety
Implied-write detector: blocks language asserting state change without a recorded tool result
Tool whitelist, argument validation, jurisdiction/tenant limits
Spend/rate/budget checks; idempotency requirements
Each layer yields specific error codes (e.g., SCHEMA, LEXICON, CITATION, STALE_CLAIM, IMPLIED_WRITE, LOCALE, LENGTH), making downstream remediation simple.
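To make the error-code idea concrete, here is a minimal sketch (assuming a Python implementation; class and function names are illustrative) of a shape/schema check that returns errors in the {ok, errors: [{code, span, details}]} style used throughout this article:

```python
from dataclasses import dataclass, field
import json

@dataclass
class ValidationError:
    code: str                 # e.g., "SCHEMA", "LEXICON", "IMPLIED_WRITE"
    span: tuple               # (start, end) character offsets in the output
    details: str = ""

@dataclass
class ValidationResult:
    ok: bool
    errors: list = field(default_factory=list)

def check_shape(output: str, required_fields: list) -> ValidationResult:
    """Layer 1: parseability and required fields."""
    try:
        doc = json.loads(output)
    except json.JSONDecodeError as exc:
        return ValidationResult(False, [ValidationError("SCHEMA", (0, len(output)), str(exc))])
    missing = [f for f in required_fields if f not in doc]
    if missing:
        return ValidationResult(False, [ValidationError("SCHEMA", (0, len(output)), f"missing: {missing}")])
    return ValidationResult(True)
```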
Policy as Data (not prose)
Centralize rules in a policy bundle the validators read and the prompt references. Example (abridged):
```json
{
  "contract_version": "brand.v3.2.1",
  "ban": ["guarantee", "only solution", "revolutionary"],
  "hedge": ["may", "typically", "in many cases"],
  "comparatives": { "allow": ["faster than baseline"], "forbid": ["#1", "best"] },
  "brand_casing": [["product x", "Product X"]],
  "citations": { "min_coverage": 0.7, "max_age_days": 540 },
  "locale": "en-US",
  "channel": { "email_subject_max": 52, "preheader_max": 90 },
  "writes": { "forbid_implied_success": true }
}
```
Because the bundle is versioned, audits can reproduce the exact rule set used for any output, and legal/brand can edit rules without touching code.
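A minimal sketch of consuming the bundle at runtime (Python; the file path is hypothetical and the field names follow the abridged example above):

```python
import json

def load_policy(path: str) -> dict:
    """Load a versioned policy bundle; validators read the rules, traces log the version."""
    with open(path, encoding="utf-8") as fh:
        bundle = json.load(fh)
    if "contract_version" not in bundle:
        # Fail fast: audits must be able to reproduce the exact rule set for any output.
        raise ValueError("policy bundle is missing contract_version")
    return bundle

policy = load_policy("policies/brand.v3.2.1.json")   # hypothetical path
banned_terms = [term.lower() for term in policy["ban"]]
```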
Repair Small: Deterministic Fixes Before Resample
Resampling entire outputs is slow and expensive. Prefer section-level and token-light repairs:
Lexicon repair: deterministic substitutions for banned phrases; sentence trimming to caps.
Style repair: enforce casing, inject a required hedge word, split long sentences.
Evidence repair: swap a stale claim for a fresher equivalent; add a citation ID; hedge or remove unsupported numbers.
Action repair: replace implied success with a proposal (“I can issue a $25 credit—shall I proceed?”) or append the recorded tool result ID.
If a section still fails after repair, tighten decoding for that section only (lower top-p/temperature) and resample once more. On a third failure, emit a targeted ASK_FOR_MORE or REFUSE naming the specific missing or forbidden condition.
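A sketch of this escalation path, assuming `validate_section` and `resample_section` stand in for your validator pipeline and a decoder call with tightened sampling:

```python
SUBSTITUTIONS = {"guarantee": "aims to deliver", "revolutionary": "advanced"}

def lexicon_repair(text: str) -> str:
    """Deterministic, token-light repair: substitute banned phrases without a model call."""
    for banned, replacement in SUBSTITUTIONS.items():
        text = text.replace(banned, replacement)
    return text

def repair_or_escalate(section: str, validate_section, resample_section) -> dict:
    if validate_section(section).ok:
        return {"status": "pass", "text": section}
    repaired = lexicon_repair(section)                 # repair small first
    if validate_section(repaired).ok:
        return {"status": "repaired", "text": repaired}
    retried = resample_section(repaired, top_p=0.7, temperature=0.3)  # tightened decoding, one retry
    if validate_section(retried).ok:
        return {"status": "resampled", "text": retried}
    return {"status": "ASK_FOR_MORE", "text": None}    # or REFUSE, naming the failing condition
```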
Implementation Pattern
1) Contract hooks
Enumerate required sections/fields and stop tokens.
Point to policy_version and required citation_coverage.
Forbid success language; require tool proposals with idempotency keys.
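A sketch of those hooks as a declarative contract (Python; the field names are illustrative rather than a fixed schema):

```python
CONTRACT = {
    "contract_version": "renewal_email.v1",          # hypothetical route contract
    "policy_version": "brand.v3.2.1",
    "required_sections": ["subject", "preheader", "body", "cta"],
    "stop_tokens": ["</email>"],
    "citation_coverage": 0.7,
    "forbid_success_language": True,                  # no "I renewed/refunded/issued ..."
    "tool_proposals_require": ["idempotency_key"],
}
```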
2) Validator services
Stateless functions per layer, returning {ok, errors: [{code, span, details}]}.
Compose into a pipeline; stop early on SCHEMA/IMPLIED_WRITE.
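A sketch of that composition with early stopping on the severe codes (reusing the ValidationResult shape from the shape-check sketch above; the layer list is illustrative):

```python
EARLY_STOP_CODES = {"SCHEMA", "IMPLIED_WRITE"}

def run_pipeline(output: str, layers: list) -> list:
    """Run stateless validators in order; abort early when a severe error appears."""
    all_errors = []
    for layer in layers:        # e.g., [shape_check, lexicon_check, evidence_check, action_check]
        result = layer(output)
        all_errors.extend(result.errors)
        if any(err.code in EARLY_STOP_CODES for err in result.errors):
            break               # no point checking tone when the shape is already broken
    return all_errors
```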
3) Repair engine
Map error codes → ordered rewrite steps (substitute, trim, inject hedge, swap claim).
Re-run only the failing layer(s); re-validate; cap attempts (e.g., 2).
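A sketch of the error-code to repair-step mapping with a capped retry loop (the repair functions are illustrative stand-ins, and `validate` is assumed here to return the list of remaining errors):

```python
from typing import Optional

def substitute_banned(text: str) -> str:
    return text.replace("guarantee", "aims to deliver")

def inject_hedge(text: str) -> str:
    if not text or text.lower().startswith("typically"):
        return text
    return "Typically, " + text[0].lower() + text[1:]

# Error code -> ordered, deterministic rewrite steps.
REPAIRS = {"LEXICON": [substitute_banned], "CITATION": [inject_hedge]}
MAX_ATTEMPTS = 2

def run_repairs(text: str, errors: list, validate) -> Optional[str]:
    """Apply mapped repairs, re-validate, and cap attempts before escalating."""
    for _ in range(MAX_ATTEMPTS):
        for err in errors:
            for step in REPAIRS.get(err.code, []):
                text = step(text)
        errors = validate(text)          # in practice, re-run only the failing layer(s)
        if not errors:
            return text
    return None                          # escalate: tightened resample, then ASK_FOR_MORE / REFUSE
```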
4) Observability
Log policy/contract/decoder hashes, error codes, repair steps, and final status.
Emit adherence scores (lexicon, cadence, coverage).
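A sketch of the per-output trace record (Python; the field names are illustrative):

```python
import hashlib
import json
import time

def trace_record(output: str, policy: dict, contract: dict, decoder: dict,
                 error_codes: list, repair_steps: list, status: str) -> str:
    """One structured log line per output: enough to audit and reproduce any decision."""
    def digest(obj) -> str:
        return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()[:12]
    return json.dumps({
        "ts": time.time(),
        "policy_hash": digest(policy),
        "contract_hash": digest(contract),
        "decoder_hash": digest(decoder),
        "error_codes": error_codes,
        "repair_steps": repair_steps,
        "status": status,                # pass | repaired | resampled | refused
        "output_hash": hashlib.sha256(output.encode()).hexdigest()[:12],
    })
```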
Example Error → Repair Mappings
| Error | Detector | Repair |
|---|---|---|
| SCHEMA | JSON invalid / missing fields | Fill defaults; if still invalid → resample that section |
| LEXICON | Banned phrase (“guarantee”) | Substitute with a hedge (“aims to”); add reason note |
| LENGTH | >18 words per sentence | Split at clause boundary; trim adverbs |
| CITATION | Factual line with no claim_id | Attach nearest valid claim (above threshold); else hedge/remove |
| STALE_CLAIM | Claim age > freshness window | Swap for a fresher claim; else hedge timing (“as of …”) |
| IMPLIED_WRITE | Success language without a recorded tool result | Rewrite to a proposal; or block and surface the validator’s decision |
| LOCALE | en-GB spellings on an en-US route | Normalize spellings; enforce brand casing |
Repairs must be deterministic and logged so operators can trust them.
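Because implied writes are the most expensive failure, here is a minimal sketch of one way to detect and repair them: a regex over first-person success verbs, checked against recorded tool results. The verb list and rewrites are illustrative, not exhaustive:

```python
import re

# First-person success language that asserts a completed state change.
IMPLIED_WRITE = re.compile(
    r"\bI(?:'ve| have)? (issued|refunded|renewed|cancelled|canceled|charged|updated)\b",
    re.IGNORECASE,
)
PROPOSALS = {
    "issued": "can issue", "refunded": "can refund", "renewed": "can renew",
    "cancelled": "can cancel", "canceled": "can cancel",
    "charged": "can charge", "updated": "can update",
}

def check_implied_write(text: str, recorded_tool_results: list) -> list:
    """Flag success language that has no corresponding recorded tool result."""
    if recorded_tool_results:
        return []
    return [m.group(0) for m in IMPLIED_WRITE.finditer(text)]

def rewrite_to_proposal(text: str) -> str:
    """Deterministic repair: turn asserted success into a proposal awaiting confirmation."""
    return IMPLIED_WRITE.sub(lambda m: "I " + PROPOSALS[m.group(1).lower()], text)
```

For example, rewrite_to_proposal("I renewed your plan") yields "I can renew your plan", which the route can then pair with a confirmation link.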
Measuring Safety & Validator Efficacy
Constraint Pass-Rate (CPR), first pass (primary KPI)
Repairs per accepted output (target ≤0.25)
Citation precision/recall (spot-checked or via labeled sets)
Implied-write violations (should be ~0; anything >0 triggers incident review)
Stale-claim rate and freshness coverage
Time-to-valid p95 (repairs should lower p95 by avoiding resamples)
Policy adoption (% outputs using latest policy version)
Visualize by route, locale, and release. Alert on a 2-point CPR drop, a 20% increase in p95 time-to-valid, or any implied-write violation.
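A sketch of how two of these alerts might be computed from route-level counters (thresholds mirror the ones above; counter names are illustrative):

```python
def cpr(first_pass_valid: int, total: int) -> float:
    """Constraint Pass-Rate: share of outputs that are valid before any repair."""
    return 100.0 * first_pass_valid / total if total else 0.0

def should_alert(cpr_now: float, cpr_baseline: float,
                 p95_now_ms: float, p95_baseline_ms: float,
                 implied_write_violations: int) -> bool:
    return (
        cpr_baseline - cpr_now >= 2.0              # CPR dropped by 2+ points
        or p95_now_ms >= 1.2 * p95_baseline_ms     # time-to-valid p95 rose 20%+
        or implied_write_violations > 0            # any implied write triggers review
    )
```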
Incident Playbooks (keep them short)
Contain: Flip the route’s feature flag to safe copy; block tool execution if relevant.
Triage: Classify failures by code (SCHEMA/CITATION/SAFETY/…); sample affected outputs.
Root cause: Contract change, policy edit, decoder tweak, stale claims, or retrieval eligibility gap.
Remediate: Patch the artifact (policy/contract/decoder/validator), add/adjust a golden case, re-canary.
Postmortem: 1-pager with artifact diffs, failing samples, and a new regression guard.
Aim for MTTR in hours, not days; cheap rollback is part of safety.
Practical Tips
Keep the ban list small and high-signal; overblocking creates wooden prose and repair churn.
Treat numbers and named entities as high-risk—require claims or hedges.
Encode jurisdictional deltas (US vs EU ads) as separate policy bundles; the planner selects the right one.
For channels with tight form factors (ads/email), prioritize shape/language validators before evidence checks to fail fast.
Store validator versions in traces; policy evolution should not look like model drift in dashboards.
Worked Example (Composite)
A renewal email route fails validation with three errors: LEXICON (uses “revolutionary”), CITATION (one factual sentence uncovered), and IMPLIED_WRITE (“I renewed your plan”). Repairs substitute “advanced,” attach a valid claim to the uncovered sentence, and rewrite the implied write into a proposal with a confirmation link. The section then passes; time-to-valid improves versus a whole-document resample, and the audit log records the three repairs. Canary shows CPR rising from 91.6% to 93.0% and p95 falling 14%.
Conclusion
Validators are not an add-on; they are the enforcement engine for prompt contracts and policy bundles. By failing closed, repairing small, and logging everything, you turn safety from an after-the-fact review into a first-class, automated property of your system. In Part 7, we’ll put evaluation and operations around these controls—golden tests, canary/rollback, and the metrics that prove a change is safe to ship.