Prompt Engineering: Tool Mediation - Propose → Validate → Execute — Part 5

John Godel
Oct 16

Introduction

Language models are persuasive. Production systems cannot let persuasion masquerade as action. Any route that changes state (issuing refunds, updating records, scheduling meetings) must separate what the model says from what the system does. This article lays out a practical tool-mediation pattern that keeps control in your backend: the model proposes an action, your services validate scope and preconditions, and only then do you execute with full audit. The result is simple to implement, resistant to “implied writes,” and independent of any particular LLM API, since it asks nothing of the model beyond structured output.

The Problem Tool Mediation Must Solve

Without mediation, you’ll see recurring failures:

Implied success: “I’ve issued the refund” appears in text when nothing happened.
Parameter drift: free-text arguments (“tomorrow afternoon”) are ambiguous or wrong.
Privilege creep: a single API key lets an agent do everything, everywhere.
Non-idempotent retries: the same command fires twice after a network hiccup.
Opaque incidents: after a complaint, you can’t prove what was attempted and why.

All of these become preventable, or at least detectable, when actions require a structured proposal, server-side validation, and recorded outcomes.

Design Goals

No side effects from prose. Text cannot change the world.
Typed, minimal interfaces. Tools expose small, well-typed argument schemas.
Least privilege & time-bound identity. Each route/agent gets only what it needs, for a limited time.
Idempotent by default. Every effect is safe to retry.
Tamper-evident audit. You can replay plan → decision → outcome.

The Mediation Pattern

1) Propose (model)

The contract instructs the model to emit structured proposals instead of narrative claims of success.

{
  "proposed_tool": {
    "name": "create_support_ticket",
    "args": {
      "customer_id": "C-10429",
      "subject": "Billing discrepancy",
      "priority": "high"
    },
    "preconditions": ["customer_verified", "open_invoice_present"],
    "idempotency_key": "2a1d-8f4c-…"
  }
}

Contract tips

Enumerate allowed tools per route.
Require preconditions as a checklist of named flags (“customer_verified”) that the server can verify against its own data.
Forbid success language (“ticket created”) in the same turn.
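The contract tips above can be enforced at the point where model output enters your backend. The following is a minimal sketch, not a definitive implementation: the `ALLOWED_TOOLS` table, the `parseProposal` function, and the error strings are all hypothetical names introduced for illustration. It accepts only the proposal shape shown earlier and only tools enumerated for the route.

```typescript
// Hypothetical type mirroring the proposal JSON shown above.
interface ProposedTool {
  name: string;
  args: Record<string, unknown>;
  preconditions: string[];
  idempotency_key: string;
}

type ParseResult =
  | { ok: true; proposal: ProposedTool }
  | { ok: false; error: string };

// Assumed per-route allowlist; the route and tool names are illustrative.
const ALLOWED_TOOLS: Record<string, string[]> = {
  billing_help: ["create_support_ticket"],
};

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

function parseProposal(route: string, raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    // Narrative text ("I've created the ticket") ends up here.
    return { ok: false, error: "output is not valid JSON" };
  }
  if (!isRecord(data)) return { ok: false, error: "output is not an object" };
  const p = data["proposed_tool"];
  if (!isRecord(p)) return { ok: false, error: "missing proposed_tool" };

  const name = p["name"];
  const args = p["args"];
  const preconditions = p["preconditions"];
  const key = p["idempotency_key"];
  if (typeof name !== "string") return { ok: false, error: "name must be a string" };
  if (!isRecord(args)) return { ok: false, error: "args must be an object" };
  if (!isStringArray(preconditions)) {
    return { ok: false, error: "preconditions must be a string array" };
  }
  if (typeof key !== "string" || key.length === 0) {
    return { ok: false, error: "idempotency_key is required" };
  }
  const allowed = ALLOWED_TOOLS[route] ?? [];
  if (!allowed.includes(name)) {
    return { ok: false, error: `tool '${name}' is not allowed on route '${route}'` };
  }
  return { ok: true, proposal: { name, args, preconditions, idempotency_key: key } };
}
```

Anything that fails to parse is treated as a non-proposal, so prose can never reach the execution path.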

2) Validate (server)

Your middleware verifies before any tool executes:

Schema & types: required args present; enums valid.
Preconditions: check each named precondition against your own data plane (e.g., the customer_verified flag) rather than trusting the model’s claim.
Authorization: the route’s service account is allowed to call this tool for this tenant.
Idempotency: if idempotency_key was seen, return the prior result.
Policy gates: jurisdiction, spend limits, rate limits, risk flags.

Return a structured decision:

{
  "decision": "approved",
  "reason": null,
  "execution_plan": { "tool": "create_support_ticket", "args": {...} }
}

{
  "decision": "rejected",
  "reason": "Precondition 'open_invoice_present' failed",
  "remediation": "ASK_FOR_MORE: attach latest invoice ID"
}
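One way to arrange these checks is a single function that returns the first failure it finds. This is a sketch under stated assumptions: `ValidationContext` and all of its fields are hypothetical, the order of checks is a choice rather than a requirement, and idempotency replay is left to the execution path.

```typescript
interface Proposal {
  name: string;
  args: Record<string, unknown>;
  preconditions: string[];
  idempotency_key: string;
}

type Decision =
  | { decision: "approved"; reason: null; execution_plan: { tool: string; args: Record<string, unknown> } }
  | { decision: "rejected"; reason: string; remediation: string | null };

// Server-side facts the validator consults. Every field here is an assumption.
interface ValidationContext {
  allowedTools: Set<string>;                          // authorization for this route and tenant
  requiredArgs: Record<string, string[]>;             // tool name -> args that must be present
  requiredPreconditions: Record<string, string[]>;    // tool name -> checks the server insists on
  checkPrecondition: (name: string, args: Record<string, unknown>) => boolean;
  policyGates: Array<(p: Proposal) => string | null>; // each returns a rejection reason or null
}

function reject(reason: string, remediation: string | null = null): Decision {
  return { decision: "rejected", reason, remediation };
}

function validate(p: Proposal, ctx: ValidationContext): Decision {
  if (!ctx.allowedTools.has(p.name)) {
    return reject(`Tool '${p.name}' is not permitted on this route`);
  }
  const missing = (ctx.requiredArgs[p.name] ?? []).filter((k) => !(k in p.args));
  if (missing.length > 0) {
    const list = missing.join(", ");
    return reject(`Missing args: ${list}`, `ASK_FOR_MORE: provide ${list}`);
  }
  // Check server-required preconditions as well as the declared ones,
  // so the model cannot skip a check by omitting it from the proposal.
  const checks = [...(ctx.requiredPreconditions[p.name] ?? []), ...p.preconditions];
  for (const name of checks) {
    if (!ctx.checkPrecondition(name, p.args)) {
      return reject(`Precondition '${name}' failed`);
    }
  }
  for (const gate of ctx.policyGates) {
    const reason = gate(p);
    if (reason !== null) return reject(reason);
  }
  return { decision: "approved", reason: null, execution_plan: { tool: p.name, args: p.args } };
}
```

The function is pure apart from `checkPrecondition`, which makes the decision easy to log and to replay in tests.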

3) Execute (service)

Only on approved decisions do you invoke the tool adapter. Capture the raw provider response, normalize it, and return a result object:

{
  "result": {
    "ticket_id": "T-77391",
    "url": "https://support.example.com/T-77391",
    "status": "created"
  }
}

Send this back to the model/UI to compose the final user-visible message (“I created ticket T-77391; here’s the link.”). The text mirrors the actual tool outcome, not speculation.
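The execute step can be sketched as a gate in front of the adapter plus a message composer that reads only from the result. The `RawTicketResponse` shape is a made-up provider response, and the adapter is a synchronous stand-in for brevity (a real one would return a Promise); all function names are hypothetical.

```typescript
type Decision =
  | { decision: "approved"; execution_plan: { tool: string; args: Record<string, unknown> } }
  | { decision: "rejected"; reason: string };

interface TicketResult {
  ticket_id: string;
  url: string;
  status: string;
}

// Shape of a made-up provider response, before normalization.
interface RawTicketResponse {
  id: number;
  state: string;
}

type Outcome =
  | { kind: "executed"; result: TicketResult; raw: RawTicketResponse }
  | { kind: "skipped"; reason: string };

// Convert the provider's fields into the stable result object shown above.
function normalizeTicket(raw: RawTicketResponse): TicketResult {
  const ticketId = `T-${raw.id}`;
  return {
    ticket_id: ticketId,
    url: `https://support.example.com/${ticketId}`,
    status: raw.state.toLowerCase(),
  };
}

function execute(
  d: Decision,
  callProvider: (args: Record<string, unknown>) => RawTicketResponse,
): Outcome {
  // The adapter is never touched unless the decision is "approved".
  if (d.decision !== "approved") {
    return { kind: "skipped", reason: d.reason };
  }
  const raw = callProvider(d.execution_plan.args);
  // Keep the raw response for the audit log; expose only the normalized result.
  return { kind: "executed", result: normalizeTicket(raw), raw };
}

// The user-visible text is derived from the outcome, never from model prose.
function composeMessage(o: Outcome): string {
  if (o.kind === "executed") {
    return `I created ticket ${o.result.ticket_id}; here's the link: ${o.result.url}`;
  }
  return `I couldn't complete that yet: ${o.reason}`;
}
```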

Tool Interface Shape

Adapter signature

// Context (route, tenant, scoped credentials) is application-defined.
type Tool<Args, Result> =
  (ctx: Context, args: Args, opts: { idempotencyKey: string }) => Promise<Result>

Good practices

Keep Args small; prefer IDs over free-form strings.
Resolve natural language before proposal (e.g., “tomorrow 3pm PT” → ISO time) or reject with an ASK.
Return stable identifiers (ticket_id, meeting_id).
Include a dryRun mode for sandbox tests.
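The "resolve or reject with an ASK" practice can be illustrated for time arguments. In this sketch, `requireIsoTime` is a hypothetical helper that accepts only a full ISO 8601 timestamp with an explicit offset and turns anything else into a question for the user. Resolving phrases such as "tomorrow 3pm PT" is assumed to happen earlier, with the user's time zone known.

```typescript
type Resolved =
  | { kind: "ok"; iso: string }
  | { kind: "ask"; question: string };

// Full timestamp with an explicit "Z" or +hh:mm / -hh:mm offset; seconds are optional.
const ISO_WITH_OFFSET =
  /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d{3})?)?(Z|[+-]\d{2}:\d{2})$/;

function requireIsoTime(input: string): Resolved {
  const trimmed = input.trim();
  // The regex checks the shape; Date.parse rejects impossible values such as month 13.
  if (!ISO_WITH_OFFSET.test(trimmed) || Number.isNaN(Date.parse(trimmed))) {
    return {
      kind: "ask",
      question: "What exact date, time, and time zone do you mean?",
    };
  }
  // Normalize to UTC so every adapter sees the same representation.
  return { kind: "ok", iso: new Date(trimmed).toISOString() };
}
```

For example, `requireIsoTime("2025-10-17T15:00:00-07:00")` yields the UTC form `2025-10-17T22:00:00.000Z`, while `requireIsoTime("tomorrow afternoon")` yields an ASK.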

Identity, Permissions, and Limits

Per-route service accounts with scoped tokens (only the tools that route may call).
Time-boxed credentials (minutes to hours).
Tenant scoping on every adapter call.
Budget guards: max refunds/day, max invites/hour, etc.
Rate limiters per tool and per tenant.

If you can’t explain an adapter’s permission in one sentence, it’s too broad.
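A scoped, time-boxed token and a budget guard might look like the following sketch. The `ScopedToken` fields, the `authorize` function, and the `DailyBudget` class are hypothetical, and the in-memory map stands in for whatever shared store a real deployment would use.

```typescript
interface ScopedToken {
  serviceAccount: string;
  tenant: string;
  tools: string[];    // only the tools this route may call
  expiresAt: number;  // epoch milliseconds
}

// Returns null when the call is allowed, otherwise the reason it is not.
function authorize(token: ScopedToken, tool: string, tenant: string, now: number): string | null {
  if (now >= token.expiresAt) return "token expired";
  if (token.tenant !== tenant) return "tenant mismatch";
  if (!token.tools.includes(tool)) return `tool '${tool}' is out of scope`;
  return null;
}

// Budget guard: caps the total amount per tenant per day (e.g., max refunds/day).
class DailyBudget {
  private spent = new Map<string, number>();
  private limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Records the spend and returns true only if it fits under the limit.
  tryConsume(tenant: string, day: string, amount: number): boolean {
    const key = `${tenant}:${day}`;
    const used = this.spent.get(key) ?? 0;
    if (used + amount > this.limit) return false;
    this.spent.set(key, used + amount);
    return true;
  }
}
```

Passing `now` explicitly keeps the expiry check deterministic in tests.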

Idempotency & Retries

Every execution path must be safe to repeat:

Accept and store an idempotency key; return the original success on duplicate requests.
Design external calls to be idempotent where possible (e.g., a PUT to a client-chosen resource ID rather than a POST that creates a new resource on every call).
For non-idempotent providers, store a minimal dedupe fingerprint (args hash + tenant).
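The two mechanisms above, key-based replay and a dedupe fingerprint, can be sketched as follows. Everything here is illustrative: `IdempotentExecutor` keeps results in memory, where a real system would need a durable store, and `fingerprint` uses the canonical string itself where production code might hash it.

```typescript
// Canonical JSON with sorted object keys, so equal args always produce the same string.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => canonical(v)).join(",")}]`;
  }
  if (typeof value === "object" && value !== null) {
    const obj = value as Record<string, unknown>;
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// Dedupe fingerprint for providers that offer no idempotency support of their own.
function fingerprint(tenant: string, tool: string, args: unknown): string {
  return `${tenant}:${tool}:${canonical(args)}`;
}

class IdempotentExecutor<R> {
  private results = new Map<string, R>();

  // Runs the effect at most once per key; duplicates get the stored result back.
  run(key: string, effect: () => R): { result: R; replayed: boolean } {
    if (this.results.has(key)) {
      return { result: this.results.get(key) as R, replayed: true };
    }
    const result = effect();
    // Only successful results are stored; if the effect throws, a retry runs it again.
    this.results.set(key, result);
    return { result, replayed: false };
  }
}
```

Storing only successes means a failed attempt can be retried under the same key, while a completed one is never repeated.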

Observability & Audit

Store a complete, tamper-evident record:

{
  "trace_id": "x-9af3",
  "route": "billing_help",
  "contract_hash": "sha256:…",
  "proposal": { "name":"create_support_ticket","args":{…},"preconditions":[…],"idempotency_key":"…" },
  "decision": { "decision": "approved", "reason": null },
  "execution": { "tool":"create_support_ticket","args":{…},"started_at":"…","ended_at":"…","result":{"ticket_id":"T-77391"} },
  "actor": { "service_account":"route-billing-help@bots", "tenant":"acme" }
}

Hash-chain logs or write to an append-only store if you need stronger guarantees.
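A hash chain can be sketched in a few lines using SHA-256 from Node's built-in `crypto` module. The `ChainedEntry` shape and the `append` and `verify` helpers are hypothetical. Each entry commits to the hash of the previous one, so editing any earlier record invalidates everything after it.

```typescript
import { createHash } from "node:crypto";

interface ChainedEntry {
  index: number;
  prevHash: string;
  payload: string; // the audit record, serialized as JSON
  hash: string;
}

// Placeholder "previous hash" for the first entry.
const GENESIS = "0".repeat(64);

function entryHash(index: number, prevHash: string, payload: string): string {
  return createHash("sha256").update(`${index}\n${prevHash}\n${payload}`).digest("hex");
}

function append(log: ChainedEntry[], record: unknown): ChainedEntry {
  const last = log[log.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const payload = JSON.stringify(record);
  const index = log.length;
  const entry: ChainedEntry = { index, prevHash, payload, hash: entryHash(index, prevHash, payload) };
  log.push(entry);
  return entry;
}

// Returns -1 if the chain is intact, otherwise the index of the first bad entry.
function verify(log: ChainedEntry[]): number {
  let prev = GENESIS;
  for (let i = 0; i < log.length; i++) {
    const e = log[i];
    if (!e || e.index !== i || e.prevHash !== prev) return i;
    if (e.hash !== entryHash(e.index, e.prevHash, e.payload)) return i;
    prev = e.hash;
  }
  return -1;
}
```

A chain like this detects edits to individual records, but someone able to rewrite the whole log can recompute every hash. Publishing or separately storing the latest hash closes that gap.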

UX & Language Patterns

Use neutral, non-committal phrasing until execution completes: “I can create a ticket with these details. Shall I proceed?”
After success, refer to identifiers: “Ticket T-77391 has been created.”
On rejection, surface the precondition that failed and ask for missing inputs.
Never echo provider error messages verbatim; map to user-safe explanations and keep raw errors in logs.
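The last point, mapping provider errors to safe copy while keeping the raw text for logs, might be sketched like this. The `ProviderError` shape, the status-code buckets, and the message wording are all assumptions chosen for illustration.

```typescript
interface ProviderError {
  status: number; // HTTP status from the downstream API
  body: string;   // raw response body; may contain internal details
}

interface HandledError {
  userMessage: string;
  logRecord: { trace_id: string; status: number; raw_body: string };
}

function handleProviderError(err: ProviderError, traceId: string): HandledError {
  let userMessage: string;
  if (err.status === 429) {
    userMessage = "We're handling a lot of requests right now. Please try again in a few minutes.";
  } else if (err.status >= 500) {
    userMessage = "Something went wrong on our side. Please try again later.";
  } else {
    userMessage = "We couldn't complete that request. Please check the details and try again.";
  }
  return {
    // The trace ID lets support staff find the raw error without exposing it.
    userMessage: `${userMessage} (reference: ${traceId})`,
    logRecord: { trace_id: traceId, status: err.status, raw_body: err.body },
  };
}
```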

Failure Taxonomy (make fixes actionable)

SCHEMA: missing/invalid args
PRECONDITION: check failed (verification, limits, policy)
AUTHZ: route lacks permission
IDEMPOTENCY: duplicate request handled
PROVIDER: downstream API returned an error (4xx/5xx)
TIMEOUT: no result within the SLO, including provider timeouts; emit fallback copy and surface a status link

Your metrics and alerts should break down along this taxonomy.
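Encoding the taxonomy as a closed type makes it hard for a failure to go unclassified. In this sketch the `FailureEvent` shape is hypothetical, and one boundary is an assumption of the example rather than a rule from the taxonomy: a call that returns nothing or exceeds the SLO counts as TIMEOUT, while an error response within the SLO counts as PROVIDER.

```typescript
type FailureCode =
  | "SCHEMA"
  | "PRECONDITION"
  | "AUTHZ"
  | "IDEMPOTENCY"
  | "PROVIDER"
  | "TIMEOUT";

// Hypothetical failure events, tagged by the stage that produced them.
type FailureEvent =
  | { stage: "validate"; kind: "SCHEMA" | "PRECONDITION" | "AUTHZ" }
  | { stage: "dedupe" }
  | { stage: "execute"; httpStatus: number | null; elapsedMs: number };

function classify(e: FailureEvent, sloMs: number): FailureCode {
  if (e.stage === "validate") return e.kind;
  if (e.stage === "dedupe") return "IDEMPOTENCY";
  // Assumption: no response at all, or a response slower than the SLO, is a TIMEOUT.
  if (e.httpStatus === null || e.elapsedMs > sloMs) return "TIMEOUT";
  return "PROVIDER";
}

// Counts failures per code, ready to feed a metrics backend.
function tally(events: FailureEvent[], sloMs: number): Record<FailureCode, number> {
  const counts: Record<FailureCode, number> = {
    SCHEMA: 0,
    PRECONDITION: 0,
    AUTHZ: 0,
    IDEMPOTENCY: 0,
    PROVIDER: 0,
    TIMEOUT: 0,
  };
  for (const e of events) {
    counts[classify(e, sloMs)] += 1;
  }
  return counts;
}
```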

Metrics That Matter

Proposal pass-rate (approved / proposed)
Implied-write violations (should be ~0; validator enforced)
Execution success rate by tool and tenant
p95 decision latency (proposal → decision) and p95 execution latency
Idempotency hit-rate (healthy duplicates vs. accidental repeats)
$ / successful action (LLM + mediation + execution)

Tie these to business KPIs (e.g., resolution time, CSAT, refund accuracy).
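Several of these metrics fall out of the audit records directly. The sketch below assumes a hypothetical flattened `ActionRecord` per proposal and uses the nearest-rank method for p95, which is one of several common percentile definitions.

```typescript
interface ActionRecord {
  approved: boolean;
  succeeded: boolean; // only meaningful when approved
  replayed: boolean;  // served from the idempotency store
  decisionMs: number; // proposal -> decision latency
  costUsd: number;    // LLM + mediation + execution
}

// Nearest-rank percentile; returns NaN for an empty input.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const idx = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[idx] as number;
}

function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? NaN : numerator / denominator;
}

interface Summary {
  passRate: number;           // approved / proposed
  successRate: number;        // succeeded / approved
  idempotencyHitRate: number; // replayed / approved
  p95DecisionMs: number;
  costPerSuccess: number;     // total cost / successful actions
}

function summarize(records: ActionRecord[]): Summary {
  const approved = records.filter((r) => r.approved);
  const succeeded = approved.filter((r) => r.succeeded);
  const replayed = approved.filter((r) => r.replayed);
  const totalCost = records.reduce((sum, r) => sum + r.costUsd, 0);
  return {
    passRate: ratio(approved.length, records.length),
    successRate: ratio(succeeded.length, approved.length),
    idempotencyHitRate: ratio(replayed.length, approved.length),
    p95DecisionMs: percentile(records.map((r) => r.decisionMs), 95),
    costPerSuccess: ratio(totalCost, succeeded.length),
  };
}
```

Note that cost per successful action divides the cost of all proposals, including rejected ones, by the number of successes, so wasted proposals show up in the figure.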

Implementation Checklist

Contract enumerates allowed tools and forbids success language
Proposal schema with preconditions and idempotency_key
Validation middleware (schema, preconditions, authz, policy, limits)
Tool adapters with least privilege and tenant scope
Idempotent execution path and dedupe storage
Validator that blocks implied writes in prose
Audit log (plan → decision → execution) with trace IDs
Canary flags and rollback for new tools or scopes

Worked Example (Composite)

A “refund assistant” route supports partial credits.

Proposal (model):

{"proposed_tool":{
  "name":"issue_credit",
  "args":{"order_id":"O-55291","amount":25,"currency":"USD"},
  "preconditions":["order_paid","within_return_window"],
  "idempotency_key":"O-55291-USD-25-2025-10-16"
}}

Validation (server):

order_paid ✅, within_return_window ✅, policy limit (≤$50) ✅, tenant scope ✅ → approved.

Execution (adapter):

POST /credits → credit_id=CR-11802.

User message (post-execution):

“Issued a $25 credit on order O-55291 (ref CR-11802). Funds post in 3–5 days.”

Audit: the full trace is saved, so if the user asks later you can show exactly what was proposed, approved, and executed.

Common Pitfalls—and Fixes

“I did X” in text without a record. Block with an implied-write validator; require proposals.
Free-form args (e.g., “next Friday morning”). Normalize first or ASK for specifics.
Single god-token. Replace with per-route service accounts, scoped tokens, and expirations.
Non-idempotent providers. Add dedupe keys and compensating checks.
Silent retries. Surface idempotent replays in logs; alert if they spike.
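An implied-write validator for the first pitfall can start as a small pattern check on the model's text. The patterns below are illustrative only and would need tuning per route and language; `findImpliedWrites` and the `executedThisTurn` flag are hypothetical names. A text that claims success is flagged unless a tool actually ran in that turn.

```typescript
// Illustrative patterns for first-person or passive claims of completed actions.
const SUCCESS_CLAIMS: RegExp[] = [
  /\bI(?:'ve| have)\s+(?:issued|created|sent|scheduled|updated|cancell?ed|refunded)\b/i,
  /\bI\s+(?:issued|created|sent|scheduled|updated|cancell?ed|refunded)\b/i,
  /\b(?:ticket|refund|credit|meeting|invite)\s+(?:has been|was|is)\s+(?:created|issued|sent|scheduled)\b/i,
];

// Returns the offending phrases; an empty array means the text is safe to send.
function findImpliedWrites(text: string, executedThisTurn: boolean): string[] {
  // Claims are legitimate once a tool has actually executed in this turn.
  if (executedThisTurn) return [];
  // Normalize curly apostrophes so "I’ve" and "I've" match the same pattern.
  const normalized = text.replace(/[\u2018\u2019]/g, "'");
  const hits: string[] = [];
  for (const re of SUCCESS_CLAIMS) {
    const m = re.exec(normalized);
    if (m) hits.push(String(m[0]));
  }
  return hits;
}
```

A pattern list like this will miss paraphrases, so it works best as a backstop behind the structural rule that only executed results feed the final message.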

Conclusion

Tool mediation converts persuasive language into safe action. By forcing the model to propose, giving your backend the right to decide, and executing with least privilege and idempotency, you eliminate implied writes, improve auditability, and keep incidents cheap. In Part 6, we’ll formalize validators and safety policy—how to fail closed, repair small, and keep brand/legal rules enforceable at scale.