Prompt Engineering: Actionable RAG - Evidence, Eligibility, and Citations (with a Real-World Deployment)

Introduction

Most Retrieval-Augmented Generation (RAG) systems fetch a pile of passages and hope the model “does the right thing.” That works for demos, not for production. Actionable RAG treats evidence as a first-class primitive with explicit eligibility rules, minimal-span citations, and tool paths that turn claims into outcomes. This article lays out a pragmatic pattern for evidence-centered RAG—and then grounds it with a real deployment in healthcare prior authorization.

Why naïve RAG fails in production

Dumping long documents into context increases latency, confuses the model, and inflates costs. Worse, it hides provenance: when an answer is wrong, you can’t tell which sentence misled it. Successful teams flip the perspective: retrieval isn’t about stuffing the window; it’s about proving a claim with the smallest necessary spans and disclosing when proof is insufficient.

Architecture that makes evidence actionable

An actionable RAG stack has four contracts:

  1. Claim contract — the structured decision the system must produce (fields, types, allowed values).

  2. Retrieval policy — what sources are eligible (certified, recency-bounded, viewer-authorized), how many spans to return, and how conflicts are resolved.

  3. Citation rules — minimal spans (doc ID + line/paragraph ranges) that must back each claim field.

  4. Tool policy — what actions are permitted after a claim is proven (create ticket, schedule job, post transaction) and which receipts must be returned.

Everything else—embeddings, re-rankers, chunkers—is an implementation detail.
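
As a rough illustration of how these contracts travel together, here is a minimal sketch of the four contracts expressed as one versioned, diffable bundle. All names and values below are illustrative, not taken from the deployment described later:

    # Sketch only: the four contracts as one versioned bundle; names and values are illustrative.
    CONTRACTS = {
        "version": "bundle/v7",                       # versioned and diffed in CI
        "claim": {                                    # what the system must decide
            "fields": {"requires_action": "bool", "policy_ref": "doc-id#lines"},
            "required": ["requires_action", "policy_ref"],
        },
        "retrieval": {                                # which sources are eligible, how many spans
            "eligible_tags": ["certified"],
            "max_age_days": 180,
            "top_k": 8,
            "conflict_order": ["newer", "certified", "structured"],
        },
        "citations": {                                # minimal spans backing each required field
            "min_spans_per_field": 1,
        },
        "tools": {                                    # actions permitted after proof
            "allowed": ["create_ticket", "schedule_job"],
            "require_receipts": True,
        },
    }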

Retrieval policy, stated, not implied

Write policy like code, not vibes:

  • Eligibility: include only tag:certified, lang:en, and age ≤ 180 days.

  • Conflict resolution: prefer newer > older; certified > draft; structured table > narrative text.

  • Top-K: fetch K=8 passages, then apply a diversity penalty to avoid near-duplicates.

  • Coverage test: require at least one span per required field; if missing, ask one clarifying question or decline.

This policy travels with the prompt and is versioned and diffed in CI.
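
A minimal sketch of what "policy as code" can look like, assuming a simple passage schema with tags, lang, published_at, status, and kind fields (these names are placeholders, not a real retriever API):

    # Sketch: eligibility, conflict ranking, and coverage checks as plain functions.
    # The passage schema and field names are assumptions.
    from datetime import datetime, timedelta

    POLICY_VERSION = "retrieval-policy/v3"

    def is_eligible(passage: dict, now: datetime) -> bool:
        # Eligibility: certified, English, and no older than 180 days.
        return (
            "certified" in passage.get("tags", [])
            and passage.get("lang") == "en"
            and now - passage["published_at"] <= timedelta(days=180)
        )

    def conflict_rank(passage: dict) -> tuple:
        # Sort key (use reverse=True): newer > older, certified > draft, table > narrative.
        return (
            passage["published_at"],
            passage.get("status") == "certified",
            passage.get("kind") == "table",
        )

    def covers_required_fields(spans: list, required_fields: list) -> bool:
        # Coverage test: at least one span per required field; otherwise clarify or decline.
        covered = {span["field"] for span in spans}
        return all(field in covered for field in required_fields)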

Minimal-span citations

Return the shortest span that justifies each field—two to four lines, not a whole page. Store (doc_id, start_line, end_line) per field in the output. During reviews and audits, you can open the exact sentence that drove a decision. Users build trust when they see why the system said “yes/no.”
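
In practice this can be as small as one record per field; a sketch, with illustrative field and document names:

    # Sketch: per-field provenance carried alongside the answer; names are illustrative.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Citation:
        doc_id: str
        start_line: int
        end_line: int

    answer = {
        "eligible": True,
        "citations": {
            # The exact sentences that justify this field, nothing more.
            "eligible": Citation("doc-4711", 212, 215),
        },
    }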

Tools after proof, with receipts

RAG without tools is still just text. When claims pass validation, call typed tools with idempotency keys and insist on receipts (e.g., case ID, job run ID). The model must not assert success without one. This turns “You are eligible” into “Case PA-93271 opened for MRI; appointment SCH-441 scheduled.”
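
A sketch of that gate, assuming a hypothetical ticket_client with a create_case method (not a real API):

    # Sketch: act only after validation, retry safely via an idempotency key,
    # and never report success without a receipt. ticket_client is hypothetical.
    def act_on_claim(claim: dict, citations: dict, required_fields: list, ticket_client) -> dict:
        missing = [f for f in required_fields if f not in citations]
        if missing:
            return {"status": "declined", "reason": f"no evidence for {missing}"}

        # Deterministic idempotency key: retries must not open duplicate cases.
        idem_key = "|".join(str(claim[f]) for f in sorted(claim))
        receipt = ticket_client.create_case(claim, idempotency_key=idem_key)

        if not receipt or "case_id" not in receipt:
            return {"status": "error", "reason": "no receipt returned"}
        return {"status": "created", "case_id": receipt["case_id"]}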

Observability and governance

Log a context fingerprint (doc IDs + line ranges), the retrieval policy version, the prompt bundle, tool calls with receipts, and the final artifact. Create golden traces that must pass for any change in prompt, retriever, or tool wiring. Set guardrails: decline if evidence is stale, mask PII in free text, and switch to qualitative summaries above a sensitivity ceiling.
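
A context fingerprint can be as simple as a stable hash over the cited spans plus the policy version; a sketch, with span fields following the citation example above:

    # Sketch: a stable fingerprint over cited spans and the policy version,
    # usable in logs, golden traces, and cache keys.
    import hashlib
    import json

    def context_fingerprint(spans: list, policy_version: str) -> str:
        canonical = json.dumps(
            sorted((s["doc_id"], s["start_line"], s["end_line"]) for s in spans)
        ) + "|" + policy_version
        return hashlib.sha256(canonical.encode()).hexdigest()[:16]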

Economics you can steer

Costs fall when you reduce irrelevant context and retries. Practical levers: compress system text; keep K small but diverse; use a re-ranker to prioritize “decision-bearing” sentences; cache by (normalized question, policy version, context fingerprint); route easy cases to a small model and escalate only on uncertainty. Measure $/accepted outcome, not $/token.
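
A sketch of the cache-and-route lever, reusing the fingerprint above (the answer_with helper and model names are placeholders, not real APIs):

    # Sketch: cache by (normalized question, policy version, context fingerprint);
    # route to a small model first and escalate only on low confidence.
    # answer_with and the model names are placeholders.
    cache: dict = {}

    def answer(question: str, spans: list, policy_version: str) -> dict:
        key = (
            " ".join(question.lower().split()),            # normalized question
            policy_version,
            context_fingerprint(spans, policy_version),
        )
        if key in cache:
            return cache[key]

        result = answer_with("small-model", question, spans)   # cheap first pass
        if result.get("confidence", 0.0) < 0.8:                # escalate only on uncertainty
            result = answer_with("large-model", question, spans)

        cache[key] = result
        return result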


Real-World Use Case: Prior Authorization Assistant at a Regional Health Network

Problem. Nurses and care coordinators spent ~14 minutes per case verifying whether a procedure (e.g., lumbar MRI) needed prior authorization under varying payer plans. Policy PDFs were long, outdated, and inconsistent across portals. Decisions were error-prone, and audits were painful.

Approach. The team shipped an Actionable RAG assistant embedded in the EHR sidebar:

  • Claim contract:

    {
      "member_id": "string",
      "procedure_cpt": "string",
      "requires_prior_auth": true,
      "policy_ref": "doc-id#line-start-end",
      "next_action": "create_case|schedule|no_action",
      "notes": "string"
    }
    
  • Retrieval policy v3: only payer docs tagged certified:true, recency ≤ 180 days, member’s payer only; fetch K=8 passages; diversity penalty 0.6; require at least one span proving (a) procedure coverage and (b) the prior-auth clause; conflict resolution: newer > older, payer > third-party summaries.

  • Citation rules: minimal spans for both coverage and prior-auth clause; block output if any required span missing.

  • Tool policy: if requires_prior_auth:true, call CreatePriorAuthCase with idempotency key; if facility scheduling allowed, call ScheduleAppt only after case creation returns a receipt.

Workflow. A coordinator selects a patient and CPT code. The assistant retrieves certified policy spans, produces a structured decision with two citations, and—when appropriate—opens a prior-auth case and schedules the appointment. The UI shows the decision, the two highlighted policy sentences, and the case ID link.
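
A condensed sketch of that flow (the retriever, llm, pa_client, and sched_client objects and their methods are assumptions drawn from the description above, not the actual integration):

    # Sketch of the coordinator flow: decide with citations, then case before appointment.
    # All client objects and method names are assumed.
    def handle_request(member_id: str, cpt: str, retriever, llm, pa_client, sched_client) -> dict:
        spans = retriever.fetch(member_id=member_id, cpt=cpt)    # certified, recent, payer-scoped
        claim = llm.decide(cpt=cpt, spans=spans)                 # structured claim + citations

        # Block output if either required span (coverage, prior-auth clause) is missing.
        if not {"coverage", "prior_auth"} <= set(claim["citations"]):
            return {"status": "declined", "reason": "missing required policy span"}

        if not claim["requires_prior_auth"]:
            return {"status": "no_action", "claim": claim}

        # Case creation first; schedule only after the case receipt comes back.
        case = pa_client.create_case(member_id, cpt, idempotency_key=f"{member_id}:{cpt}")
        if not case.get("case_id"):
            return {"status": "error", "reason": "no case receipt returned"}
        appt = sched_client.schedule(case_id=case["case_id"])
        return {"status": "created", "case_id": case["case_id"], "appointment": appt, "claim": claim}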

Controls and safety.

  • If no certified doc ≤180 days exists, the assistant declines and prompts the coordinator to request an updated policy link.

  • All PHI is masked in free-text notes; only IDs appear in logs.

  • Golden traces include edge cases (Medicare Advantage carve-outs, pediatric exceptions, after-hours imaging).

  • Every deployment change rides behind a feature flag with a canary at one clinic before regional rollout.

Results (first 8 weeks).

  • Median decision time: 14:12 → 3:05 (−78%).

  • Audit exceptions: −61%, thanks to minimal-span citations.

  • Coordinator satisfaction: +22 NPS points.

  • $ per accepted decision: −34%, driven by smaller contexts, higher cache hits, and fewer human escalations.

  • Incidents: one stale payer PDF caused bad declines for 90 minutes; rollback targeted the retrieval corpus commit (no prompt/model change), validated via trace replays.

Lessons learned.

  • The biggest win wasn’t clever prompting; it was policy governance plus span-level citations.

  • Diversity-penalized retrieval prevented the model from seeing eight copies of the same paragraph.

  • Tool receipts turned “probably submitted” into auditable actions.


Conclusion

Actionable RAG reframes retrieval from “add more context” to “prove the claim.” With explicit eligibility rules, minimal-span citations, and tools gated by evidence, generative systems become fast, affordable, and audit-ready. The healthcare deployment above shows the pattern scales beyond demos: fewer errors, faster cycles, happier users, and clean audits—because every answer carried its proof and every action carried its receipt.