
Generative AI, Part 4 — Modular RAG for Creators: Blend Templates with Evidence (Without Breaking Flow)

Introduction

Pure generation feels fast but risks drift; pure retrieval feels safe but clunky. Modular RAG for creators marries both: you keep the programmatic creativity stack from Part 3 (templates, style frames, validators), and plug in a small, policy-aware evidence layer that adds quotes, numbers, and references—without derailing voice, rhythm, or deadlines.


The Goal (in one sentence)

Generate on-brand content that cites minimal spans from eligible sources, asks for more when evidence is thin, and never fabricates specifics.


Architecture at a Glance

Inputs → Planner → [Template + Style Frame] → Retrieval (Policy-Aware) → Claim Shaper
          → Sectioned Generator → Citation Stitcher → Validators/Repairs → Selector/Ranker → Output

Key modules

  • Retrieval (policy-aware): filters by tenant, region, license, and recency before similarity.

  • Claim shaper: turns passages into atomic claims with source_id, effective_date, and minimal-span quote.

  • Citation stitcher: maps generated sentences to supporting claims (1–2 per fact).

  • Validators: enforce schema, citation coverage, claim freshness, and banned terms.


Evidence, Then Words: The Minimal Data Model

{
  "claim_id":"kb:2025-06-12:capex-survey#q17",
  "text":"63% of respondents plan to reduce compute cost by consolidating workloads.",
  "source_id":"report:capex-2025",
  "effective_date":"2025-06-12",
  "tier":"primary",
  "jurisdiction":"US",
  "span":"…63% of respondents plan to reduce compute cost by consolidating workloads…",
  "url":"https://example.com/report"
}

Why atomic claims? They compress context safely, survive template changes, and make citations deterministic. A “claim pack” is just 6–20 of these objects, sorted by rank and freshness.
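As a sketch in Python (field names mirror the JSON above; the rank field and the sort rule are assumptions added to express "sorted by rank and freshness"), a claim and its pack might look like:

from dataclasses import dataclass
from datetime import date

@dataclass
class Claim:
    claim_id: str
    text: str
    source_id: str
    effective_date: date
    tier: str            # "primary" | "secondary" | "tertiary"
    jurisdiction: str
    span: str
    url: str
    rank: float = 0.0    # retrieval rank; assumed here for ordering, not part of the JSON above

def claim_pack(claims: list[Claim], limit: int = 20) -> list[Claim]:
    # Highest rank first; ties broken by freshness (newest effective_date first).
    ordered = sorted(claims, key=lambda c: (-c.rank, -c.effective_date.toordinal()))
    return ordered[:limit]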


Policy-Aware Retrieval (Eligibility > Similarity)

Before vector search or BM25, enforce:

  • Jurisdiction/region (EN-GB vs EN-US versions)

  • License/allow-list (only approved sources)

  • Freshness windows (e.g., < 24 months unless evergreen)

  • Source tiering (primary > secondary > tertiary)

Return a small set, then shape into claims. Relevance alone is not enough for public content.
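A minimal sketch of that eligibility pass, reusing the Claim sketch above; the policy parameter names are assumptions, and the lexical ranker is only a stand-in for whatever vector or BM25 scorer you run after filtering:

from datetime import date, timedelta

def eligible(claim: Claim, *, jurisdiction: str, allowed_sources: set[str],
             max_age_days: int, evergreen_ids: set[str] = frozenset()) -> bool:
    # Eligibility before similarity: region, license allow-list, freshness window.
    if claim.jurisdiction != jurisdiction:
        return False
    if claim.source_id not in allowed_sources:
        return False
    fresh = (date.today() - claim.effective_date) <= timedelta(days=max_age_days)
    return fresh or claim.claim_id in evergreen_ids

def retrieve(query: str, corpus: list[Claim], top_k: int = 12, **policy) -> list[Claim]:
    candidates = [c for c in corpus if eligible(c, **policy)]
    # Placeholder ranker: lexical overlap; swap in your vector search or BM25 of choice.
    q = set(query.lower().split())
    scored = sorted(candidates, key=lambda c: -len(q & set(c.text.lower().split())))
    return scored[:top_k]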


Sectioned Generation That Uses Claims Naturally

Generate per section, feeding only relevant claim IDs + minimal spans:

[FORMAT] Overview (2 sentences) • What Changed (3 bullets) • Why It Matters (3 bullets) • CTA (1 sentence)
[STYLE] Trusted advisor; sentences ≤ 20 words; no hype words.
[EVIDENCE] Use claims by ID only; quote minimal spans when stating numbers.
[CITATION POLICY] Each factual sentence → 1–2 claim_ids. If coverage <70% or claims conflict, ask for more and stop.

Why sectioned? Better control, fewer token spikes, easier to track citation coverage per section.
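A sketch of that per-section loop, assuming a caller-supplied llm callable and a hypothetical prompt template with {section} and {evidence} placeholders; each call sees only that section's claims and minimal spans:

def generate_sections(plan: dict[str, list[Claim]], llm, prompt_template: str) -> dict[str, str]:
    # plan maps a section key ("overview", "proof", ...) to the claims relevant to it.
    drafts: dict[str, str] = {}
    for key, claims in plan.items():
        evidence = "\n".join(f"[{c.claim_id}] {c.span}" for c in claims)
        prompt = prompt_template.format(section=key, evidence=evidence)
        draft = llm(prompt)  # llm is a placeholder for your model call
        drafts[key] = draft
        if draft.startswith("ASK_FOR_MORE"):
            break            # evidence is thin: stop and request more instead of fabricating
    return drafts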


The Citation Stitcher (Lightweight, Deterministic)

After generation:

  1. Detect factual sentences (numbers, named entities, superlatives).

  2. Map each to nearest claim by lexical/semantic overlap (within the section’s evidence set).

  3. Attach claim_ids inline or in a citations[] field.

  4. If unmapped and sentence is factual → repair loop: either add a hedge (“according to [policy]”) with a valid claim or drop the sentence.

Rule of thumb: Favor one minimal quote over a paragraph cite; your validator will thank you.
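A deterministic sketch of steps 1-3, using a regex for factual-sentence detection and lexical overlap for mapping; the patterns and threshold are assumptions, and production systems may add entity matching or embeddings:

import re

FACT_PATTERN = re.compile(r"\d|%|\bmost\b|\bfirst\b|\blargest\b|\bfastest\b", re.IGNORECASE)

def stitch(sentences: list[str], claims: list[Claim], min_overlap: int = 3):
    stitched = []
    for sent in sentences:
        if not FACT_PATTERN.search(sent):
            stitched.append((sent, []))    # non-factual: no citation required
            continue
        words = set(sent.lower().split())
        best = max(claims, key=lambda c: len(words & set(c.text.lower().split())), default=None)
        overlap = len(words & set(best.text.lower().split())) if best else 0
        if overlap >= min_overlap:
            stitched.append((sent, [best.claim_id]))
        else:
            stitched.append((sent, None))  # unmapped factual sentence: goes to the repair loop (step 4)
    return stitched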


Validators (What to Enforce Before You Ship)

  • Schema: required sections present; bullet counts; sentence caps.

  • Citation coverage: ≥ 70% of factual sentences have 1–2 claim_ids.

  • Freshness: no claim older than policy window unless marked evergreen.

  • Conflict surfacing: if two claims disagree, require both be shown with dates, or abstain.

  • Banned terms/claims: “only,” “guarantee,” or jurisdictionally risky phrases.

  • Brand casing & locale: “Organisation” vs “Organization,” product names.

If any fail → repair or resample the affected section only.
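A sketch of the coverage and freshness checks, wired to the validator policy object shown near the end of this article; the tuple convention ([] means non-factual, None means unmapped) follows the stitcher sketch above:

from datetime import date

def validate(stitched: dict[str, list], claims_by_id: dict[str, Claim], policy: dict) -> list[str]:
    issues = []
    for section, pairs in stitched.items():
        factual = [cids for _, cids in pairs if cids != []]   # factual sentences only
        covered = [cids for cids in factual if cids]          # drop unmapped (None)
        if factual and len(covered) / len(factual) < policy["min_citation_coverage"]:
            issues.append(f"{section}: citation coverage below threshold")
        for cids in covered:
            for cid in cids:
                age_days = (date.today() - claims_by_id[cid].effective_date).days
                if age_days > policy["max_claim_age_days"]:
                    issues.append(f"{section}: stale claim {cid}")
    return issues

Issues name the failing section, so only that section needs a repair pass or a resample.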


Prompt Scaffolds (Copy/Paste)

Planner → Outline

Write an outline for: {template.id}. For each section, list 1-line intents.
Use available claims: {claim_ids}. Do not invent facts.
If claims conflict, note "CONFLICT: {claim_id_a} vs {claim_id_b}".

Section Generator

You are writing the {section.key} in our brand voice.
Use only these claims: {claims_for_section}.
Rules:
- Each factual sentence must reference 1–2 claim_ids in [brackets].
- Quote minimal spans when stating numbers.
- If claims are insufficient, output "ASK_FOR_MORE:{missing_fields}" and stop.
Format: plain text for the section only.

Repair

The previous section failed validation: {issues}.
Revise using the same claims and constraints. Remove unsupported factual sentences.

Worked Example: Launch Blog with Two Evidence Blocks

Inputs

  • Template: “Blog v2.0” (Overview, What/Why, Proof, CTA)

  • Claims: 5 from a customer study, 4 from a public benchmark

  • Style: “Trusted Advisor”

Flow

  1. Planner chooses “Blog v2.0” + style frame.

  2. Retrieval filters to EN-US, the policy allow-list, and the last 18 months.

  3. Claim shaper emits 9 atomic claims.

  4. Generator creates sections; “Proof” cites 3 claims; “Why It Matters” cites 2.

  5. Stitcher maps a stray numeric sentence to the correct claim_id.

  6. Validators flag 1 old claim; repair swaps to fresher alternative.

  7. Selector picks Variant B (higher pass-rate + readability).

Outcome: On-brand article with five minimal-span citations, no hype words, and ready links for legal review.


Measuring Quality (No Human Labels Required)

  • Citation pass-rate: factual sentences with valid claim_ids on first pass.

  • Freshness score: % of claims within policy window.

  • Coverage: avg claims used per section vs. available.

  • Time-to-valid: elapsed time from generation through validation and any repairs.

  • Drift watch: conflicts per 100 articles; abstention frequency.

Use a golden pack of 30–50 topics; block releases if pass-rate drops or time-to-valid rises.
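A sketch of that release gate over a golden pack; the field names and thresholds are assumptions, recorded per run:

from statistics import mean

def release_gate(runs: list[dict], min_pass_rate: float = 0.9, max_time_to_valid_s: float = 120.0) -> bool:
    # Each run is assumed to record "passed_first_try" (bool) and "seconds_to_valid" (float).
    pass_rate = mean(1.0 if r["passed_first_try"] else 0.0 for r in runs)
    time_to_valid = mean(r["seconds_to_valid"] for r in runs)
    return pass_rate >= min_pass_rate and time_to_valid <= max_time_to_valid_s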


Cost & Latency Tips

  • Keep claim packs small (6–20 claims); section them upfront.

  • Cache shaped claims and planner outputs for related variants.

  • Use stop sequences between sections; cap tokens per section.

  • Apply speculative decoding for long sections; verify with validator.

  • Track $ per accepted article; optimize the slowest stage first (often retrieval shaping or repair).


Anti-Patterns (and Better Patterns)

  • Dumping full pages into context → Shape into atomic claims with minimal spans.

  • Citing home pages → Cite the exact sentence you quoted; attach effective_date.

  • One-shot long-form → Sectioned generation + stitched citations.

  • All-or-nothing citations → It’s fine to hedge non-factual lines; only facts need claims.

  • Ignoring conflicts → Surface both claims with dates; let readers judge.


Minimal Config Objects

Claim pack header

{"task":"blog","jurisdiction":"US","freshness_days":540,"tie_break":["rank","effective_date(desc)","tier"]}

Validator policy

{"min_citation_coverage":0.7,"max_claim_age_days":720,"ban":["guarantee","only"],"conflict_rule":"surface_both"}

Decoder policy

{"top_p":0.9,"temperature":0.7,"repetition_penalty":1.05,"section_max_tokens":{"overview":120,"proof":220}}

Store these with the output for reproducibility.
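One way to do that, sketched as a single JSON bundle written next to the article; the bundle layout and function name are assumptions:

import hashlib
import json

def persist(article: str, claim_pack_header: dict, validator_policy: dict,
            decoder_policy: dict, path: str) -> None:
    # Bundle the output with the exact policies that produced it, plus a content hash.
    bundle = {
        "article": article,
        "claim_pack_header": claim_pack_header,
        "validator_policy": validator_policy,
        "decoder_policy": decoder_policy,
        "content_sha256": hashlib.sha256(article.encode("utf-8")).hexdigest(),
    }
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(bundle, fh, indent=2)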


Conclusion

Modular RAG for creators adds just enough evidence to keep content true, auditable, and safe—without flattening voice or slowing teams down. Filter for eligibility, shape into atomic claims, generate by section, stitch minimal-span citations, and validate hard. With a few small artifacts—claim packs, validator policies, and sectioned prompts—you turn creative pipelines into governed, repeatable production.