
LLMs: Build an AI Core Banking App with FastAPI and GPT-5 (In-Depth)


⚠️ Scope & safety: This tutorial shows how to build a read-mostly AI assistant for core banking operations—customer education, statement explanation, account insights, anomaly triage, and guided workflows. It is not a funds-movement engine. Any write action (transfers, limit changes, KYC status updates) must be explicitly tool-gated, audited, and multi-party approved. Treat all examples as patterns; align with your organization’s infosec, risk, and regulatory teams (PCI DSS, GLBA, SOX, GDPR, etc.).


0) What You’ll Build

An AI operations layer on top of core banking systems using FastAPI + GPT-5 that:

  • Answers customer and back-office questions with policy-aware, citation-bearing responses (no hallucinated balances).

  • Provides explain-my-statement, fee breakdown, suspicious activity triage, cashflow summaries, regulatory FAQs.

  • Uses tool calling for read-only data (balances, transactions, KYC flags) and guarded write tools behind approvals (e.g., freeze card).

  • Enforces RBAC/ABAC, PII minimization & masking, idempotency, audit trails, and tamper-evident logs.

  • Ships with prompt contracts, context packs, validators, golden traces, CI pack replays, canary/rollback, and $/outcome dashboards.


1) High-Level Architecture

api/
├─ main.py                 # FastAPI app, routing, middleware
├─ auth.py                 # OAuth2/JWT, RBAC/ABAC enforcement
├─ models.py               # Pydantic request/response schemas
├─ llm_client.py           # GPT-5 client wrapper (timeouts, retries)
├─ contract.py             # Prompt contract(s) + output JSON schema
├─ tools/
│  ├─ readonly.py          # Balance, transactions, KYC, limits (GET only)
│  └─ guarded.py           # Freeze card, raise limit, close account (requires approvals)
├─ retrieval/
│  ├─ policy_index.py      # Curated KB (fees, SLA, disclosures) with eligibility filters
│  └─ cite.py              # Minimal-span citation builder
├─ validators.py           # Schema, citation, refusal, PII leakage checks
├─ observability.py        # Metrics, tracing, structured logs
├─ audit.py                # Append-only audit events (hash-chained)
├─ config.py               # Settings via env/secret manager
└─ tests/
   ├─ traces/*.json        # Golden traces (anonymized)
   └─ replay_test.py       # CI pack replays + gates

Data flow

  1. Auth (OAuth2/JWT) → RBAC/ABAC context (tenant, scopes, risk tier).

  2. Request enters router → optional policy-aware retrieval (fees, disclosures).

  3. Build Context Pack (customer-scoped facts + policy snippets + timestamps).

  4. GPT-5 called with a strict prompt contract (education & guided ops).

  5. If the model proposes a tool call, FastAPI validates preconditions and executes or queues approvals.

  6. Validators check schema, citations, PII minimization, refusal/abstention before responding.

  7. Observability + Audit record inputs/outputs (sanitized) and tool effects (hash-chained).


2) Security, Privacy, Compliance Posture (Non-Negotiables)

  • Least privilege: services get read scopes by default. Write tools require a separate service account, MFA/4-eyes approval, and change tickets.

  • PII minimization: redact or hash identifiers (PAN, SSN) before model calls; never send raw card numbers.

  • Segmentation: the model runtime is isolated (VPC/private subnet). No data leaves the trust boundary without DLP checks.

  • Tamper-evident audit: hash-chain events; include request_id, actor, tool, before/after, approvals.

  • Idempotency keys on any write path (see the sketch after this list).

  • Data retention: short TTL for prompts/completions; persistent storage only for audit metadata, not raw content.

  • Legal review: align with PCI DSS, GLBA, local data residency, and your regulator’s guidance.
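
For the idempotency point above, here is a minimal sketch of key-based deduplication. The in-memory store, key format, and TTL are assumptions; in production, back this with Redis or a database unique constraint.

# idempotency.py (sketch) — deduplicate write calls by a client-supplied key
import time

_SEEN: dict[str, tuple[float, dict]] = {}   # key -> (stored_at, result); use Redis/DB in production
TTL_S = 24 * 3600

def run_idempotent(key: str, fn, *args, **kwargs) -> dict:
    """Return the stored result if this key already executed; otherwise run fn exactly once."""
    now = time.time()
    hit = _SEEN.get(key)
    if hit and now - hit[0] < TTL_S:
        return hit[1]                        # replay: same key, same response, no second write
    result = fn(*args, **kwargs)
    _SEEN[key] = (now, result)
    return result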


3) Prompt Contract (Your Primary Control Surface)

Goal: Make the model policy-aware, source-bound, and abstaining when data is missing or out of scope.

# contract.py
SYSTEM_CONTRACT = """
You are a core-banking assistant for education and guided operations.
Use ONLY the provided context pack and tool outputs. Never invent balances or fees.
Policies:
- Rank evidence by retrieval_score; break ties by newest effective_date.
- Prefer primary sources: core ledger, fee schedule, disclosures.
- Quote minimal spans and include source_id for each cited fact.
- If required fields are missing (account_id, date_range, customer_consent), ask targeted follow-ups.
- If sources conflict, surface both with dates; do not harmonize.
- For write actions, propose a tool call with parameters ONLY; never confirm success without tool output.
Output JSON ONLY:
{"answer":"...", "citations":["..."], "followups":["..."], "proposed_tool":null|{"name":"...", "args":{...}}, "uncertainty":0.0}
"""

Keep this short, testable, and versioned (e.g., contract.banking.v1.5.0) with a changelog and CI gates.


4) Context Pack (Machine-Shaped Evidence)

# retrieval/cite.py (builder)
def make_context_pack(query, customer_ctx, kb_hits):
    # All fields are tenant-scoped and timestamped. The claims below are
    # illustrative; in practice they are built from ledger rows and kb_hits.
    return {
      "task": "fee_explanation",
      "customer": {"id": customer_ctx.id_hash, "segment": customer_ctx.segment},
      "claims": [
        {"id":"ledger:bal#2025-10-11","text":"Checking ending balance: $4,210.14","effective_date":"2025-10-11","tier":"primary"},
        {"id":"ledger:txn:987","text":"2025-10-03 ATM withdrawal $200 + $2.50 fee","effective_date":"2025-10-03","tier":"primary"},
        {"id":"kb:fees:atm#2025-07","text":"Out-of-network ATM fee is $2.50 per withdrawal.","effective_date":"2025-07-01","tier":"policy"}
      ],
      "hints":{"tie_break":"newest","max_tokens_ctx":1500}
    }

Atomic claims + timestamps + tiers produce deterministic, auditable reasoning.
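
One way to produce such claims, as a sketch (the ledger row fields and ID scheme here are assumptions, not the core system's actual schema):

# retrieval/claims.py (sketch) — turn a ledger transaction row into an atomic, citable claim
def claim_from_txn(txn: dict) -> dict:
    """txn is assumed to carry id, posted_date, description, amount, and optional fee fields."""
    text = f"{txn['posted_date']} {txn['description']} ${txn['amount']:.2f}"
    if txn.get("fee"):
        text += f" + ${txn['fee']:.2f} fee"
    return {
        "id": f"ledger:txn:{txn['id']}",
        "text": text,
        "effective_date": txn["posted_date"],
        "tier": "primary",
    }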


5) Tooling: Read-Only Adapters & Guarded Writes

Tool schema (declarative)

# tools/readonly.py
from pydantic import BaseModel
class TxnQuery(BaseModel):
    account_id: str
    date_from: str
    date_to: str
    limit: int = 100

def get_transactions(args: TxnQuery, principal)->dict:
    # Enforce RBAC/ABAC here (principal scopes, account ownership)
    # Query read replica; return masked data
    ...

TOOL_REGISTRY = {
  "get_transactions": {"fn": get_transactions, "args_schema": TxnQuery, "write": False}
}

# tools/guarded.py
from pydantic import BaseModel

class FreezeCardArgs(BaseModel):
    card_token: str
    reason: str

def freeze_card(args: FreezeCardArgs, principal)->dict:
    # 1) Check scopes and risk tier
    # 2) Require approval ticket + 2nd approver
    # 3) Execute via core API with idempotency key
    # 4) Return outcome with event_id
    ...

GUARDED = {
  "freeze_card": {"fn": freeze_card, "args_schema": FreezeCardArgs, "write": True, "approvals": 2}
}

LLM tool proposal → FastAPI executor

The model proposes {"proposed_tool":{"name":"get_transactions","args":{...}}}; the API validates & executes, then returns the tool output back to the model (or directly to the client), never trusting text alone.


6) FastAPI Skeleton (Routers, Middleware, Dependencies)

# main.py
import uvicorn, json, uuid, time
from fastapi import FastAPI, Depends, HTTPException, Request
from fastapi.responses import JSONResponse
from models import AskRequest, AskResponse
from auth import principal_from_token, enforce_scopes
from llm_client import chat_json
from contract import SYSTEM_CONTRACT
from retrieval.policy_index import search_policy
from retrieval.cite import make_context_pack
from tools.readonly import TOOL_REGISTRY
from tools.guarded import GUARDED
from validators import validate_llm_response, mask_pii
from observability import metrics, log_event
from audit import append_audit

app = FastAPI(title="AI Core Banking Assistant", version="1.0.0")

@app.middleware("http")
async def add_request_id(request: Request, call_next):
    request.state.rid = rid = str(uuid.uuid4())
    start = time.time()
    resp = await call_next(request)
    metrics.observe_latency("http_ms", (time.time()-start)*1000, tags={"path":request.url.path})
    resp.headers["X-Request-Id"] = rid
    return resp

@app.post("/v1/ask", response_model=AskResponse)
async def ask(req: AskRequest, principal=Depends(principal_from_token)):
    enforce_scopes(principal, ["bank.read"])  # default read scope
    rid = principal.request_id

    # Optional policy retrieval (fee schedules, disclosures) filtered by region/segment
    kb_hits = search_policy(req.query, region=principal.region, language=principal.lang)
    pack = make_context_pack(req.query, principal.customer_ctx, kb_hits)

    # Build prompt
    messages = [
        {"role":"system","content": SYSTEM_CONTRACT},
        {"role":"user","content": json.dumps({"mode": req.mode, "context_pack": pack, "question": mask_pii(req.query)})}
    ]

    llm = await chat_json(messages, temperature=0.1, request_id=rid)
    ok, issues = validate_llm_response(llm, require_citations=bool(kb_hits))
    if not ok:
        log_event("llm_validation_failed", {"issues": issues, "rid": rid})
        raise HTTPException(422, "Unsafe or malformed model output.")

    # If a tool is proposed, execute safely
    tool = llm.get("proposed_tool")
    tool_result = None
    if tool:
        name = tool["name"]
        registry = {**TOOL_REGISTRY, **GUARDED}
        spec = registry.get(name)
        if not spec:
            raise HTTPException(400, f"Unknown tool {name}")
        # Enforce write safety
        if spec.get("write", False):
            enforce_scopes(principal, ["bank.write"])
            # (Approval flow, idempotency, and risk checks would run here)
        # Validate args via schema
        args = spec["args_schema"](**tool["args"])
        tool_result = spec["fn"](args, principal)

        append_audit(principal, action=name, args=args.model_dump(), result=tool_result)
        metrics.increment("tool_calls", tags={"name": name})

    metrics.increment("llm_ask_ok")
    return AskResponse(answer=llm["answer"], citations=llm.get("citations", []),
                       followups=llm.get("followups", []), uncertainty=llm.get("uncertainty", 0.3),
                       tool_result=tool_result)

Key points

  • RBAC/ABAC enforced via dependency injection (a sketch of auth.py follows this list).

  • Mask PII before model calls.

  • Validators gate unsafe outputs.

  • Audit every tool effect with request_id.

  • Backpressure (not shown) for surge protection.
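
auth.py itself is not shown above; here is a minimal sketch of the dependency-injected principal and scope check, assuming a JWT bearer token with space-separated scopes. verify_jwt stands in for your JWT validation library; adapt the claim layout to your IdP.

# auth.py (sketch) — principal extraction and scope enforcement as FastAPI dependencies
from dataclasses import dataclass, field
from fastapi import Header, HTTPException

@dataclass
class Principal:
    sub: str
    scopes: list[str] = field(default_factory=list)
    region: str = "us"
    # ...plus tenant, risk tier, customer_ctx, request_id, etc.

async def principal_from_token(authorization: str = Header(...)) -> Principal:
    # Validate the JWT (signature, expiry, audience) against your IdP's keys; verify_jwt is an assumed helper.
    claims = verify_jwt(authorization.removeprefix("Bearer "))
    return Principal(sub=claims["sub"], scopes=claims.get("scope", "").split())

def enforce_scopes(principal: Principal, required: list[str]) -> None:
    missing = [s for s in required if s not in principal.scopes]
    if missing:
        raise HTTPException(status_code=403, detail=f"Missing scopes: {missing}")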


7) Schemas (Pydantic)

# models.py
from pydantic import BaseModel, Field
class AskRequest(BaseModel):
    mode: str = Field(examples=["explain_statement","fee_breakdown","anomaly_triage"])
    query: str = Field(min_length=3, max_length=2000)

class AskResponse(BaseModel):
    answer: str
    citations: list[str] = []
    followups: list[str] = []
    uncertainty: float = 0.0
    tool_result: dict | None = None

8) LLM Client Wrapper (Determinism & Resilience)

# llm_client.py
import asyncio, os, json
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))

async def chat_json(messages, temperature=0.1, request_id:str=""):
    resp = await client.chat.completions.create(
        model=os.getenv("MODEL","gpt-5"),
        temperature=temperature,
        response_format={"type":"json_object"},
        messages=messages,
        extra_headers={"x-request-id": request_id}
    )
    content = resp.choices[0].message.content
    content = resp.choices[0].message.content
    return json.loads(content)

  • Low temperature, JSON output mode, timeouts/retries (a retry/timeout sketch follows below).

  • Add circuit breakers and rate limiting in production.
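
A sketch of the retry/timeout layer referenced above, wrapping chat_json. The attempt count, backoff constants, and broad exception handling are assumptions; narrow the retryable errors for your SDK version.

# llm_client.py (continued, sketch) — bounded retries with exponential backoff and a hard timeout
async def chat_json_resilient(messages, request_id: str = "", attempts: int = 3, timeout_s: float = 20.0):
    last_err = None
    for i in range(attempts):
        try:
            return await asyncio.wait_for(
                chat_json(messages, temperature=0.1, request_id=request_id),
                timeout=timeout_s,
            )
        except Exception as e:                     # retry only transient API/timeout errors in production
            last_err = e
            await asyncio.sleep(0.5 * (2 ** i))   # 0.5s, 1s, 2s backoff
    raise RuntimeError(f"LLM call failed after {attempts} attempts: {last_err}")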


9) Validators (Pre-Display Guardrails)

# validators.py
PROHIBITED = ["transfer completed", "I moved funds", "we changed your limit"]
def validate_llm_response(r:dict, require_citations:bool)->tuple[bool,list]:
    issues=[]
    for k in ["answer","uncertainty","followups"]:
        if k not in r: issues.append(f"missing:{k}")
    text = " ".join([r.get("answer","")] + r.get("followups",[])).lower()
    if any(p in text for p in PROHIBITED):
        issues.append("implies_write_action_without_tool")
    if require_citations and not r.get("citations"):
        issues.append("missing_citations")
    u = r.get("uncertainty", 0.0)
    if not (0 <= u <= 1): issues.append("uncertainty_out_of_range")
    return (len(issues)==0, issues)

def mask_pii(s:str)->str:
    import re
    s = re.sub(r"\b\d{12,19}\b", "[PAN]", s)      # crude PAN mask
    s = re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", s)
    return s

10) Policy-Aware Retrieval (Eligibility > Similarity)

  • Filters first: tenant, region, language, customer segment, product, effective_date window.

  • Tiering: primary (ledger, fee schedule) > secondary (help articles) > tertiary (blogs—often exclude).

  • Minimal-span quotes for citations.

  • No customer PII into the index; use surrogate keys.
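
A minimal sketch of the eligibility-first retrieval described in this section. The in-memory KB shape and the toy lexical scoring are assumptions; search_policy in the skeleton above would wrap something like this, typically backed by embeddings or BM25.

# retrieval/policy_index.py (sketch) — filter for eligibility first, rank by similarity second
from datetime import date

KB: list[dict] = []   # curated snippets: {"id","text","region","language","effective_from","tier"}

def search_policy(query: str, region: str, language: str, k: int = 5) -> list[dict]:
    # 1) Hard eligibility filters: similarity must never override region/language/effective_date rules.
    today = date.today().isoformat()
    eligible = [
        d for d in KB
        if d["region"] == region
        and d["language"] == language
        and d["effective_from"] <= today
    ]
    # 2) Rank only within the eligible set (toy term-overlap score; use embeddings/BM25 in practice).
    q_terms = set(query.lower().split())
    eligible.sort(key=lambda d: len(q_terms & set(d["text"].lower().split())), reverse=True)
    return eligible[:k]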


11) Golden Traces & Pack Replays (CI Gates)

Example trace:

{
  "trace_id": "T-fee-007",
  "mode": "fee_breakdown",
  "question": "Why was I charged $2.50 on Oct 3?",
  "pack": {
    "claims": [
      {"id":"ledger:txn:987","text":"2025-10-03 ATM withdrawal $200 + $2.50 fee","effective_date":"2025-10-03","tier":"primary"},
      {"id":"kb:fees:atm#2025-07","text":"Out-of-network ATM fee is $2.50 per withdrawal.","effective_date":"2025-07-01","tier":"policy"}
    ]
  },
  "expectations": {
    "must_cite": ["kb:fees:atm#2025-07"],
    "must_not_write": true,
    "max_uncertainty": 0.6
  }
}

CI rules of thumb

  • Policy adherence ≥ 0.98

  • Citation precision/recall ≥ 0.90

  • p95 latency ≤ SLO (e.g., 1200 ms)

  • $/successful outcome ≤ budget

Fail → block merge. Canary new contract/prompt to 5–10% traffic; one-click rollback on adherence dip ≥2 points.
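
A sketch of what replay_test.py might assert against these gates. The call_model stub and import paths are assumptions; wire it to chat_json with the trace's pack and question.

# tests/replay_test.py (sketch) — replay golden traces and enforce CI gates
import glob, json
import pytest
from api.validators import validate_llm_response        # assumed import path
from api.tools.guarded import GUARDED                    # assumed import path

def call_model(trace: dict) -> dict:
    """Run the real model against trace['pack'] + trace['question']; stubbed in this sketch."""
    raise NotImplementedError

@pytest.mark.parametrize("path", sorted(glob.glob("api/tests/traces/*.json")))
def test_trace(path):
    trace = json.load(open(path))
    out = call_model(trace)
    exp = trace["expectations"]
    ok, issues = validate_llm_response(out, require_citations=bool(exp.get("must_cite")))
    assert ok, issues
    for cid in exp.get("must_cite", []):
        assert cid in out.get("citations", []), f"missing citation {cid}"
    if exp.get("must_not_write"):
        tool = out.get("proposed_tool")
        assert tool is None or tool["name"] not in GUARDED, "write tool proposed"
    assert out.get("uncertainty", 1.0) <= exp.get("max_uncertainty", 1.0)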


12) Cost & Latency Engineering

  • Token budgets per route (header ≤ 200, context ≤ 1200, gen ≤ 250).

  • Compression with guarantees: convert policy docs to atomic claims with URLs + dates.

  • Caching: template prompts, retrieval hits, deterministic responses (low temp); a cache-key sketch follows this list.

  • Routing: small verifier/draft models for speculative decoding → 1.5–2.5× speedups.

  • Outcomes dashboard: $/answer, adherence, citation P/R, p50/p95 latency, cache hit rate, escalation mix.
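
A sketch of the cache keying (names and the in-memory store are assumptions). Low temperature plus an identical pack makes reuse safe, and including the contract version in the key forces invalidation on prompt changes.

# cache.py (sketch) — cache deterministic answers by contract version + context pack + question
import hashlib, json

_CACHE: dict[str, dict] = {}   # swap for Redis with a TTL in production

def cache_key(contract_version: str, pack: dict, question: str) -> str:
    blob = json.dumps({"v": contract_version, "pack": pack, "q": question}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()

def get_or_compute(key: str, compute) -> dict:
    if key not in _CACHE:
        _CACHE[key] = compute()      # compute() is the (sync-wrapped) LLM call in this sketch
    return _CACHE[key]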


13) Observability & Audit

  • Trace IDs carried from API → LLM → tools.

  • Structured logs (JSON) with rid, principal_id, tool, lat_ms.

  • Metrics: counters (tool_calls, llm_ask_ok), histograms (latency), gauges ($/outcome).

  • Audit chain: hash(prev_hash + event_json); store in WORM/immutable store.
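
A minimal sketch of the hash-chained audit append. Field names are assumptions; persist the events and the running hash in a WORM/immutable store.

# audit.py (sketch) — append-only, hash-chained audit events
import hashlib, json, time

_LAST_HASH = "0" * 64   # genesis value; persist alongside the events

def append_audit(principal, action: str, args: dict, result: dict) -> dict:
    global _LAST_HASH
    event = {
        "ts": time.time(),
        "actor": getattr(principal, "sub", "unknown"),
        "action": action,
        "args": args,                              # already masked/sanitized upstream
        "result_summary": {k: v for k, v in (result or {}).items() if k in ("event_id", "status")},
        "prev_hash": _LAST_HASH,
    }
    # hash(prev_hash + event_json), computed before the hash field itself is added
    event["hash"] = hashlib.sha256(
        (_LAST_HASH + json.dumps(event, sort_keys=True, default=str)).encode()
    ).hexdigest()
    _LAST_HASH = event["hash"]
    # write `event` to the append-only store here
    return event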


14) Deployment Notes

  • Secrets: via secret manager (not .env in prod).

  • Network: outbound egress restricted; private endpoints for core APIs.

  • Blue/green + feature flags: separate deploy from exposure.

  • Pen-test & threat modeling: include prompt injection, tool abuse, data exfil.


15) Example User Journeys

  1. Explain a fee → read transactions + fee schedule → cite + plain English explanation → offer receipt download (tool: read-only).

  2. Suspicious activity triage → highlight merchant/time/IP mismatch → propose “freeze_card” tool (queued for approval) → show status.

  3. Statement reconciliation → group by merchant/category → show anomalies vs. previous months → export CSV (read-only).

All three journeys avoid write actions by default and never claim success without a tool result.


16) Common Failure Modes & Fixes

  • Hallucinated balances → never pass the “whole internet”; pass only ledger claims; reject answers lacking primary citations.

  • Implicit writes in text → validator blocks; enforce tool-only writes + approvals.

  • Data leakage → aggressive masking + DLP; redact before LLM; no PAN/SSN in logs.

  • Latency spikes → trim context, cache retrieval, speculative decoding, cap generation.
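
A sketch of context trimming under a token budget (the ≈4-chars-per-token estimate and tier ordering are assumptions; it mirrors the contract's tier and newest-first tie-break rules):

# budget.py (sketch) — keep the best claims within a per-route context token budget
def trim_claims(claims: list[dict], max_tokens: int = 1200) -> list[dict]:
    order = {"primary": 0, "policy": 1, "secondary": 2}
    by_date = sorted(claims, key=lambda c: c["effective_date"], reverse=True)   # newest first
    ranked = sorted(by_date, key=lambda c: order.get(c["tier"], 9))             # stable sort: tier first
    kept, used = [], 0
    for c in ranked:
        cost = len(c["text"]) // 4 + 8    # rough token estimate (~4 chars/token plus JSON overhead)
        if used + cost > max_tokens:
            break
        kept.append(c)
        used += cost
    return kept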


17) 30-Day Rollout Plan

  • Week 1: Contracts v1; fee/FAQ KB; 30 golden traces; read-only tools wired.

  • Week 2: FastAPI MVP; validators; observability; CI pack replays.

  • Week 3: Canary to internal ops; collect traces; tune token budgets; add citations.

  • Week 4: Guarded writes (freeze card) behind approvals; $/outcome dashboards; external pilot for a small cohort.


Conclusion

An AI layer for core banking is not a bigger model—it’s governed computation. Pair a tight prompt contract with policy-aware retrieval, tool-gated actions, strict validators, and auditable logs. Keep PII minimized, writes behind approvals, and success tied to tool outputs, not prose. With FastAPI + GPT-5 and the discipline above, you can ship an assistant that is fast, cheap, explainable—and safe enough for real banking workflows.


Appendix: Minimal requirements.txt

fastapi>=0.115
uvicorn[standard]>=0.30
pydantic>=2.7
python-dotenv>=1.0
httpx>=0.27
openai>=1.40

Run locally

export OPENAI_API_KEY=sk-...
uvicorn api.main:app --reload