Introduction
Most “AI agent” demos fall apart in production not because the model is weak, but because the data layer is. Agents need stable schemas, trustworthy metrics, and predictable access paths to act reliably. This article shows how to make agents first-class citizens of your data layers—whether that means a semantic model (dbt/Looker/Fabric), a feature store, an event bus, or a vector index—so they can find, reason, and execute with receipts instead of guesswork.
What we mean by “data layer” for agents
In practice an agent touches four strata:
Operational sources: apps/ERPs/CRMs and event streams where facts originate.
Transformation & governance: pipelines, lake/warehouse, lineage, quality checks, PII handling.
Semantic/feature layers: named metrics (e.g., RevenueYTD), dimensions/joins, features (days_since_last_login), and role-based scoping.
Serving layers: SQL endpoints, metrics APIs, feature stores, vector search, and action APIs (payments, tickets).
Agents shouldn’t reverse-engineer any of this; they should contract with it.
Core pattern: Contract, then Query, then Act
Contract: The agent negotiates what it needs (entities, measures, features, freshness, limits) and how it may access them (roles, labels).
Query: It uses the sanctioned interface (SQL/metrics API/feature store/vector index) and returns minimal-span citations to the fields/measures used.
Act: It performs an allowed action and returns a receipt (ID from the downstream system). No receipt → no success claim.
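The "no receipt → no success claim" rule at the end of this loop can be encoded directly, so the reporting path cannot drift. A minimal sketch (the `ActionResult` shape is illustrative, not a real library type):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ActionResult:
    receipt_id: Optional[str]  # ID from the downstream system, or None

def report(result: ActionResult) -> str:
    # The agent may only claim success when the action layer returned a receipt.
    if result.receipt_id is None:
        return "action attempted, no receipt: treating as FAILED"
    return f"action succeeded, receipt={result.receipt_id}"
```

Making failure the default means a dropped response or timeout never gets narrated as success.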
A concise agent ↔ data layer contract (YAML)
role: "KPIInsightsAgent"
scope: >
  Answer KPI questions and propose actions using only certified metrics and features.
  Respect row-level security (RLS) and sensitivity labels. Never expose raw PII.
inputs:
  question: string
  audience_group: string   # e.g., "sales-emea"
requirements:
  freshness_minutes_max: 180
  metrics: ["RevenueYTD", "PipelineSlippagePct", "WAU"]
  features_optional: ["days_since_last_login", "is_churn_risk"]
governance:
  allowed_domains: ["sales", "product"]
  sensitivity_max: "Confidential"
  rls_impersonate_as: "${audience_group}"
output:
  type: object
  required: [summary, used_metrics, sql_refs, feature_refs, citations, proposed_actions]
  properties:
    summary: {type: string, maxWords: 120}
    used_metrics: {type: array, items: string}
    sql_refs: {type: array, items: {dataset: string, query_id: string}}
    feature_refs: {type: array, items: {store: string, feature: string, version: string}}
    citations: {type: array, items: string}          # metric/feature doc IDs
    proposed_actions: {type: array, items: string}   # e.g., "OpenRenewalTask(Account X)"
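A contract like this is only useful if it is enforced. A minimal CI-style check, assuming the response arrives as a parsed dict (the field names mirror the contract above; the validator itself is a sketch, not a full JSON Schema implementation):

```python
# Hypothetical CI check: validate an agent response against the contract's
# required output keys before it ships.
REQUIRED = ["summary", "used_metrics", "sql_refs", "feature_refs",
            "citations", "proposed_actions"]

def validate_output(response: dict) -> list:
    """Return a list of contract violations (empty means the response passes)."""
    errors = [f"missing field: {k}" for k in REQUIRED if k not in response]
    if len(response.get("summary", "").split()) > 120:  # maxWords: 120
        errors.append("summary exceeds 120 words")
    return errors
```

Run the same check in production before returning a response, so CI and serving enforce one contract.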
Wiring common data layers
1) Semantic metric layer (dbt/Looker/Fabric/MetricFlow)
Treat the semantic model as the single source of truth. The agent fetches metric definitions (description, grain, filters, owners) and queries metrics via the official API (not ad-hoc SQL).
Require endorsement flags (certified/promoted) and freshness metadata; the agent ranks candidates by certification → lineage health → recency.
Return citations like metric:RevenueYTD@v3 and the bookmark/filter state used.
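The ranking rule (certification → lineage health → recency) is simple enough to pin down in code. A sketch with illustrative field names; real catalogs expose these flags under their own schemas:

```python
def rank_metrics(candidates):
    """Order metric candidates: certified first, healthy lineage next,
    then most recently refreshed. Field names are illustrative."""
    return sorted(
        candidates,
        key=lambda m: (
            not m["certified"],       # False sorts first -> certified wins
            not m["lineage_healthy"],
            -m["refreshed_at"],       # epoch seconds; newer first
        ),
    )
```

A deterministic sort keeps metric selection auditable: the trace can record why one candidate beat another.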
2) Feature store (Feast/Tecton/SageMaker/Vertex)
Agents don’t compute features; they read versioned features with point-in-time correctness.
Stick to entity keys and feature vectors documented in the store; every read returns (feature, version, timestamp).
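Point-in-time correctness reduces to one rule: a read as of time T may only see the latest value written at or before T. A minimal sketch of that lookup (real stores like Feast implement this across joins, but the core logic is the same; the row shape here is illustrative):

```python
def point_in_time_read(history, as_of):
    """Return the latest {'value', 'version', 'ts'} row at or before `as_of`,
    so the agent (or a training loop) never sees future data."""
    eligible = [row for row in history if row["ts"] <= as_of]
    if not eligible:
        return None  # no value existed yet at that time
    return max(eligible, key=lambda row: row["ts"])
```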
3) Vector/knowledge layer
Use embeddings only for discovery and disambiguation; promote any numeric result to a metric lookup in the semantic layer before making a claim.
Attach chunk-level citations and access checks (labels/RLS) before disclosure.
4) Action layer (tickets, payments, emails)
Model actions as typed tools and require idempotency keys.
Success is a receipt: ticket ID, payment ID, commit SHA. Log the receipt with the input fingerprint.
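An idempotency key should be a deterministic fingerprint of the action plus its input, so a retried call lands on the same downstream record. A minimal stdlib sketch:

```python
import hashlib
import json

def idempotency_key(action: str, payload: dict) -> str:
    """Deterministic fingerprint of action + input: retries produce the same
    key, so the downstream system deduplicates instead of double-writing."""
    canon = json.dumps({"action": action, "payload": payload},
                       sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canon.encode()).hexdigest()[:16]
```

Note that Python's built-in `hash()` is randomized per process and is not suitable here; a content hash like this one is stable across runs and hosts.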
Example: answering a revenue question with metrics + features
# pseudocode: agent uses sanctioned clients, not raw HTTP
import hashlib

metrics = MetricsClient()    # semantic layer
features = FeatureStore()    # feature layer
tickets = TicketingClient()  # action layer

q = "Why did EMEA revenue dip last month and what should we do?"
ctx = {"group": "sales-emea"}

# 1) Discover certified metrics and pull with RLS impersonation
m_revenue = metrics.get("RevenueYTD", certified=True)
m_slippage = metrics.get("PipelineSlippagePct", certified=True)
series = metrics.query(
    measures=[m_revenue, m_slippage],
    dims=["Region", "Month"],
    filters=[("Region", "EMEA")],
    rls_group=ctx["group"],
    freshness_minutes_max=180,
)

# 2) Enrich with churn-risk feature for top accounts (optional)
top_accounts = pick_accounts(series, region="EMEA")
fv = features.read(
    entity="account_id",
    keys=top_accounts,
    features=["is_churn_risk", "days_since_last_login"],
    version="v5",
)

# 3) Synthesize narrative with minimal-span citations
narr = synthesize(series, fv, cite=["metric:RevenueYTD@v3", "feature:is_churn_risk@v5"])

# 4) Propose and execute actions (with receipts)
if high_risk_slice(fv):
    r = tickets.create_task(
        summary="Recovery plan for top at-risk EMEA accounts",
        labels=["revenue", "emea"],
        due_days=5,
        # content hash, not built-in hash(): stable across runs, so retries dedupe
        idempotency_key=hashlib.sha256(",".join(sorted(top_accounts)).encode()).hexdigest(),
    )
    receipt = r.id  # required to claim success
Retrieval guardrails that make agents trustworthy
RLS impersonation: Queries run as the recipient’s role; previews that cannot impersonate return a redaction notice.
Sensitivity ceilings: If any upstream label exceeds the contract’s ceiling, the agent summarizes qualitatively and refuses to show raw values.
Freshness SLOs: The agent declares the data age (e.g., “Refreshed 74 minutes ago”) and refuses to act if outside SLO.
Metric lineage health: Broken lineage or failing tests → fallback to a certified alternative or return a safe error with owner mention.
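The freshness guardrail in particular is easy to state in code: declare the data's age in the response, and refuse to act when it exceeds the SLO. A sketch (thresholds and return shape are illustrative):

```python
def freshness_gate(refreshed_minutes_ago, slo_minutes=180):
    """Declare data age up front; refuse to act outside the freshness SLO."""
    note = f"Refreshed {refreshed_minutes_ago:.0f} minutes ago"
    if refreshed_minutes_ago > slo_minutes:
        return {"ok": False, "note": note + " (outside SLO, refusing to act)"}
    return {"ok": True, "note": note}
```

Putting the age string in every response, pass or fail, gives reviewers the same visibility the agent has.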
Production checklist (short and decisive)
Contracts in code: Store your agent contracts next to the metric/feature specs; validate them in CI.
Golden traces: Curate 10–20 representative questions + expected metric/feature calls + acceptable narratives. Run in CI/CD.
Receipts everywhere: No silent writes; every action must return an external ID saved to the trace.
Observability: Log metric/feature names, versions, filters, freshness, and citations for each response.
Change safety: Feature flags for new metrics/features; canary and rollback paths.
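A golden trace from the checklist above can be checked mechanically: given a question, did the agent call exactly the sanctioned metrics? A sketch of such a CI assertion (the trace and agent shapes are hypothetical):

```python
def run_golden_trace(agent_fn, trace):
    """trace: {'question': str, 'expected_calls': set of metric names}.
    Flags both skipped sanctioned calls and invented ones."""
    result = agent_fn(trace["question"])
    actual = set(result["used_metrics"])
    missing = trace["expected_calls"] - actual
    extra = actual - trace["expected_calls"]
    return {"pass": not missing and not extra,
            "missing": missing, "extra": extra}
```

Narrative quality still needs human or model-graded review; this only pins down the metric/feature call surface, which is the part that must never drift silently.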
Common failure modes (and how to dodge them)
Hallucinated joins: Force agents to use the semantic layer—not ad-hoc SQL—so join logic is authored once.
Feature leakage in training: For any learn-or-adapt loop, use point-in-time feature materialization and event time windows.
Vector-only decisions: Use retrieval to find sources, then bind to metrics/features before making claims or acting.
PII exposure via context windows: Mask at the transformation layer; never rely on prompts to prevent leakage.
Conclusion
Agents become reliable when the data layer is a contract, not a suggestion. Give them certified metrics, versioned features, governed retrieval, and typed actions with receipts. In return you get assistants that answer with traceable facts, propose actions tied to real systems, and hold up under audit—exactly what production needs.