AI Agents in Practice: Automated Document Filing and Classification Agent (Prompts + Code)

John Godel
Oct 20
1.9k
0
2

Article

Introduction

In this article, we introduce the Automated Document Filing and Classification Agent. This agent is responsible for organizing, categorizing, and storing large volumes of documents in a structured manner. It reads incoming documents, extracts relevant metadata (e.g., document type, date, author), and places them in the appropriate folders or categories. The agent ensures that documents are filed according to company standards and compliance requirements, and it can automatically flag or remove irrelevant files. As with all previous agents, the process is fully tracked, and receipts are issued for each action completed.

The Use Case

In many organizations, document management can be a tedious and time-consuming task. The agent simplifies this process by automatically classifying documents (e.g., contracts, reports, invoices), storing them in predefined folders, and ensuring that documents meet compliance requirements such as data privacy laws. The agent can also handle sensitive documents, ensuring they are stored in secure, encrypted folders and flagged for review if necessary. By automating document management, organizations can save time, reduce human error, and improve their overall data governance.

Prompt Contract (agent interface)

# file: contracts/document_classification_v1.yaml
role: "DocumentFilingAgent"
scope: >
  Read documents, classify them by type, and store them in appropriate categories or folders.
  Ask once for missing fields (document_id, document_type, date, author, content).
  Propose tool calls; never assert success without a receipt.
output:
  type: object
  required: [summary, decision, document_classification, citations, next_steps, tool_proposals]
  properties:
    summary: {type: string, maxWords: 100}
    decision: {type: string, enum: ["approve", "reject", "need_approval", "need_more_info"]}
    document_classification:
      type: object
      required: [document_id, document_type, storage_location, classification_status]
      properties:
        document_id: {type: string}
        document_type: {type: string, enum: ["contract", "invoice", "report", "email", "memo"]}
        storage_location: {type: string}
        classification_status: {type: string, enum: ["classified", "unclassified"]}
    citations: {type: array, items: {type: string}}
    next_steps: {type: array, items: {type: string}, maxItems: 6}
    tool_proposals:
      type: array
      items:
        type: object
        required: [name, args, preconditions, idempotency_key]
        properties:
          name: {type: string, enum: [ClassifyDocument, StoreDocument, RequestReview, VerifyCompliance]}
          args: {type: object}
          preconditions: {type: string}
          idempotency_key: {type: string}
policy_id: "document_management_policy.v3"
citation_rule: "1–2 minimal-span claim_ids per factual sentence"
decoding:
  narrative: {top_p: 0.92, temperature: 0.72, stop: ["\n\n## "]}
  bullets:   {top_p: 0.82, temperature: 0.45}

Example claims (context provided to the model)

[
  {"claim_id":"policy:document:confidentiality","text":"Confidential documents must be stored in encrypted folders.",
   "effective_date":"2025-01-01","source_id":"doc:document_management_policy_v3","span":"stored in encrypted folders"},
  {"claim_id":"policy:document:classification","text":"Documents must be classified by type (e.g., contract, invoice, report).",
   "effective_date":"2025-01-01","source_id":"doc:document_management_policy_v3","span":"classified by type"},
  {"claim_id":"policy:document:retention","text":"Documents must be retained for a minimum of 7 years before deletion.",
   "effective_date":"2025-01-01","source_id":"doc:document_management_policy_v3","span":"retained for a minimum of 7 years"}
]

Tool Interfaces (typed, with receipts)

# tools.py
from pydantic import BaseModel
from typing import Optional, List, Dict
from datetime import datetime

class ClassifyDocumentArgs(BaseModel):
    document_id: str
    document_type: str
    content: str

class StoreDocumentArgs(BaseModel):
    document_id: str
    storage_location: str
    document_type: str

class RequestReviewArgs(BaseModel):
    document_id: str
    review_needed: bool

class VerifyComplianceArgs(BaseModel):
    document_id: str
    document_type: str
    compliance_required: bool

class ToolReceipt(BaseModel):
    tool: str
    ok: bool
    ref: str
    message: str = ""
    data: Optional[Dict] = None

# adapters.py  (demo logic)
from tools import *
from datetime import datetime

DOCUMENTS = {
    "doc-001": {"document_type": "contract", "content": "Contract terms and conditions for vendor XYZ", "classified": False},
    "doc-002": {"document_type": "invoice", "content": "Invoice for order 12345", "classified": False}
}

STORAGE_LOCATIONS = {
    "contract": "contract_folder/secure",
    "invoice": "invoice_folder/general"
}

def classify_document(a: ClassifyDocumentArgs) -> ToolReceipt:
    document = DOCUMENTS.get(a.document_id)
    if not document:
        return ToolReceipt(tool="ClassifyDocument", ok=False, ref="document-not-found", message="Document not found")
    document["document_type"] = a.document_type
    return ToolReceipt(tool="ClassifyDocument", ok=True, ref=f"classify-{a.document_id}",
                       message="Document classified", data={"document_type": a.document_type})

def store_document(a: StoreDocumentArgs) -> ToolReceipt:
    location = STORAGE_LOCATIONS.get(a.document_type)
    if not location:
        return ToolReceipt(tool="StoreDocument", ok=False, ref="location-not-found", message="Storage location not found")
    return ToolReceipt(tool="StoreDocument", ok=True, ref=f"store-{a.document_id}",
                       message="Document stored", data={"storage_location": location})

def request_review(a: RequestReviewArgs) -> ToolReceipt:
    return ToolReceipt(tool="RequestReview", ok=True, ref=f"review-{a.document_id}",
                       message=f"Review requested for {a.document_id}", data={"review_needed": a.review_needed})

def verify_compliance(a: VerifyComplianceArgs) -> ToolReceipt:
    if not a.compliance_required:
        return ToolReceipt(tool="VerifyCompliance", ok=False, ref="compliance-fail", message="Document does not meet compliance")
    return ToolReceipt(tool="VerifyCompliance", ok=True, ref=f"compliance-{a.document_id}", message="Document compliant")

Agent Loop (proposal → verification → execution → receipts)

# agent_document_filing.py
import uuid, json
from typing import Any, Dict, List
from tools import *
from adapters import *

ALLOWED_TOOLS = {"ClassifyDocument", "StoreDocument", "RequestReview", "VerifyCompliance"}

def new_idem() -> str:
    return f"idem-{uuid.uuid4()}"

def verify_proposal(p: Dict[str, Any]) -> str:
    required = {"name","args","preconditions","idempotency_key"}
    if not required.issubset(p): return "Missing proposal fields"
    if p["name"] not in ALLOWED_TOOLS: return "Tool not allowed"
    return ""

def execute(p: Dict[str, Any]) -> ToolReceipt:
    n, a = p["name"], p["args"]
    if n == "ClassifyDocument":      return classify_document(ClassifyDocumentArgs(**a))
    if n == "StoreDocument":         return store_document(StoreDocumentArgs(**a))
    if n == "RequestReview":         return request_review(RequestReviewArgs(**a))
    if n == "VerifyCompliance":      return verify_compliance(VerifyComplianceArgs(**a))
    return ToolReceipt(tool=n, ok=False, ref="none", message="Unknown tool")

# --- Model shim returning a plan per contract (replace with your LLM call) ---
def call_model(contract_yaml: str, claims: List[Dict[str,Any]], document_data: Dict[str,Any]) -> Dict[str,Any]:
    decision = "approve" if document_data["classified"] else "request_review"
    return {
      "summary": f"Document {document_data['document_id']} review complete.",
      "decision": decision,
      "document_classification": {
        "document_id": document_data["document_id"],
        "document_type": document_data["document_type"],
        "storage_location": STORAGE_LOCATIONS[document_data["document_type"]],
        "classification_status": "classified"
      },
      "citations": ["policy:document:confidentiality","policy:document:classification","policy:document:retention"],
      "next_steps": ["Classify document", "Store document", "Request review", "Verify compliance"],
      "tool_proposals": [
        {"name":"ClassifyDocument","args":{"document_id":document_data["document_id"], "document_type":document_data["document_type"], "content":document_data["content"]},
         "preconditions":"Classify document by type.","idempotency_key": new_idem()},
        {"name":"StoreDocument","args":{"document_id":document_data["document_id"], "storage_location":STORAGE_LOCATIONS[document_data["document_type"]],
                                        "document_type":document_data["document_type"]},
         "preconditions":"Store classified document in the correct location.","idempotency_key": new_idem()},
        {"name":"RequestReview","args":{"document_id":document_data["document_id"], "review_needed":True},
         "preconditions":"Request review for compliance.","idempotency_key": new_idem()},
        {"name":"VerifyCompliance","args":{"document_id":document_data["document_id"], "document_type":document_data["document_type"], "compliance_required":True},
         "preconditions":"Ensure document compliance with company policy.","idempotency_key": new_idem()}
      ]
    }

def render_response(model_json: Dict[str,Any], receipts: List[ToolReceipt]) -> str:
    idx = {r.tool:r for r in receipts}
    lines = [model_json["summary"], ""]
    lines.append(f"Decision: {model_json['decision']}")
    lines.append(f"Document Classification: {model_json['document_classification']}")
    lines.append("")
    lines.append("Next steps:")
    for s in model_json["next_steps"]:
        lines.append(f"• {s}")
    if idx.get("StoreDocument") and idx["StoreDocument"].ok:
        lines.append(f"\nDocument stored: {idx['StoreDocument'].message}")
    if idx.get("RequestReview") and idx["RequestReview"].ok:
        lines.append(f"Review requested: {idx['RequestReview'].message}")
    lines.append("\nCitations: " + ", ".join(model_json["citations"]))
    return "\n".join(lines)

def handle(document_data: Dict[str,Any]) -> str:
    contract = open("contracts/document_classification_v1.yaml").read()
    claims: List[Dict[str,Any]] = []  # load real claims
    plan = call_model(contract, claims, document_data)

    receipts: List[ToolReceipt] = []
    for prop in plan["tool_proposals"]:
        reason = verify_proposal(prop)
        if reason:
            receipts.append(ToolReceipt(tool=prop["name"], ok=False, ref="blocked", message=reason)); continue
        rec = execute(prop)
        receipts.append(rec)
        if not rec.ok and prop["name"] in {"StoreDocument"}:
            break
    return render_response(plan, receipts)

if __name__ == "__main__":
    example_document_data = {
      "document_id":"doc-001",
      "document_type":"contract",
      "content":"This is a contract for vendor services.",
      "classified": False
    }
    print(handle(example_document_data))

The Prompt You’d Send to the Model (concise and testable)

System:
You are DocumentFilingAgent. Follow the contract:
- Ask once if document_id, document_type, content, or storage_location is missing.
- Cite 1–2 claim_ids per factual sentence using provided claims.
- Propose tools; never assert success without a receipt.
- Output JSON with keys: summary, decision, document_classification, citations[], next_steps[], tool_proposals[].

Claims (eligible only):
[ ... JSON array of document management policy claims like above ... ]

User:
File and classify this document:
{"document_id":"doc-001","document_type":"contract","content":"This is a contract for vendor services."}

How to adapt quickly

Integrate document classification, storage management, and compliance checks with your internal document management system or cloud storage solution. Implement idempotency and transactional integrity to ensure that documents are classified and stored correctly without duplication. Load claims from internal classification policies, retention guidelines, and compliance rules for data privacy. Ship the contract, policy bundle, and decoder settings behind a feature flag for easy testing, canary deployments, and rollback support.