Structured Output in LangChain

Tuhin Paul

Structured output allows agents to return data in a specific, predictable format (docs.langchain.com). Instead of parsing natural-language responses, you get structured data that your application can use directly. In this guide, we'll explore four ways to define the output schema in LangChain:

TypedDict
Annotated TypedDict
Pydantic
JsonSchema

Real-World Use Case: Customer Support Ticket Analyzer

Let's build a practical application that analyzes customer support tickets and extracts structured information including:

Ticket category
Priority level
Sentiment analysis
Key entities (product names, features)
Suggested actions

Prerequisites

pip install langchain langchain-openai pydantic typing-extensions

Approach 1: TypedDict

TypedDict is a lightweight way to define structured output using standard Python typing. LangChain returns the result as a plain dict and performs no runtime validation, so this approach is ideal when you want just enough structure without a validation layer.

from typing import Literal, List
from typing_extensions import TypedDict
from langchain_openai import ChatOpenAI
import os

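# Set your API key (in real projects, export OPENAI_API_KEY in your shell
# or load it from a secrets manager instead of hard-coding it)
os.environ.setdefault("OPENAI_API_KEY", "your-api-key-here")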

# Define the schema using TypedDict
class SupportTicket(TypedDict):
    """Extracted information from customer support ticket"""
    category: Literal["billing", "technical", "feature_request", "bug_report"]
    priority: Literal["low", "medium", "high", "critical"]
    sentiment: Literal["positive", "neutral", "negative", "angry"]
    customer_name: str
    product_mentioned: List[str]
    urgency_indicators: List[str]
    summary: str
    suggested_actions: List[str]

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Wrap with structured output
structured_llm = llm.with_structured_output(SupportTicket)

# Test with a real ticket
ticket_text = """
Subject: URGENT - Payment Processing Failed!!!

Hi, my name is Sarah Johnson and I'm extremely frustrated. 
I've been trying to process a payment for our Enterprise plan 
for the last 3 hours and keep getting error code 502. 
This is blocking our entire team from accessing the analytics 
dashboard. We have a board meeting tomorrow and need this 
fixed IMMEDIATELY! The API endpoint /v2/payments keeps timing out.
"""

# Extract structured information
result = structured_llm.invoke(ticket_text)
print("=== TypedDict Result ===")
print(f"Category: {result['category']}")
print(f"Priority: {result['priority']}")
print(f"Sentiment: {result['sentiment']}")
print(f"Customer: {result['customer_name']}")
print(f"Products: {result['product_mentioned']}")
print(f"Summary: {result['summary']}")
print(f"Actions: {result['suggested_actions']}")

Sample output (LLM responses vary from run to run):

=== TypedDict Result ===
Category: technical
Priority: critical
Sentiment: angry
Customer: Sarah Johnson
Products: ['Enterprise plan', 'analytics dashboard', 'API']
Summary: Customer experiencing payment processing failures with error 502, blocking team access
Actions: ['Investigate /v2/payments endpoint', 'Check server status', 'Escalate to engineering', 'Contact customer within 1 hour']
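Because a TypedDict result is an ordinary dict, nothing stops a field from holding a value outside its Literal options. If you stay with TypedDict, a small post-check can catch that. The sketch below is illustrative and not part of LangChain: check_literal_fields is a hypothetical helper, and SupportTicket is a trimmed-down copy of the schema above. It reads the TypedDict's type hints and reports missing fields and out-of-range Literal values.

```python
from typing import List, Literal, get_args, get_origin, get_type_hints
from typing_extensions import TypedDict


class SupportTicket(TypedDict):
    """Trimmed-down copy of the schema above (illustrative)"""
    category: Literal["billing", "technical", "feature_request", "bug_report"]
    priority: Literal["low", "medium", "high", "critical"]
    summary: str
    suggested_actions: List[str]


def check_literal_fields(result: dict, schema: type) -> List[str]:
    """Hypothetical helper: report missing fields and invalid Literal values."""
    problems = []
    for name, hint in get_type_hints(schema).items():
        if name not in result:
            problems.append(f"missing field: {name}")
        elif get_origin(hint) is Literal and result[name] not in get_args(hint):
            problems.append(f"{name}={result[name]!r} not in {get_args(hint)}")
    return problems


# At runtime a TypedDict is just a dict, so this invalid category is accepted silently
bad: SupportTicket = {"category": "shipping", "priority": "high",
                      "summary": "Parcel lost", "suggested_actions": []}
print(isinstance(bad, dict))  # True
print(check_literal_fields(bad, SupportTicket))
```

In practice you would run the check on the result of structured_llm.invoke(...) and decide whether to retry or flag the ticket for a human.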

Approach 2: Annotated TypedDict

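Annotated TypedDict lets you attach a default value and a description to each field with Python's Annotated type. LangChain reads each field as Annotated[type, default, description], where a default of ... marks the field as required. The descriptions are sent to the model as part of the schema, which helps it fill each field correctly. This does not add validation: the result is still a plain dict, so constraints such as a 0.0 to 1.0 range can only be requested in the description text.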

from typing import Annotated, List, Literal
from typing_extensions import TypedDict
from langchain_openai import ChatOpenAI

# LangChain reads Annotated[type, default, description]:
#   ...          -> the field is required
#   "some text"  -> description sent to the model with the schema
# Ranges and lengths are NOT enforced on a TypedDict, so they are
# stated in the description instead.
class SupportTicketAnnotated(TypedDict):
    """Support ticket with per-field descriptions"""
    category: Annotated[
        Literal["billing", "technical", "feature_request", "bug_report"],
        ...,
        "Primary category of the support ticket",
    ]
    priority: Annotated[
        Literal["low", "medium", "high", "critical"],
        ...,
        "Priority level based on urgency and impact",
    ]
    sentiment: Annotated[
        Literal["positive", "neutral", "negative", "angry"],
        ...,
        "Customer's emotional tone",
    ]
    confidence_score: Annotated[float, ..., "Confidence in extraction, from 0.0 to 1.0"]
    customer_name: Annotated[str, ..., "Name of the customer (non-empty)"]
    product_mentioned: Annotated[List[str], ..., "Product or feature names mentioned"]
    urgency_indicators: Annotated[List[str], ..., "Words/phrases indicating urgency"]
    summary: Annotated[str, ..., "Brief summary in 2-3 sentences, under 200 characters"]
    suggested_actions: Annotated[List[str], ..., "Recommended action items"]
    estimated_resolution_time: Annotated[
        str,
        ...,
        "Estimated time to resolve (e.g., '2 hours', '1 day')",
    ]

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm_annotated = llm.with_structured_output(SupportTicketAnnotated)

# Test the annotated version
result_annotated = structured_llm_annotated.invoke(ticket_text)
print("\n=== Annotated TypedDict Result ===")
print(f"Category: {result_annotated['category']}")
print(f"Priority: {result_annotated['priority']}")
print(f"Confidence: {result_annotated['confidence_score']}")
print(f"Estimated Resolution: {result_annotated['estimated_resolution_time']}")
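LangChain's documented convention for Annotated TypedDict fields is Annotated[type, default, description], with ... as the default for required fields. Here is a simplified, hypothetical sketch of how such fields could be turned into the JSON Schema the model receives. It is not LangChain's implementation (LangChain builds the schema internally when you call with_structured_output), it only handles the few types used in this article, and TicketSketch and annotated_to_schema are illustrative names.

```python
from typing import Annotated, List, Literal, get_args, get_origin, get_type_hints
from typing_extensions import TypedDict


class TicketSketch(TypedDict):
    """Trimmed-down support ticket (hypothetical example)"""
    category: Annotated[Literal["billing", "technical"], ..., "Primary category"]
    confidence_score: Annotated[float, ..., "Confidence from 0.0 to 1.0"]
    urgency_indicators: Annotated[List[str], ..., "Words indicating urgency"]
    customer_email: Annotated[str, None, "Customer email if mentioned"]


def _json_type(tp) -> dict:
    """Map the few Python types used here to JSON Schema fragments."""
    if get_origin(tp) is Literal:
        return {"type": "string", "enum": list(get_args(tp))}
    if get_origin(tp) is list:
        return {"type": "array", "items": _json_type(get_args(tp)[0])}
    return {str: {"type": "string"}, float: {"type": "number"},
            int: {"type": "integer"}, bool: {"type": "boolean"}}[tp]


def annotated_to_schema(td: type) -> dict:
    """Build a JSON Schema from Annotated[type, default, description] fields."""
    properties, required = {}, []
    # include_extras=True keeps the Annotated metadata instead of stripping it
    for name, hint in get_type_hints(td, include_extras=True).items():
        base, default, description = get_args(hint)  # assumes exactly two metadata items
        properties[name] = {**_json_type(base), "description": description}
        if default is ...:  # Ellipsis marks a required field
            required.append(name)
    return {"title": td.__name__, "description": td.__doc__, "type": "object",
            "properties": properties, "required": required}


schema = annotated_to_schema(TicketSketch)
print(schema["required"])  # ['category', 'confidence_score', 'urgency_indicators']
```

The point of the exercise: every description string ends up next to its field in the schema, which is the only channel through which Annotated TypedDict can steer the model.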

Approach 3: Pydantic (Recommended)

Pydantic provides the most robust solution with built-in validation, type checking, and data serialization. This is the recommended approach for production applications.

from pydantic import BaseModel, Field, field_validator
from typing import Literal, List
from langchain_openai import ChatOpenAI
import re

class SupportTicketPydantic(BaseModel):
    """Pydantic model for support ticket analysis with validation"""
    
    category: Literal["billing", "technical", "feature_request", "bug_report"] = Field(
        ..., 
        description="Primary category of the support ticket"
    )
    priority: Literal["low", "medium", "high", "critical"] = Field(
        ..., 
        description="Priority level based on urgency and impact"
    )
    sentiment: Literal["positive", "neutral", "negative", "angry"] = Field(
        ..., 
        description="Customer's emotional tone"
    )
    confidence_score: float = Field(
        ..., 
        description="Confidence in extraction (0.0 to 1.0)",
        ge=0.0, 
        le=1.0
    )
    customer_name: str = Field(
        ..., 
        description="Name of the customer",
        min_length=1
    )
    customer_email: str | None = Field(
        None, 
        description="Customer email if mentioned"
    )
    product_mentioned: List[str] = Field(
        default_factory=list,
        description="Products or features mentioned"
    )
    urgency_indicators: List[str] = Field(
        default_factory=list,
        description="Words or phrases indicating urgency"
    )
    error_codes: List[str] = Field(
        default_factory=list,
        description="Any error codes mentioned"
    )
    summary: str = Field(
        ..., 
        description="Brief summary in 2-3 sentences",
        max_length=200
    )
    suggested_actions: List[str] = Field(
        ..., 
        description="Recommended action items"
    )
    estimated_resolution_time: str = Field(
        ..., 
        description="Estimated time to resolve"
    )
    requires_escalation: bool = Field(
        ..., 
        description="Whether this ticket requires escalation"
    )
    ticket_metadata: dict = Field(
        default_factory=dict,
        description="Additional metadata extracted from the ticket"
    )
    
    @field_validator('customer_email')
    @classmethod
    def validate_email(cls, v):
        if v and not re.match(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$', v):
            raise ValueError('Invalid email format')
        return v
    
    @field_validator('suggested_actions')
    @classmethod
    def validate_actions(cls, v):
        if len(v) == 0:
            raise ValueError('At least one suggested action is required')
        return v

# Initialize LLM with Pydantic schema
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm_pydantic = llm.with_structured_output(SupportTicketPydantic)

# Test with multiple tickets
tickets = [
    """
    Subject: URGENT - Payment Processing Failed!!!
    
    Hi, my name is Sarah Johnson and I'm extremely frustrated. 
    I've been trying to process a payment for our Enterprise plan 
    for the last 3 hours and keep getting error code 502. 
    This is blocking our entire team from accessing the analytics 
    dashboard. We have a board meeting tomorrow and need this 
    fixed IMMEDIATELY! The API endpoint /v2/payments keeps timing out.
    """,
    """
    Subject: Feature Request - Dark Mode
    
    Hello! I love your product. My name is Mike Chen ([email protected]).
    Would it be possible to add a dark mode to the dashboard? 
    Many of our team members work late hours and would appreciate 
    this feature. No rush, just a suggestion for future updates.
    """,
    """
    Subject: Billing Question
    
    Hi, this is Lisa Park. I noticed I was charged twice for my 
    subscription this month. Can you help me understand why? 
    My account number is ACC-12345. Thanks!
    """
]

print("\n=== Pydantic Results ===")
for i, ticket in enumerate(tickets, 1):
    print(f"\n--- Ticket {i} ---")
    result = structured_llm_pydantic.invoke(ticket)
    
    # Access as Pydantic model
    print(f"Category: {result.category}")
    print(f"Priority: {result.priority}")
    print(f"Sentiment: {result.sentiment}")
    print(f"Customer: {result.customer_name}")
    print(f"Email: {result.customer_email}")
    print(f"Confidence: {result.confidence_score}")
    print(f"Error Codes: {result.error_codes}")
    print(f"Summary: {result.summary}")
    print(f"Actions: {', '.join(result.suggested_actions)}")
    print(f"Escalation Needed: {result.requires_escalation}")
    print(f"Resolution Time: {result.estimated_resolution_time}")
    
    # Convert to dict if needed
    # ticket_dict = result.model_dump()
    
    # Pydantic already enforced these rules while parsing the reply;
    # the asserts below only illustrate them
    assert result.confidence_score >= 0.0
    assert len(result.suggested_actions) > 0

Sample output (abridged; LLM responses vary from run to run):

=== Pydantic Results ===

--- Ticket 1 ---
Category: technical
Priority: critical
Sentiment: angry
Customer: Sarah Johnson
Email: None
Confidence: 0.95
Error Codes: ['502']
Summary: Customer experiencing critical payment processing failures blocking team access
Actions: Investigate /v2/payments endpoint, Check server status, Escalate to engineering
Escalation Needed: True
Resolution Time: 2 hours

--- Ticket 2 ---
Category: feature_request
Priority: low
Sentiment: positive
Customer: Mike Chen
Email: [email protected]
Confidence: 0.92
Summary: Customer requesting dark mode feature for dashboard
Actions: Log feature request, Add to product backlog, Notify customer when implemented
Escalation Needed: False
Resolution Time: 2-3 sprints

--- Ticket 3 ---
Category: billing
Priority: medium
Sentiment: neutral
Customer: Lisa Park
Confidence: 0.88
Summary: Customer reporting duplicate subscription charge
Actions: Investigate billing records, Process refund if confirmed, Update billing system
Escalation Needed: False
Resolution Time: 24 hours
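With a Pydantic schema, a reply that breaks one of the rules (a Literal value, a ge/le bound, or a custom field_validator) produces an exception instead of an invalid object. You can exercise the same rules offline, without an API key, by passing hand-written dicts to model_validate. TicketCheck below is a hypothetical, trimmed-down copy of SupportTicketPydantic; it uses Optional[str] instead of str | None so it also runs on Python versions before 3.10.

```python
import re
from typing import List, Literal, Optional

from pydantic import BaseModel, Field, ValidationError, field_validator


class TicketCheck(BaseModel):
    """Hypothetical trimmed-down copy of SupportTicketPydantic for offline testing"""
    priority: Literal["low", "medium", "high", "critical"]
    confidence_score: float = Field(ge=0.0, le=1.0)
    customer_email: Optional[str] = None
    suggested_actions: List[str]

    @field_validator("customer_email")
    @classmethod
    def validate_email(cls, v):
        if v and not re.match(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$", v):
            raise ValueError("Invalid email format")
        return v

    @field_validator("suggested_actions")
    @classmethod
    def validate_actions(cls, v):
        if not v:
            raise ValueError("At least one suggested action is required")
        return v


# A reply that satisfies every rule parses into a model instance
ok = TicketCheck.model_validate({"priority": "low", "confidence_score": 0.9,
                                 "suggested_actions": ["Log request"]})
print(ok.model_dump())

# A reply that breaks four rules raises one ValidationError listing all of them
failed_fields = []
try:
    TicketCheck.model_validate({"priority": "urgent", "confidence_score": 1.7,
                                "customer_email": "not-an-email",
                                "suggested_actions": []})
except ValidationError as err:
    failed_fields = sorted(str(e["loc"][0]) for e in err.errors())
print(failed_fields)  # ['confidence_score', 'customer_email', 'priority', 'suggested_actions']
```

Pydantic reports every failing field in a single ValidationError, which is convenient for logging why a reply was rejected. In a batch loop like the one above, wrapping each invoke call in try/except keeps one malformed reply from stopping the rest.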

Approach 4: JsonSchema

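JsonSchema lets you pass the schema directly as a JSON Schema dict. This is useful when the same schema is shared with other LLM providers or non-Python services, or when you need fine-grained control over the exact schema the model sees. LangChain returns the result as a plain dict and does not validate it against the schema, so keywords such as minimum, maxLength, or format are only enforced if the provider enforces them.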

from langchain_openai import ChatOpenAI
import json

# Define JSON Schema directly
support_ticket_schema = {
    "title": "SupportTicket",
    "description": "Extracted information from customer support ticket",
    "type": "object",
    "properties": {
        "category": {
            "type": "string",
            "enum": ["billing", "technical", "feature_request", "bug_report"],
            "description": "Primary category of the support ticket"
        },
        "priority": {
            "type": "string",
            "enum": ["low", "medium", "high", "critical"],
            "description": "Priority level based on urgency and impact"
        },
        "sentiment": {
            "type": "string",
            "enum": ["positive", "neutral", "negative", "angry"],
            "description": "Customer's emotional tone"
        },
        "confidence_score": {
            "type": "number",
            "minimum": 0.0,
            "maximum": 1.0,
            "description": "Confidence in extraction (0.0 to 1.0)"
        },
        "customer_name": {
            "type": "string",
            "minLength": 1,
            "description": "Name of the customer"
        },
        "customer_email": {
            "type": "string",
            "format": "email",
            "description": "Customer email if mentioned"
        },
        "product_mentioned": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Products or features mentioned"
        },
        "urgency_indicators": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Words or phrases indicating urgency"
        },
        "error_codes": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Any error codes mentioned"
        },
        "summary": {
            "type": "string",
            "maxLength": 200,
            "description": "Brief summary in 2-3 sentences"
        },
        "suggested_actions": {
            "type": "array",
            "items": {"type": "string"},
            "minItems": 1,
            "description": "Recommended action items"
        },
        "estimated_resolution_time": {
            "type": "string",
            "description": "Estimated time to resolve"
        },
        "requires_escalation": {
            "type": "boolean",
            "description": "Whether this ticket requires escalation"
        },
        "sla_deadline": {
            "type": "string",
            "format": "date-time",
            "description": "SLA deadline for resolution"
        }
    },
    "required": [
        "category",
        "priority",
        "sentiment",
        "confidence_score",
        "customer_name",
        "summary",
        "suggested_actions",
        "estimated_resolution_time",
        "requires_escalation"
    ],
    "additionalProperties": False
}

# Initialize LLM with JSON Schema
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
structured_llm_json = llm.with_structured_output(support_ticket_schema)

# Test the JSON Schema approach
result_json = structured_llm_json.invoke(ticket_text)

print("\n=== JsonSchema Result ===")
print(json.dumps(result_json, indent=2))

# The result is already a dict, ready for JSON serialization
print(f"\nCategory: {result_json['category']}")
print(f"Priority: {result_json['priority']}")
print(f"Requires Escalation: {result_json['requires_escalation']}")
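Since the JsonSchema result is a plain dict, any constraint the provider does not enforce goes unchecked. The jsonschema package is the thorough option for validating it; the stdlib-only sketch below is a lighter, hypothetical alternative (check_against_schema is not a library function) that covers only the keywords used in support_ticket_schema, except format. mini_schema is a trimmed-down stand-in so the example runs without an API call; with a live model you would pass result_json and support_ticket_schema instead.

```python
from typing import Any, Dict, List

# JSON Schema type names -> Python types for isinstance checks (other types are not checked)
_JSON_TYPES = {"string": str, "number": (int, float), "boolean": bool, "array": list, "object": dict}


def check_against_schema(data: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: check a flat dict against a small subset of JSON Schema.

    Handles required, additionalProperties: false, type, enum, minimum/maximum,
    minLength/maxLength and minItems. Nested item schemas and "format" are ignored.
    """
    problems = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"{name}: missing")
    for name, value in data.items():
        rule = properties.get(name)
        if rule is None:
            if schema.get("additionalProperties") is False:
                problems.append(f"{name}: not allowed")
            continue
        expected = rule.get("type")
        py_type = _JSON_TYPES.get(expected, object)
        # bool is a subclass of int in Python, so reject it unless "boolean" is expected
        if expected is not None and (not isinstance(value, py_type)
                                     or (isinstance(value, bool) and expected != "boolean")):
            problems.append(f"{name}: expected {expected}")
            continue
        if "enum" in rule and value not in rule["enum"]:
            problems.append(f"{name}: {value!r} not in enum")
        if "minimum" in rule and value < rule["minimum"]:
            problems.append(f"{name}: below minimum")
        if "maximum" in rule and value > rule["maximum"]:
            problems.append(f"{name}: above maximum")
        if "minLength" in rule and len(value) < rule["minLength"]:
            problems.append(f"{name}: shorter than minLength")
        if "maxLength" in rule and len(value) > rule["maxLength"]:
            problems.append(f"{name}: longer than maxLength")
        if "minItems" in rule and len(value) < rule["minItems"]:
            problems.append(f"{name}: fewer than minItems")
    return problems


mini_schema = {
    "type": "object",
    "properties": {
        "priority": {"type": "string", "enum": ["low", "high"]},
        "confidence_score": {"type": "number", "minimum": 0.0, "maximum": 1.0},
        "summary": {"type": "string", "maxLength": 10},
        "suggested_actions": {"type": "array", "items": {"type": "string"}, "minItems": 1},
        "requires_escalation": {"type": "boolean"},
    },
    "required": ["priority", "summary", "requires_escalation"],
    "additionalProperties": False,
}

# A deliberately bad reply, as a model might return it
reply = {"priority": "urgent", "confidence_score": 1.5, "summary": "x" * 20,
         "suggested_actions": [], "extra": 1}
print(check_against_schema(reply, mini_schema))
```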

Complete Production-Ready Application

Now let's build a complete end-to-end application that extracts ticket data, routes each ticket to a department, computes an SLA deadline, and drafts a reply to the customer:

from pydantic import BaseModel, Field
from typing import Literal, List
from langchain_openai import ChatOpenAI
from datetime import datetime, timedelta
import json

class SupportTicketPydantic(BaseModel):
    """Production-ready support ticket model"""
    category: Literal["billing", "technical", "feature_request", "bug_report"]
    priority: Literal["low", "medium", "high", "critical"]
    sentiment: Literal["positive", "neutral", "negative", "angry"]
    confidence_score: float = Field(ge=0.0, le=1.0)
    customer_name: str
    customer_email: str | None = None
    product_mentioned: List[str] = []
    urgency_indicators: List[str] = []
    error_codes: List[str] = []
    summary: str = Field(max_length=200)
    suggested_actions: List[str]
    estimated_resolution_time: str
    requires_escalation: bool
    assigned_department: Literal["engineering", "billing", "product", "support"]

class TicketProcessor:
    """Process and route support tickets"""
    
    def __init__(self):
        self.llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
        self.structured_llm = self.llm.with_structured_output(SupportTicketPydantic)
    
    def calculate_sla(self, priority: str) -> datetime:
        """Calculate SLA deadline based on priority"""
        sla_hours = {
            "critical": 2,
            "high": 8,
            "medium": 24,
            "low": 72
        }
        return datetime.now() + timedelta(hours=sla_hours.get(priority, 24))
    
    def assign_department(self, category: str) -> str:
        """Assign department based on category"""
        mapping = {
            "technical": "engineering",
            "bug_report": "engineering",
            "billing": "billing",
            "feature_request": "product"
        }
        return mapping.get(category, "support")
    
    def process_ticket(self, ticket_text: str) -> dict:
        """Process a support ticket end-to-end"""
        # Extract structured data
        ticket = self.structured_llm.invoke(ticket_text)
        
        # Add metadata. The LLM's fields are spread first so that the
        # rule-based values below (e.g. assigned_department) override them;
        # in a dict literal, later keys win.
        now = datetime.now()
        processed_ticket = {
            **ticket.model_dump(),
            "ticket_id": f"TKT-{now.strftime('%Y%m%d%H%M%S')}",
            "created_at": now.isoformat(),
            "sla_deadline": self.calculate_sla(ticket.priority).isoformat(),
            "assigned_department": self.assign_department(ticket.category),
            "status": "open",
        }
        
        return processed_ticket
    
    def generate_response(self, ticket: dict) -> str:
        """Generate automated response to customer"""
        response_template = f"""
Dear {ticket['customer_name']},

Thank you for contacting support. We've received your {ticket['category']} ticket 
(Ticket ID: {ticket['ticket_id']}).

Summary: {ticket['summary']}

Priority: {ticket['priority'].upper()}
Assigned to: {ticket['assigned_department'].title()} Team
Expected Resolution: {ticket['estimated_resolution_time']}

Our team will review your ticket and get back to you shortly.

Best regards,
Support Team
"""
        return response_template.strip()

# Usage Example
if __name__ == "__main__":
    processor = TicketProcessor()
    
    # Process the urgent ticket
    ticket_text = """
    Subject: URGENT - Payment Processing Failed!!!
    
    Hi, my name is Sarah Johnson and I'm extremely frustrated. 
    I've been trying to process a payment for our Enterprise plan 
    for the last 3 hours and keep getting error code 502. 
    This is blocking our entire team from accessing the analytics 
    dashboard. We have a board meeting tomorrow and need this 
    fixed IMMEDIATELY!
    """
    
    # Process ticket
    processed = processor.process_ticket(ticket_text)
    
    print("=== Processed Ticket ===")
    print(json.dumps(processed, indent=2, default=str))
    
    # Generate response
    response = processor.generate_response(processed)
    print("\n=== Auto-Generated Response ===")
    print(response)
    
    # Save to database (pseudo-code)
    # db.tickets.insert_one(processed)
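The routing and SLA rules in TicketProcessor are plain Python, so they can be tested without calling the model. One option, sketched below with hypothetical names (route_ticket, build_record), is to pass the current time in as an argument so results are deterministic. The sketch also makes the merge order explicit: in a Python dict literal a later key overwrites an earlier one, so the LLM's fields are spread first and the rule-based values win. The random ticket ID is a swapped-in choice, not the article's timestamp scheme.

```python
import uuid
from datetime import datetime, timedelta
from typing import Any, Dict

SLA_HOURS = {"critical": 2, "high": 8, "medium": 24, "low": 72}
DEPARTMENTS = {"technical": "engineering", "bug_report": "engineering",
               "billing": "billing", "feature_request": "product"}


def route_ticket(category: str, priority: str, now: datetime) -> Dict[str, Any]:
    """Deterministic routing: department from category, SLA deadline from priority."""
    return {
        "assigned_department": DEPARTMENTS.get(category, "support"),
        "sla_deadline": (now + timedelta(hours=SLA_HOURS.get(priority, 24))).isoformat(),
    }


def build_record(llm_fields: Dict[str, Any], now: datetime) -> Dict[str, Any]:
    """Merge LLM output with computed metadata; computed keys come last, so they win."""
    return {
        **llm_fields,
        **route_ticket(llm_fields["category"], llm_fields["priority"], now),
        # Random suffix instead of a timestamp: IDs created in the same second
        # are very unlikely to collide
        "ticket_id": f"TKT-{uuid.uuid4().hex[:12]}",
        "created_at": now.isoformat(),
        "status": "open",
    }


fixed_now = datetime(2024, 1, 15, 9, 0, 0)
# The model guessed "support", but the category rule routes to engineering
record = build_record({"category": "technical", "priority": "critical",
                       "assigned_department": "support"}, fixed_now)
print(record["assigned_department"], record["sla_deadline"])
# engineering 2024-01-15T11:00:00
```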

Comparison Table

Approach	Pros	Cons	Best For
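TypedDict	Lightweight, standard Python typing, simple	No runtime validation; output is a plain dict	Quick prototypes, simple schemas
Annotated TypedDict	Adds per-field descriptions and defaults that guide the model	More verbose; still no runtime validation	Medium-complexity schemas where field descriptions matter
Pydantic	Full runtime validation (types, constraints, custom validators), serialization	More boilerplate; output is a model instance rather than a dict	Production applications (Recommended)
JsonSchema	Maximum flexibility; schema can be shared with other providers and non-Python tools	Verbose; no native Python types; output not validated by LangChain	Multi-provider setups, schemas defined outside Python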

Structured output in LangChain turns free-form LLM responses into predictable data structures, and with Pydantic those structures are also validated at runtime. While all four approaches have their place, Pydantic is recommended for production applications because of its validation and developer experience.

The customer support ticket analyzer demonstrates how structured output can automate real-world workflows, from ticket classification to SLA calculation and department routing. By choosing the right approach for your use case, you can build reliable AI-powered applications that integrate seamlessly with existing systems.