Abstract
This paper proposes a new architecture for deploying Private Tailored Small Language Models (PT-SLMs) in a cognitive cascade, orchestrated by Gödel’s Scaffolded Cognitive Prompting (GSCP) framework. PT-SLMs, each optimized for a specific reasoning subtask, are chained in a structured sequence that enables compositional cognition, factual accuracy, and task-specific efficiency. Our real-world application focuses on parsing financial documents from banking institutions, where the architecture demonstrated superior performance over conventional LLM-based solutions in terms of speed, cost, and hallucination rate.
By modularizing the NLP pipeline with targeted micro-models and leveraging GSCP for coordination and error recovery, we present a viable, scalable, and enterprise-friendly alternative to single-model systems. The approach is applicable across high-stakes domains, including finance, law, and healthcare.
1. Introduction
The increasing demand for reliable, interpretable, and cost-effective NLP systems in enterprise settings has shifted focus from all-encompassing LLMs toward leaner, task-aligned architectures. While monolithic models like GPT-4 offer impressive capabilities, they fall short in domains that require strict factual integrity, contextual transparency, and data privacy, especially within regulated industries such as finance.
In this context, we introduce the concept of Private Tailored Small Language Models (PT-SLMs): compact models trained or fine-tuned for specialized cognitive roles. Rather than relying on a single model to handle all stages of document parsing or comprehension, we segment the task into smaller, interdependent reasoning modules. These are orchestrated via Gödel’s Scaffolded Cognitive Prompting (GSCP), a layered prompting and context management framework that mimics hierarchical human cognition. Together, these form a cascade that is not only performant but also easier to audit, scale, and control.
2. Related Work
The move toward smaller, more focused language models is grounded in several research efforts. Work on adapter modules, LoRA (Low-Rank Adaptation), and other parameter-efficient transfer learning techniques has shown that performance can be preserved or even improved when specialization replaces generalization. Moreover, recent interest in multi-agent orchestration has sparked exploration into model composition strategies.
Despite these advances, current implementations typically treat model chaining as an ad hoc sequence rather than a coordinated cognitive system. Existing multi-model pipelines often lack structured context sharing, role reasoning, and error propagation mechanisms. These limitations prevent the full realization of cognitive model architectures in production.
Our work builds on this foundation by introducing a cognitively scaffolded model chain, where each PT-SLM is part of a directed reasoning graph, governed by GSCP policies. This enables deterministic planning, feedback-based control, and seamless integration of domain constraints.
3. Architecture Overview
To address the limitations of traditional pipelines, we propose a three-tier cascade architecture composed of specialized PT-SLMs that communicate via context tokens, embeddings, and memory buffers. Unlike LLM pipelines that rely on a single model interpreting all inputs, our structure enforces task boundaries while preserving semantic continuity.
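To make the data flow concrete, the following minimal Python sketch models the cascade as a sequence of stages that read from and write to a shared context object. The stage names, context fields, and placeholder outputs are illustrative only; they are not our production interface.

```python
# Minimal sketch of the three-tier cascade: each stage is a callable that
# reads from and writes to a shared context. All names are illustrative.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class CascadeContext:
    raw_text: str
    memory: Dict[str, object] = field(default_factory=dict)  # cross-stage buffer
    trace: List[str] = field(default_factory=list)            # audit trail

Stage = Callable[[CascadeContext], CascadeContext]

def run_cascade(stages: List[Stage], ctx: CascadeContext) -> CascadeContext:
    for stage in stages:
        ctx = stage(ctx)
        ctx.trace.append(stage.__name__)  # preserve per-stage traceability
    return ctx

# Illustrative stages standing in for PT-SLM calls
def segment(ctx: CascadeContext) -> CascadeContext:
    ctx.memory["segments"] = ctx.raw_text.split("\n\n")
    return ctx

def extract(ctx: CascadeContext) -> CascadeContext:
    ctx.memory["fields"] = {"operating_margin": "placeholder"}  # stand-in output
    return ctx

def normalize(ctx: CascadeContext) -> CascadeContext:
    ctx.memory["record"] = dict(ctx.memory["fields"])
    return ctx

result = run_cascade([segment, extract, normalize], CascadeContext(raw_text="..."))
print(result.trace)  # ['segment', 'extract', 'normalize']
```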
3.1 Private Tailored Small Language Models (PT-SLM)
PT-SLMs are purpose-built language models (<300M parameters) that are either distilled or fine-tuned for a single type of operation such as classification, extraction, reasoning, or summarization. These models are deployed within secure, localized environments and trained on domain-specific corpora such as internal audit logs, term sheets, amortization schedules, and more.
Each PT-SLM is optimized for:
- Memory and inference efficiency (supports CPU/GPU edge deployment)
- Context retention with minimal token bloat
- Alignment with enterprise KPIs and schema constraints
The use of PT-SLMs ensures that each subtask in a document pipeline is executed by the most appropriate cognitive unit, minimizing ambiguity and token noise.
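As an illustration of the edge-deployment property above, the sketch below loads a single PT-SLM as a local Hugging Face pipeline on CPU. We do not prescribe a particular serving stack; the checkpoint path is a hypothetical internal artifact used only for illustration.

```python
# A minimal sketch of serving one PT-SLM locally with Hugging Face transformers.
from transformers import pipeline

# Sub-300M classifier fine-tuned for one role, loaded on CPU (device=-1)
clause_classifier = pipeline(
    "text-classification",
    model="internal/pt-slm-clause-classifier",  # hypothetical checkpoint path
    device=-1,
)

result = clause_classifier("The borrower shall maintain a minimum DSCR of 1.25x.")
print(result)  # e.g. [{'label': 'covenant', 'score': 0.97}]
```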
3.2 GSCP-Orchestrated Cascade
GSCP (Gödel’s Scaffolded Cognitive Prompting) is the orchestration logic that ensures each PT-SLM receives structured prompts, role definitions, and context memory aligned with the system’s overall cognitive plan. Unlike conventional prompt engineering, GSCP introduces:
- Stage transitions and context checkpoints
- Confidence routing between models
- Symbolic memory linkage between responses
- Task-specific error recovery and fallbacks
Each model in the cascade operates under a scoped prompt stage (e.g., “You are an income statement parser”), while GSCP manages the global reasoning state across all stages. This modularity introduces traceability, enabling human supervision and debugging, a critical need in regulated industries.
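A minimal sketch of the confidence routing and fallback behavior described above, assuming a simple per-stage confidence score: the `StageResult` shape and the 0.85 threshold are illustrative assumptions, not the full GSCP specification.

```python
# Hedged sketch of GSCP-style confidence routing: if a stage's confidence
# falls below its threshold, the orchestrator retries with a fallback model
# instead of propagating a low-confidence result.
from dataclasses import dataclass
from typing import Callable

@dataclass
class StageResult:
    output: dict
    confidence: float

ROLE_PROMPT = "You are an income statement parser."  # scoped prompt stage

def run_stage(primary: Callable[[str], StageResult],
              fallback: Callable[[str], StageResult],
              document: str,
              threshold: float = 0.85) -> StageResult:
    prompt = f"{ROLE_PROMPT}\n\n{document}"
    result = primary(prompt)
    if result.confidence < threshold:
        result = fallback(prompt)  # task-specific error recovery
    return result

# Toy usage with stand-in models
primary = lambda p: StageResult({"net_income": None}, confidence=0.42)
fallback = lambda p: StageResult({"net_income": 1_200_000}, confidence=0.91)
print(run_stage(primary, fallback, "FY2022 income statement ...").output)
```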
4. Experimental Setup
We sought to validate this architecture in a high-stakes use case where hallucination and high latency are unacceptable. Financial institutions handle large volumes of legacy documents containing critical KPIs, ratios, and legal language. Parsing these efficiently and accurately has historically required either manual effort or expensive LLM API calls.
4.1 Dataset and Task
We utilized a proprietary dataset of 7,000 banking documents, including loan agreements, risk disclosures, and income statements. The goal was to extract structured financial fields into a standardized schema (e.g., JSON or SQL) suitable for business intelligence and auditing workflows.
The task involved parsing nested clauses, interpreting visual layout cues, resolving synonyms across time frames (e.g., “2022 operating margin”), and preserving traceable metadata for each output.
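For illustration, one extracted field might be serialized as the record below. The exact schema is proprietary; the point is that each value carries the traceable provenance metadata described above. Field names and values here are hypothetical.

```python
# Illustrative target record for one extracted financial field.
import json

record = {
    "field": "operating_margin",
    "period": "2022",
    "value": 0.124,              # hypothetical value
    "unit": "ratio",
    "provenance": {              # traceable metadata per output
        "document_id": "doc-0042",
        "page": 7,
        "char_span": [1180, 1211],
    },
}
print(json.dumps(record, indent=2))
```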
4.2 Baselines
To evaluate effectiveness, we compared the following configurations:
- GPT-4 with prompt scaffolding via system messages
- A LLaMA-2-13B model fine-tuned with PEFT (parameter-efficient fine-tuning)
- A single GPT-2-medium model fine-tuned on all stages as a monolith
- The proposed PT-SLM + GSCP cascade model
4.3 Metrics
Our evaluation was based on:
- Extraction accuracy vs. manually labeled targets
- Latency from input to final output
- Hallucination rate, measured as the share of unsupported or unverifiable outputs (see the sketch after this list)
- Token consumption and cost, especially for hosted APIs
- Interpretability (measured via a human auditor score on transparency and traceability)
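As a sketch of how two of these metrics can be operationalized, the functions below compute extraction accuracy against labeled targets and a hallucination rate based on a simple substring-support check. The substring check is a crude stand-in for the verification used in practice, not our actual measurement procedure.

```python
# Illustrative metric computations over extracted records.
def extraction_accuracy(predicted: dict, gold: dict) -> float:
    """Fraction of manually labeled target fields matched exactly."""
    matched = sum(1 for k, v in gold.items() if predicted.get(k) == v)
    return matched / len(gold) if gold else 0.0

def hallucination_rate(outputs: list[dict], source_text: str) -> float:
    """Share of emitted values with no literal support in the source document
    (a simplistic proxy for unsupported or unverifiable outputs)."""
    unsupported = sum(1 for o in outputs if str(o["value"]) not in source_text)
    return unsupported / len(outputs) if outputs else 0.0
```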
5. Results and Discussion
The results confirm that chaining PT-SLMs with GSCP orchestration outperforms traditional single-model and even GPT-4-based solutions in both quality and cost-efficiency.
*Figure: GSCP orchestration.*
*Figure: Accuracy.*
The most notable gain was in hallucination control, with the GSCP-powered cascade reducing fabrication by up to 75% relative to GPT-4. Additionally, by isolating tasks such as segmentation and normalization into individual models, we achieved low memory overhead and near-real-time throughput. Human auditors preferred the modular system for its traceable logic and schema fidelity.
6. Cognitive Interpretation: Why It Works
The strength of the system lies not only in its components but in how they reason together. GSCP’s role-based prompting, task delegation, and checkpointing protocols emulate cognitive scaffolding found in human reasoning workflows. Instead of relying on brute-force generalization, this cascade embeds intentional cognition across stages.
From an architectural lens, the system benefits from:
- Reduced cognitive interference: each PT-SLM focuses on one skill
- Hierarchical memory layers: outputs are not discarded but evolve across stages
- Error resilience: lower-stage mistakes can be corrected mid-pipeline
- Semantic redundancy: similar data is cross-validated at multiple points
This design proves especially effective in environments where interpretability, compliance, and factual alignment are paramount.
7. Future Work
Our findings open new avenues for scalable cognitive pipelines. We propose several extensions:
- Integrating multimodal SLMs for visual-text hybrid documents (e.g., scanned statements)
- Adapting GSCP to dynamically re-route based on output confidence thresholds
- Embedding symbolic reasoning modules (e.g., OpenCog, miniKanren) for logic validation
- Creating multi-lingual PT-SLM sets for international compliance parsing
- Open-sourcing GSCP plugins for orchestration in LangChain or CrewAI ecosystems
Additionally, we intend to expand this work to regulatory filings, healthcare records, and contract review, domains with high ambiguity and semantic importance.
8. Conclusion
This work demonstrates that chaining Private Tailored Small Language Models within a GSCP-guided architecture is not only viable, but preferable in many high-risk document processing domains. The benefits include reduced hallucination, lower cost, faster inference, and increased control, providing enterprises with a modular, interpretable alternative to all-purpose LLMs.
By aligning computation with cognition, and structure with supervision, we unlock a new tier of trustworthy NLP systems for the AI-powered enterprise era.
Acknowledgements
The author thanks the AlpineGate AI team and the anonymous financial institution for infrastructure, anonymized data, and review support. This work was conducted under corporate AI ethics and privacy guidelines.
References
- Godel, J. (2025). GSCP: Gödel’s Scaffolded Cognitive Prompting Framework.
- Hu, E., et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models.
- OpenAI. (2023). GPT-4 Technical Report.
- Raffel, C., et al. (2020). Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer.
- Stanford CRFM. (2023). Alpaca: Instruction-Tuned Model from LLaMA.