AI feels like magic until you get your first bill.
When teams debate whether to rent a general-purpose LLM (like GPT, Gemini, or Claude) or build a smaller domain-specific model of their own, the conversation often gets stuck on price tags and technical complexity. But there's another critical detail that many articles gloss over: general LLMs don't magically know your company's data. If you want them to answer real product or order questions, you have to wire them into your systems.
This blog takes a clear look at both paths, using one running example, a retail chatbot answering "Where's my order?", to highlight the tradeoffs.
Option A: Renting General-Purpose LLMs
At first glance, this feels like the easy button. You call GPT or Gemini’s API, pass in a customer question, and get a natural-language answer. But here’s the reality:
They don’t know your data out of the box
GPT has no access to your product catalog, your order database, or your policies.
If a customer asks "Where’s my order?" and you just pass that raw text to GPT, it will respond generically:
"You can usually track your order on the company’s website."
Clearly, that’s not useful.
How companies make it work
To bridge the gap, teams layer in one (or both) of these approaches:
1. RAG (Retrieval-Augmented Generation)
At request time, you look up the relevant facts in your own systems (e.g., the customer's order record and shipping status) and inject them into the prompt alongside the question.
👉 GPT didn't "know" your data. You injected it just-in-time.
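In practice, the RAG step looks something like this. A minimal sketch in Python, where `fetch_order` and `call_llm` are hypothetical stand-ins for your order-database client and your LLM API client:

```python
# Minimal RAG sketch (illustrative only): look up the order in your own
# system first, then hand the facts to the LLM as context.

def fetch_order(customer_id: str) -> dict:
    # In production this would query your order database or internal API.
    return {"order_id": "A123", "status": "shipped", "eta": "June 12"}

def call_llm(prompt: str) -> str:
    # Stand-in for a real API call (e.g., an OpenAI or Gemini client).
    return f"[LLM answer grounded in: {prompt!r}]"

def answer_order_question(customer_id: str, question: str) -> str:
    order = fetch_order(customer_id)
    context = (
        f"Order {order['order_id']} is {order['status']}, "
        f"estimated delivery {order['eta']}."
    )
    # The retrieved facts are injected into the prompt just-in-time.
    prompt = f"Context: {context}\nCustomer question: {question}"
    return call_llm(prompt)

print(answer_order_question("cust-42", "Where's my order?"))
```

The model never "learns" your orders; it only sees the facts you fetched for this one request.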
2. Fine-tuning / Custom Training
You can fine-tune GPT on your company's FAQs, chat transcripts, and policies.
This helps enforce a consistent tone and brand voice.
But fine-tuning still doesn't give the model live access to customer data; you still need APIs or RAG for dynamic info like order status.
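Fine-tuning data is typically prepared as JSONL of chat examples. A minimal sketch, assuming an OpenAI-style `messages` format (exact field names vary by provider, and `AcmeShop` and the file name are made up for illustration):

```python
# Sketch of preparing fine-tuning data in chat-style JSONL.
import json

examples = [
    {"messages": [
        {"role": "system", "content": "You are AcmeShop's support assistant."},
        {"role": "user", "content": "What is your return policy?"},
        {"role": "assistant", "content": "You can return items within 30 days of delivery for a full refund."},
    ]},
]

# One JSON object per line: the standard JSONL training-file layout.
with open("finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

Note what's in there: tone, policy, phrasing. What's not in there: any individual customer's order, which is exactly why fine-tuning alone can't answer "Where's my order?"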
Let’s do the math
Say your chatbot processes 2 million tokens per day (1.2M input, 0.8M output), at an assumed rate of $75 per 1M input tokens and $150 per 1M output tokens:
Input: 1.2M × $75/1M = $90/day
Output: 0.8M × $150/1M = $120/day
Total: $210/day ≈ $6,300/month
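The same arithmetic as a small script, so you can plug in your own volumes (the $75/$150 per-1M-token rates are the assumed prices from this example, not any provider's actual pricing):

```python
# Token-cost arithmetic from the example above.
PRICE_IN = 75.0    # assumed $ per 1M input tokens
PRICE_OUT = 150.0  # assumed $ per 1M output tokens

def daily_cost(input_millions: float, output_millions: float) -> float:
    """Daily API cost in dollars for the given token volumes (in millions)."""
    return input_millions * PRICE_IN + output_millions * PRICE_OUT

per_day = daily_cost(1.2, 0.8)
per_month = per_day * 30
print(per_day, per_month)  # 210.0 6300.0
```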
Benefits
No infrastructure to build: you can ship in days, not months.
You get frontier-model quality and improvements without retraining anything yourself.
Costs scale with usage, so there's no large upfront investment.
Option B: Building Your Own Domain Model
This is the opposite extreme: you train a small foundation model (say 7B parameters) on your own data + domain knowledge.
Why it’s attractive
You own the weights → no per-call API fees.
You can bake in domain knowledge deeply.
Potentially cheaper long-term if usage is massive.
What it takes
1. Data preparation
Collecting, cleaning, and labeling product info, chat history, and policies.
Cost can hit hundreds of thousands if the annotation is manual.
2. Training infrastructure
Even a 7B model needs a cluster of GPUs and days to weeks of compute to train, plus the engineering effort to run those jobs reliably.
3. Inference Infrastructure
Once trained, you still need GPU servers to host it.
Each customer query runs an inference pass on those GPUs, which adds to your compute bill and can increase latency.
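To size that hosting, a back-of-envelope estimate helps. A minimal sketch of the weight-memory math for a 7B model (assumes fp16 weights at 2 bytes per parameter, and ignores KV-cache and activation overhead, which need real headroom on top):

```python
# Back-of-envelope GPU memory estimate for hosting a model's weights.
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Memory for weights alone, in GB (fp16 = 2 bytes per parameter)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

print(weight_memory_gb(7))  # 14.0 GB of weights alone in fp16
```

So a 7B model already needs a GPU with well over 14 GB of memory before you account for batching and caches; that hardware runs for every query, around the clock.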
4. Maintenance
Models drift: as products, policies, and customer language change, you'll need to retrain, re-evaluate, and redeploy on a regular cadence.
Benefits
Full ownership and control: your data never leaves your infrastructure, and there are no per-call API fees.
Costs
Initial build: high (millions).
Ongoing hosting: significant.
Only makes ROI sense at a very high scale.
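One way to sanity-check that claim is a simple payback calculation. A rough sketch; every number except the $6,300/month API figure above is an assumption to replace with your own:

```python
# Rough break-even sketch: compare rented API spend with the fixed cost
# of building and hosting your own model.
def months_to_break_even(build_cost: float,
                         api_monthly: float,
                         hosting_monthly: float) -> float:
    """Months until the build cost is recouped by avoided API fees."""
    # Each month of self-hosting "saves" the API bill minus hosting cost.
    monthly_saving = api_monthly - hosting_monthly
    if monthly_saving <= 0:
        return float("inf")  # self-hosting never pays back
    return build_cost / monthly_saving

# Assumed example: $2M build, $6,300/mo API spend, $4,000/mo hosting
print(months_to_break_even(2_000_000, 6_300, 4_000))
```

At the post's $6,300/month API spend, a multi-million-dollar build would take decades to pay back; the math only flips at far larger volumes.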
The Key Takeaway
If you need a chatbot to answer "Where's my order?", GPT won't magically know. You either wire your own data in (via RAG, APIs, or fine-tuning) or build and host a domain model yourself.
That's why many companies start with Option A (renting): it's pragmatic and fast. But if your volumes explode, costs spiral, or compliance requires self-hosting, Option B becomes worth considering.
Final Word
The debate isn't really LLM vs. custom model. It's about how you balance cost, control, and time to market. Smart teams often start with renting, layer in RAG/fine-tuning, and only move to building their own once the business case is undeniable.
✍️ That's my breakdown. Curious: if you were building that retail chatbot, would you rent GPT forever or take the plunge on your own model?
For business leaders: Use this article to spark a conversation about your long-term AI strategy. Don't just look at the API price; consider the total cost of ownership.
For developers: Before you start coding, map out the data and API calls needed to truly make a rented LLM useful. This will help you make a better case for your team's strategy.