
Prompt Caching Explained: The AI Cost Cut Most Agents Miss

Ryan Wanner

AI Systems Instructor • Real Estate Technologist

Every time you send a prompt to the ChatGPT or Claude API, you pay for every word, including the 500 words of instructions you send identically with every request. Prompt caching fixes that.

You Are Paying for the Same Instructions Every Single Time

Think of it like calling a restaurant and re-explaining your dietary restrictions before every order. "I'm allergic to shellfish, I don't eat pork, I prefer grilled over fried, and I like extra vegetables." Same speech. Every call. The restaurant charges you by the minute.

Prompt caching is like having your preferences saved on file. The restaurant already knows. You just order the dish.

In technical terms: when you use an AI API, you send a system prompt (your instructions and context) plus the user message (the specific task). Most of the cost comes from the system prompt because it's long and identical every time. Prompt caching stores that system prompt after the first call, so every subsequent call pays a small cache-read fee for it plus the normal cost of the new user message.
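Concretely, an API call looks something like the sketch below. This assumes the Anthropic Python SDK; the prompt text, model name, and listing details are placeholders, not taken from any real tool.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The system prompt: long and identical on every call (instructions, rules, examples).
SYSTEM_PROMPT = "You write MLS-compliant listing descriptions in our brand voice. Rules: ..."

# The user message: short and different on every call (the specific task).
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # swap in whatever model your tool uses
    max_tokens=500,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "Describe 123 Elm St: 3 bed, 2 bath, updated kitchen."}],
)
print(response.content[0].text)
```

Written this way, every call bills the full SYSTEM_PROMPT at the normal input rate, which is exactly the repeated cost caching removes.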

Anthropic's prompt caching documentation puts hard numbers on it: up to 90% cost reduction and up to 85% latency reduction, with a break-even point of just two cache hits. If you send the same base prompt more than twice, caching saves money.

For real estate teams running dozens or hundreds of AI queries per day with consistent instructions, this is not a minor optimization. It's the difference between a manageable AI budget and one that grows faster than the value it creates.

How Prompt Caching Works

The Technical Version

When you send an API request, the system prompt is tokenized and processed before the model generates a response. Prompt caching stores that processed prefix so subsequent requests reuse it instead of rebuilding it from scratch, and you pay the reduced price of a cache read instead of a full processing pass.
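On Anthropic's API, opting in means marking the stable block with a cache_control flag. Here is a minimal, hedged sketch, using the same placeholder prompt as above; the flag and usage fields are from the Anthropic Python SDK.

```python
import anthropic

client = anthropic.Anthropic()

# The stable instructions. In a real tool this block is long; blocks below the model's
# minimum cacheable length (roughly 1,000+ tokens) are simply not cached.
SYSTEM_PROMPT = "You write MLS-compliant listing descriptions in our brand voice. Rules: ..."

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # First call writes this block to the cache; later identical calls read it
            # back at a steep discount instead of reprocessing it.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Describe 456 Oak Ave: 4 bed, 3 bath, new roof."}],
)

# The usage object reports what was written to and read from the cache on this call.
print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)
```

One caveat from Anthropic's docs: the cache only lives for a few minutes and is refreshed each time it's read, so the discount shows up when calls arrive steadily rather than hours apart.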

The economics are straightforward. Anthropic charges approximately 90% less for cached prompt tokens compared to new tokens. OpenAI's Assistants API achieves similar savings through persistent threads. Google Gemini offers context caching that stores conversation context server-side.

Break-even: two cache hits. If your system prompt is 2,000 tokens and you send 3 requests, you've already saved money.
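The arithmetic behind that break-even claim is short. The multipliers below are assumptions based on Anthropic's published rates (cache writes cost roughly 25% more than normal input tokens, cache reads roughly 90% less), counting only the system-prompt tokens since everything else costs the same either way.

```python
# Cost of the system prompt across N requests, in units of the normal input price.
CACHE_WRITE = 1.25  # assumed: writing the cache costs ~25% more than a normal pass
CACHE_READ = 0.10   # assumed: reading the cache costs ~90% less

def without_cache(requests: int) -> float:
    return 1.0 * requests                             # full price every time

def with_cache(requests: int) -> float:
    return CACHE_WRITE + CACHE_READ * (requests - 1)  # one write, then cheap reads

for n in (1, 2, 3, 10):
    print(f"{n} requests: {without_cache(n):.2f} uncached vs {with_cache(n):.2f} cached")
# Under these assumptions the cached version is already cheaper by the second request
# (1.35 vs 2.00) and costs roughly a fifth as much by the tenth (2.15 vs 10.00).
```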

The Plain English Version

Think of it like a textbook on the teacher's desk versus printing a new copy for every student. Same information. Fraction of the cost.

Your system prompt is the textbook — the instructions that tell the AI who it is, what it knows, and how to respond. Without caching, every single API call reprints the textbook from scratch. With caching, the textbook stays on the desk. Each student (each new query) just walks up and asks their question.

Which platforms support it: Anthropic Claude (native prompt caching), OpenAI (Assistants API with persistent threads), and Google Gemini (context caching). Each implements it differently, but the principle is the same.

When Real Estate Teams Need Prompt Caching

Not every agent needs prompt caching. If you're using ChatGPT's web interface and typing prompts manually, caching doesn't apply to you — OpenAI handles that on their end.

Prompt caching matters when you're:

Running 50+ AI queries per day with the same base instructions. Teams that built custom tools for listing descriptions, lead responses, or market reports are sending the same system prompt hundreds of times. Each one costs money without caching.

Using AI for consistent output across a team. If your brokerage built an AI tool that ensures every agent's listing descriptions match brand standards, the brand guidelines live in the system prompt. Cache it once, use it thousands of times.

Building custom AI tools via API. This is the key distinction. If you're a solo agent using ChatGPT's web app, you don't need to think about this. If your team or brokerage built tools that call the API directly — through custom scripts, Zapier integrations, or dedicated software — prompt caching is where the savings are.

Andrew Ng's DeepLearning.AI course on prompt engineering for developers covers this principle: systematic prompt engineering treats prompts as software components, not throwaway text. Caching is part of that engineering mindset.

Cost Comparison: With and Without Caching

Scenario | Without Caching | With Caching | Monthly Savings
Solo agent (20 queries/day) | $45/mo | $12/mo | $33 (73%)
Team of 5 (100 queries/day) | $225/mo | $38/mo | $187 (83%)
Brokerage API tool (500+ queries/day) | $720/mo | $72/mo | $648 (90%)

Estimates based on typical system prompt sizes (1,500-2,500 tokens) and Claude API pricing. Actual savings depend on prompt length and query volume.

Before and After: Sarah's Brokerage in Denver

Sarah manages technology for an 80-agent brokerage in Denver. They built a custom AI listing description tool that every agent uses. The system prompt includes brand voice guidelines, formatting rules, compliance language, and style examples — about 2,200 tokens of instructions that go with every single request.

Before caching: $720 per month in API costs. Growing every month as more agents adopted the tool.

After implementing prompt caching on the system instructions: $72 per month. Same tool. Same output quality. Same number of agents using it.

The math: $648 per month saved × 12 months = $7,776 per year. Implementation time: about 2 hours of developer work to add cache headers to the API calls.

The ROI calculation barely needs a calculator. $7,776 saved for 2 hours of work is $3,888 per hour of implementation time. That's the kind of return that makes IT budgets look reasonable to brokerage owners who are skeptical about AI spending.

Common Mistakes

Mistake 1: Thinking you need prompt caching for ChatGPT web. If you're using ChatGPT, Claude, or Gemini through their web interfaces, you don't need to implement caching. The platforms handle optimization on their end. Caching is specifically for API users — teams and developers building custom tools.

Mistake 2: Caching dynamic content. Only cache the parts of your prompt that stay the same. Your system instructions, brand guidelines, and formatting rules are cacheable. The property details, client names, and market data that change with every request are not. Mixing dynamic content into cached prompts defeats the purpose.

Mistake 3: Not measuring your baseline costs first. You can't prove savings without a starting number. Before implementing caching, document your current monthly API spend, average queries per day, and average prompt size. Then implement caching and measure the delta. This is especially important for presenting the ROI to brokerage leadership.
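If your tool already calls the Anthropic API, the usage block on each response carries the numbers that baseline needs. Here is a small sketch of the kind of logging this implies; the field names are from the Anthropic Python SDK, and the CSV format is just an illustration.

```python
import csv
import datetime

def log_usage(response, path="api_usage_log.csv"):
    """Append per-request token counts so monthly spend can be computed from real data."""
    usage = response.usage
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.datetime.now().isoformat(),
            usage.input_tokens,                                      # fresh (uncached) input tokens
            getattr(usage, "cache_creation_input_tokens", 0) or 0,   # tokens written to the cache
            getattr(usage, "cache_read_input_tokens", 0) or 0,       # tokens served from the cache
            usage.output_tokens,
        ])
```

Run the tool for a stretch before turning caching on, then the same stretch after, and the two totals are the before-and-after numbers leadership will ask for.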

Sources

  1. Anthropic — Prompt caching reduces costs up to 90%, latency up to 85%; break-even at 2 cache hits
  2. DeepLearning.AI — Prompt Engineering for Developers (Andrew Ng + OpenAI)
  3. arXiv — The Prompt Report: systematic survey of 58 LLM prompting techniques
  4. NAR — 60%+ of realtors now use AI tools in business operations

Frequently Asked Questions

What is prompt caching in AI?
Prompt caching stores the repeated parts of your AI prompts (system instructions, context, guidelines) so you don't pay to process them with every request. When you send the same base instructions with each API call, caching means the AI reuses the processed version instead of starting from scratch. The result: up to 90% cost reduction and up to 85% faster response times.
Does ChatGPT support prompt caching?
If you use ChatGPT through the web or mobile app, caching is handled behind the scenes and you don't need to do anything. If you use the OpenAI API directly, the Assistants API provides persistent threads that achieve similar caching effects. Custom GPTs also retain their instructions across conversations without re-sending them.
How much does prompt caching save?
Savings depend on your usage pattern. Anthropic documents up to 90% cost reduction for cached prompts. In practice, a solo agent running 20 queries per day might save $33 per month. A brokerage running 500+ queries per day through a custom tool can save $648 per month or more. The larger your system prompt and the more queries you run, the bigger the savings.
Do I need to be technical to use prompt caching?
Yes, prompt caching is a developer-level feature. It requires modifying API calls to include cache control headers or using platform-specific caching features. If you're using ChatGPT or Claude through their web interfaces, you don't need prompt caching at all. If your brokerage has a developer or uses custom AI tools built on APIs, they can implement it in about 2 hours.
Is prompt caching available for Claude and Gemini?
Yes. Anthropic Claude has native prompt caching with documented 90% cost reduction. Google Gemini offers context caching that stores conversation context server-side. OpenAI's Assistants API provides persistent threads. Each platform implements caching differently, but all three major providers support some form of it for API users.


Stop Guessing With AI

Join The Architect workshop and learn the frameworks that turn AI from a toy into a business tool.