LLM Fundamentals
What is Prompt Caching?
Prompt caching is a technique where frequently used parts of AI prompts—like system instructions, business context, or reference documents—are stored and reused across multiple API requests, dramatically reducing costs (up to 90%) and response times for repetitive AI workflows.
Understanding Prompt Caching
Every time you send a message to an AI like ChatGPT or Claude through their API, you're sending the entire conversation context—including any system instructions, background information, and conversation history. For a real estate team running hundreds of AI requests per day, this means the same business context, brand guidelines, and market data get re-processed with every single call. Prompt caching solves this by storing the unchanging parts of your prompt so the AI can skip re-reading them, only processing the new portion of each request. Think of it like a title company that keeps your standard contract clauses on file rather than retyping them for every transaction.
The economics are significant. Major AI providers now offer caching that cuts token costs on the cached portion by roughly 50-90% (depending on the provider) and reduces response latency by 50-80%. If you've built a real estate AI system that includes a 2,000-token Context Card (your market data, brand voice, client avatar, and business rules) in every API call, caching means you pay full price once and then a fraction for every subsequent call that reuses that context. For a team making 500 API calls per day, this can reduce monthly AI costs from thousands of dollars to hundreds.
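For readers who like to see the math, here is a rough sketch of the per-call arithmetic. The token counts and the 90% cached discount below are illustrative assumptions, not any provider's actual price list:

```python
# Back-of-the-envelope input-token math for a cached vs. uncached workflow.
# The discount and token counts are illustrative assumptions, not real pricing.
CONTEXT_TOKENS = 2_000    # reusable Context Card sent with every call
TASK_TOKENS = 300         # average unique portion of each request
CACHED_DISCOUNT = 0.90    # assumed discount on cached reads (provider-dependent)

uncached_per_call = CONTEXT_TOKENS + TASK_TOKENS
cached_per_call = TASK_TOKENS + CONTEXT_TOKENS * (1 - CACHED_DISCOUNT)

savings = 1 - cached_per_call / uncached_per_call
print(f"Effective input cost per call: {uncached_per_call:,} -> {cached_per_call:,.0f} tokens")
print(f"Roughly {savings:.0%} lower input cost once the cache is warm")
```

Under these assumptions, each call drops from about 2,300 full-price tokens to an effective 500, which is where the 70-80% savings figures come from.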
This concept connects directly to AI Acceleration's Context Cards approach. When you build a comprehensive Context Card—including your business identity, market expertise, writing style, and client avatar—you're creating exactly the kind of reusable context that benefits enormously from caching. Instead of the AI re-reading your entire Context Card for every listing description, email draft, or social post, caching lets it remember your context and jump straight to the new task. The result is faster responses, lower costs, and more consistent output because the AI maintains a stable understanding of your business across all interactions.
While prompt caching is primarily a technical concept relevant to developers and API users, understanding it helps every agent make informed decisions about AI tools and costs. When evaluating AI platforms for your business, ask whether they use caching—it's a sign of sophisticated engineering that translates to better performance and lower costs for you. As more agents build custom AI workflows using platforms like Zapier, Make, or direct API access, understanding caching becomes a practical cost management skill. See our full guide with cost data.
Key Concepts
Cache Hit vs. Cache Miss
A cache hit occurs when the AI finds the stored context and reuses it (fast and cheap). A cache miss means the context must be fully reprocessed (normal speed and cost). Well-designed prompts maximize cache hits by keeping the unchanging parts at the beginning.
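If you build against an API directly, you can see hits and misses in each response. The sketch below uses the usage field names from Anthropic's Messages API (cache_creation_input_tokens and cache_read_input_tokens); other providers report this differently, and the numbers shown are made up:

```python
# Made-up usage numbers from two consecutive calls that share the same cached
# context: the first call is a miss (the cache gets written), the second is a
# hit (the cache gets read at the discounted rate).
first_call  = {"input_tokens": 300, "cache_creation_input_tokens": 2500,
               "cache_read_input_tokens": 0}
second_call = {"input_tokens": 300, "cache_creation_input_tokens": 0,
               "cache_read_input_tokens": 2500}

for label, usage in [("first call", first_call), ("second call", second_call)]:
    if usage["cache_read_input_tokens"] > 0:
        print(f"{label}: cache hit - context reused at the cached rate")
    else:
        print(f"{label}: cache miss - context processed and stored for next time")
```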
Time-to-Live (TTL)
Cached prompts expire after a set period—typically 5-60 minutes depending on the provider. Frequent usage keeps the cache warm (active), while infrequent usage lets it expire, requiring a fresh full-price call to re-establish the cache.
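As a toy illustration, assume a 5-minute TTL that refreshes each time the cache is used (roughly how Anthropic's ephemeral cache behaves; check your provider's docs for the exact rules):

```python
# Toy TTL simulation: calls that arrive within the TTL window after the
# previous call hit the warm cache; a long gap lets it expire, so the next
# call pays full price and re-writes the cache. The 5-minute TTL is an
# assumption; providers differ.
TTL_MINUTES = 5
call_times = [0, 2, 4, 45, 47]   # minutes into the workday when calls happen

previous_call = None
for t in call_times:
    warm = previous_call is not None and (t - previous_call) <= TTL_MINUTES
    status = "hit (warm cache)" if warm else "miss (full price, cache re-written)"
    print(f"minute {t:>2}: {status}")
    previous_call = t
```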
Minimum Cache Length
Providers require a minimum amount of text before caching activates—typically 1,024 to 2,048 tokens. This means caching is most valuable for substantial context blocks like system instructions, business profiles, or reference documents, not short simple prompts.
Prompt Structure Optimization
For caching to work, the cacheable content must be identical across requests. This means structuring prompts so static content (business context, instructions) comes first, and variable content (the specific task) comes last.
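For the technically inclined, here is a minimal sketch of that structure using Anthropic's Python SDK. The model ID and Context Card text are placeholders, and other providers handle this differently (OpenAI, for instance, caches repeated prompt prefixes automatically rather than via an explicit flag):

```python
# Minimal sketch of a cache-friendly prompt: static business context first
# (marked cacheable), the variable task last. Requires `pip install anthropic`
# and an ANTHROPIC_API_KEY in the environment. The Context Card text and
# model ID are placeholders.
import anthropic

client = anthropic.Anthropic()

CONTEXT_CARD = (
    "Brand voice, market expertise, client avatar, fair-housing rules... "
    "(kept word-for-word identical on every call so it can be cached)"
)

def draft(task: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",              # placeholder model ID
        max_tokens=800,
        system=[{
            "type": "text",
            "text": CONTEXT_CARD,                       # static content first
            "cache_control": {"type": "ephemeral"},     # mark it as cacheable
        }],
        messages=[{"role": "user", "content": task}],   # variable content last
    )
    return response.content[0].text

# Every call after the first within the cache window reuses the cached card.
print(draft("Write a 150-word listing description for a 3-bed ranch at 123 Maple St."))
print(draft("Draft a friendly follow-up email to a buyer who toured a condo yesterday."))
```

Because providers only cache blocks above their minimum length, the real Context Card would be your full 1,000+ token version, not the one-line placeholder shown here.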
Prompt Caching for Real Estate
Here's how real estate professionals apply Prompt Caching in practice:
Bulk Listing Description Generation
Cache your brand voice, writing guidelines, and MLS formatting requirements while generating descriptions for multiple listings.
You have 15 new listings to write descriptions for this week. Your cached context includes your brand voice guide, preferred description structure, MLS character limits, and fair housing language guidelines (2,500 tokens). Each individual listing request only sends the unique property details (200-400 tokens, call it 300 on average). Instead of sending roughly 2,800 tokens x 15 = 42,000 tokens at full price, you send 2,500 once + 300 x 15 = 7,000 tokens at full price, with the rest billed at the cached rate (up to a 90% discount).
Automated Email Sequences
Cache your communication style, business context, and follow-up frameworks while generating personalized emails for each lead.
Your drip campaign AI generates personalized follow-up emails for 50 new leads per week. The cached context includes your writing style, business value proposition, current inventory highlights, and email formatting rules. Only each lead's specific details and stage in the funnel vary per request. Caching cuts your AI email generation costs by 80% while maintaining perfectly consistent brand voice.
Market Report Automation
Cache your market report template, analysis framework, and commentary style while updating with fresh data each period.
Your monthly market report system caches the report structure, analysis methodology, your commentary style, and historical trend context. Each month, only the new data points change. The AI produces consistent, branded reports in seconds instead of minutes, at a fraction of the cost—enabling you to produce reports for 10 neighborhoods instead of just one.
Client Communication Assistant
Cache your full business context so your AI assistant can quickly handle any client communication task throughout the day.
Your Context Card—including your bio, current listings, market expertise, communication style, and standard disclaimers—stays cached throughout your workday. When you ask the AI to draft a buyer follow-up, respond to a listing inquiry, or write a negotiation email, it already knows everything about your business. Responses come 3x faster and cost 85% less than without caching.
When to Use Prompt Caching (and When Not To)
Use Prompt Caching When:
- You make multiple AI API calls per day that share the same context, instructions, or reference materials
- Your AI workflows include large context blocks like Context Cards, brand guidelines, or MLS rules
- You're generating bulk content (multiple listings, emails, or social posts) with consistent style requirements
- Your AI usage has grown to the point where monthly spending makes cost optimization worthwhile
Skip Prompt Caching When:
- You use AI casually through web chat interfaces—caching is handled automatically by the platform
- Each AI request is completely unique with no shared context across calls
- Your usage volume is low enough that caching savings wouldn't meaningfully impact your costs
- You're just starting with AI and should focus on learning effective prompting before optimizing costs
Frequently Asked Questions
What is prompt caching?
Prompt caching is a feature offered by AI API providers (like Anthropic's Claude and OpenAI's GPT) that stores frequently used parts of your prompts so they don't need to be fully reprocessed with every request. Think of it like saving a template: the AI remembers your business context, instructions, and reference materials so each new request only needs to process the unique part. This reduces costs by roughly 50-90% on cached content (depending on the provider) and makes responses significantly faster.
Do I need to worry about prompt caching as a regular agent?
If you only use AI through web chat interfaces (like chatgpt.com or claude.ai), these platforms handle caching behind the scenes—you benefit without doing anything. Prompt caching becomes directly relevant when you build custom AI workflows using APIs, Zapier, Make, or similar tools, or when you're evaluating AI platforms for your team. Understanding caching helps you ask the right questions: 'Does your platform use caching? What are the actual per-query costs?' This knowledge helps you avoid overpaying for AI tools.
How much money can prompt caching save?
Savings depend on your usage pattern, but they can be dramatic. A real estate team making 500 AI API calls per day, each including a 2,000-token Context Card, might spend $1,500/month without caching. With caching, the same usage drops to $300-400/month—a 70-80% reduction. Individual agents with lighter usage might save $50-100/month. The savings scale with volume: the more repetitive your AI usage pattern, the more caching helps.
How does prompt caching relate to Context Cards?
Context Cards and prompt caching are a natural pairing. A Context Card is the comprehensive business context you provide to AI—your expertise, style, market knowledge, and client information. This is exactly the type of large, stable content that caching is designed for. When you build a thorough Context Card and use it across all your AI interactions, caching ensures you're not paying to re-send that same context every time. The result: better AI outputs (consistent context) at lower cost (cached pricing).
Master These Concepts
Learn Prompt Caching and other essential AI techniques in our workshop. Get hands-on practice applying AI to your real estate business.
View Programs