AI Systems Instructor • Real Estate Technologist
Quick Answer: Identify the parts of your prompts that stay the same across requests (Context Cards, system prompts, role definitions), structure your API calls so these parts are cached, and let the API reuse them instead of reprocessing. Anthropic's prompt caching can reduce costs by up to 90% on cached tokens.
If you're using AI APIs for real estate automation, you're probably paying full price on every request. Prompt caching changes that. When you send the same system prompt (your Context Card) with different user messages, caching lets the API reuse the processed system prompt instead of reprocessing it every time. The result: up to 90% cost reduction on the cached portion and faster response times. This guide shows you how to identify cacheable prompts, implement caching, and measure your savings.
Tools Needed
Anthropic API account or OpenAI API account, Python 3.8+, code editor
Every time you send an API request, the model processes every token in your prompt. Your 500-token Context Card gets processed from scratch whether it's your first request or your thousandth. Prompt caching tells the API: 'I've sent this system prompt before. Reuse the processed version.' The API stores the processed prompt and skips reprocessing on subsequent calls. You pay a small caching fee on the first request, then dramatically reduced costs on every cached hit after that. For real estate agents running batch operations (50 listing descriptions with the same system prompt), this adds up fast.
Tip: Think of prompt caching like a GPS that remembers your home address. Instead of re-calculating the route from scratch every time, it starts from a saved point. Your Context Card is the saved point. Each new task just adds the destination.
Cacheable content is anything that stays the same across multiple requests. In real estate AI workflows, this includes: your Context Card (brand voice, market context, style preferences), your role definition ('You are a luxury real estate copywriter in Nashville'), output format instructions ('Use short sentences, under 150 words, lead with the strongest feature'), and reference examples (few-shot examples of your best work). The non-cacheable part is the specific task data: property details, lead information, market stats that change. Separate these clearly in your prompts.
Tip: Your cacheable content should be in the system prompt. Your variable content should be in the user message. This clean separation maximizes cache hit rates. Most real estate workflows have 60-80% cacheable content.
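That split can be sketched as a small helper. This is illustrative only: the `build_request` function and the `CONTEXT_CARD` text are our own names, not part of Anthropic's SDK.

```python
# Static Context Card: byte-identical on every request, so it can be cached.
CONTEXT_CARD = (
    "You are a luxury real estate copywriter in Nashville, TN.\n"
    "Style: short sentences, under 150 words, lead with the strongest feature."
)

def build_request(task_data):
    """Split one request into a cacheable system block and a dynamic user message."""
    return {
        "system": [{
            "type": "text",
            "text": CONTEXT_CARD,                   # static -> cacheable
            "cache_control": {"type": "ephemeral"},
        }],
        "messages": [
            {"role": "user", "content": task_data}  # dynamic -> never cached
        ],
    }

req = build_request("4BR/3BA colonial, Franklin, 3200 sqft")
print(req["messages"][0]["content"])
```

Because the system block never varies, every request after the first can hit the cache; only the user message is reprocessed.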
Anthropic's API supports explicit cache control. In your system prompt, add cache_control markers that tell the API which content to cache. The system prompt (your Context Card) is the primary cache target. Keep it identical across requests—even small changes invalidate the cache. For batch operations like processing 50 listing descriptions, your system prompt stays the same for all 50. Only the user message (property details) changes. This means 49 out of 50 requests hit the cache and cost a fraction of the first request.
Tip: Cache breakpoints work best at the end of your system prompt. Put your Context Card and all static instructions in the system message with a cache marker at the end. Everything in the user message remains dynamic and uncached.
For Anthropic's API, add cache_control to your system prompt block. The implementation is straightforward: structure your messages so the system prompt includes your full Context Card with a cache_control breakpoint, and the user message contains only the variable task. On the first request, you pay a small premium: cache writes are billed at 25% above the standard input-token rate. On every subsequent request within the cache window (typically 5 minutes), cached tokens cost 90% less. For a 50-item batch operation, that's 49 requests at a 90% discount on the system-prompt tokens.
Tip: Anthropic's cache window is currently 5 minutes, and it refreshes each time the cached content is read. Keep the gap between consecutive requests under that window. For 50 items with a 1-second delay between calls, the batch completes in under 2 minutes, comfortably inside the window.
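One way to guard against an expired cache mid-batch is a simple timing check. The `within_cache_window` helper below is an assumed name, sketched against the documented 5-minute ephemeral lifetime:

```python
import time

CACHE_TTL_SECONDS = 5 * 60  # Anthropic's ephemeral cache lifetime

def within_cache_window(last_call_ts, now=None):
    """True if the next request should still hit the cache entry
    created or refreshed at last_call_ts (timestamps in seconds)."""
    if now is None:
        now = time.monotonic()
    return (now - last_call_ts) < CACHE_TTL_SECONDS

print(within_cache_window(0.0, 60.0))   # 1-minute gap -> True
print(within_cache_window(0.0, 400.0))  # 6.7-minute gap -> False
```

In a batch loop, you would record `time.monotonic()` after each call and check the gap before the next one; if it fails, expect to pay the cache-write price again.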
Track three metrics: cache hit rate (percentage of requests that use cached prompts), cost per request (before and after caching), and total monthly spend. Anthropic's API returns cache hit information in every response. Build a simple logging function that records whether each request was a cache hit and the token costs. After one month, compare your cached costs to what you would have paid without caching. Most real estate API workflows see 40-70% total cost reduction with proper caching. Apply the OODA Loop: observe your cache hit rates, identify requests that miss the cache (usually from inconsistent system prompts), fix the inconsistency, and measure again.
Tip: If your cache hit rate is below 80%, you likely have inconsistencies in your system prompts. Even an extra space or changed word invalidates the cache. Use a single source of truth for each Context Card and load it programmatically rather than copying and pasting.
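A minimal version of that logging function might look like the sketch below. The `CacheStats` class is our own name, not an SDK type; the `usage` fields it consumes are the ones Anthropic's API returns.

```python
from dataclasses import dataclass

@dataclass
class CacheStats:
    """Accumulates cache metrics across a batch of API calls."""
    hits: int = 0
    misses: int = 0
    cached_tokens: int = 0

    def record(self, cache_read_tokens, cache_creation_tokens):
        # A nonzero read count means the API reused the processed prompt.
        if cache_read_tokens > 0:
            self.hits += 1
            self.cached_tokens += cache_read_tokens
        else:
            self.misses += 1

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

# In a real loop you would call:
#   stats.record(message.usage.cache_read_input_tokens,
#                message.usage.cache_creation_input_tokens)
stats = CacheStats()
stats.record(0, 450)   # request 1: cache creation, no read
stats.record(450, 0)   # request 2: cache hit
stats.record(450, 0)   # request 3: cache hit
print(f"Cache hit rate: {stats.hit_rate:.1%}")  # prints "Cache hit rate: 66.7%"
```

Dump the stats at the end of each batch and compare month over month; a falling hit rate is the earliest sign that a prompt drifted.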
# Python example - Anthropic API with Prompt Caching
import os
import anthropic
client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
# This system prompt gets cached after first request
system_prompt = """
You are a luxury real estate copywriter specializing in Nashville, TN.
Market context: Median $445K, 2.8 months inventory, 24 avg DOM.
Style guidelines:
- Short, punchy sentences
- Lead with strongest visual feature
- Under 150 words
- End with location context
- Never use: stunning, nestled, welcome to, boasts
Examples of my voice:
[3 example listing descriptions would go here]
"""
# Process a batch of listings with the same cached system prompt
properties = [
    "4BR/3BA colonial, Franklin, 3200 sqft, pool, renovated kitchen",
    "2BR/2BA condo, Germantown, 1100 sqft, rooftop access, modern finishes",
    "3BR/2BA ranch, Donelson, 1800 sqft, hardwood floors, fenced yard",
]

for prop in properties:
    message = client.messages.create(
        model="claude-sonnet-4-5-20250929",
        max_tokens=300,
        system=[
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": f"Write listing description: {prop}"}],
    )
    # Check cache usage: creation tokens on the first call, read tokens after
    print(f"Cache read tokens: {message.usage.cache_read_input_tokens}")
    print(f"Cache creation tokens: {message.usage.cache_creation_input_tokens}")
    print(message.content[0].text)
    print("---")
BATCH PROCESSING WITH CACHING — COST COMPARISON

Request 1 (cache creation):
- System prompt: 450 tokens (cache write: 450 x $3.75/M = $0.00169)
- User message: 25 tokens (standard: 25 x $3.00/M = $0.000075)
- Output: 180 tokens ($15.00/M = $0.0027)
- Total: $0.00446

Request 2 (cache hit):
- System prompt: 450 tokens (cache read: 450 x $0.30/M = $0.000135)
- User message: 22 tokens (standard: $0.000066)
- Output: 165 tokens ($0.002475)
- Total: $0.00268

Request 3 (cache hit):
- System prompt: 450 tokens (cache read: $0.000135)
- User message: 28 tokens (standard: $0.000084)
- Output: 172 tokens ($0.00258)
- Total: $0.00280

BATCH TOTAL (3 listings): $0.00994
WITHOUT CACHING (3 listings): $0.01278
SAVINGS: 22% on 3 items

At 50 items: ~$0.14 with caching vs. ~$0.21 without = 33% savings
At scale with larger system prompts (2000+ tokens): savings reach 60-70%
Cache hit rate: 66.7% (2 of 3 requests)
System prompt processing: 1 time instead of 3
Keep your Context Cards identical across all requests in a batch. Load them programmatically from a single source file rather than retyping or pasting them. Even a typo fix between requests creates a new cache entry instead of hitting the existing one.
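The single-source-of-truth pattern is a few lines. The filename `context_card.txt` and the `load_context_card` helper are hypothetical; the point is that every request reads the same bytes.

```python
from pathlib import Path

# Hypothetical filename: one file per Context Card, edited nowhere else.
CARD_PATH = Path("context_card.txt")

def load_context_card(path=CARD_PATH):
    # Reading the same bytes every time keeps the prompt byte-identical,
    # which is exactly what the cache lookup requires.
    return path.read_text(encoding="utf-8")
```

Call `load_context_card()` once at the top of the batch and reuse the returned string for every request; never rebuild the prompt inside the loop.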
Larger system prompts benefit more from caching. A 2,000-token Context Card with few-shot examples saves 10x more per cached request than a 200-token role prompt. Put your most detailed, stable content in the cached system prompt.
Time your batch operations to complete within the cache window (5 minutes for Anthropic). If your batch takes longer, the cache may expire and costs revert to full price for later items.
Use prompt caching alongside model selection for maximum cost optimization. Cache your Context Card with Claude Haiku for routine tasks. The combination of caching + cheapest model produces AI output at pennies per request.
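One possible routing sketch for that combination. The task names and the routing table are our own, and the model IDs shown are examples that may change; check current model names before relying on them.

```python
# Hypothetical routing table pairing the cached Context Card with the
# cheapest capable model. Model IDs are examples and may change.
MODEL_FOR_TASK = {
    "listing_description": "claude-3-5-haiku-20241022",  # routine, high volume
    "offer_analysis": "claude-sonnet-4-5-20250929",      # needs more reasoning
}

def pick_model(task_type):
    # Default to the cheap model; routine work dominates batch volume.
    return MODEL_FOR_TASK.get(task_type, "claude-3-5-haiku-20241022")

print(pick_model("listing_description"))
```

Because the cache is scoped per model, keep each batch on one model so every request in it can share the same cache entry.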
Modifying the system prompt slightly between requests, invalidating the cache
Fix: Load your system prompt from a single source file. Any change—even adding a period or fixing a typo—creates a new cache entry. Treat the cached prompt as immutable during a batch operation.
Putting variable content (property details, lead names) in the system prompt instead of the user message
Fix: Separate static from dynamic content. System prompt = Context Card (static). User message = task and data (dynamic). Variable content in the system prompt guarantees cache misses on every request.
Not tracking cache performance, assuming it's working without verification
Fix: Log cache_read_input_tokens and cache_creation_input_tokens from every response. If cache_read is always 0, your caching isn't working. Debug by checking system prompt consistency across requests.
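A quick way to debug that consistency is to fingerprint the system prompt on every request and compare the values in your logs. The `prompt_fingerprint` helper is illustrative:

```python
import hashlib

def prompt_fingerprint(system_prompt):
    """Short hash of the system prompt so drift between requests shows up in logs."""
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:12]

a = prompt_fingerprint("You are a luxury real estate copywriter.")
b = prompt_fingerprint("You are a luxury real estate copywriter. ")  # trailing space
print(a == b)  # prints False: one extra space changes the fingerprint
```

If two requests that should share a cache entry log different fingerprints, you have found the inconsistency without eyeballing long prompts.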
Learn the Frameworks
Full technical explanation of how prompt caching works across different AI APIs.
Complete guide to Claude's API with caching implementation details.
OpenAI's automatic prompt caching and how it compares to Anthropic's approach.
Stop guessing with AI. Join The Architect workshop to master the frameworks behind every guide on this site.