How to answer real-estate leads in five minutes (Twilio + foundation model + iPhone Shortcuts)
Industry replies in 42 hours. The math says you need 5 minutes.
That's not a follow-up problem. It's a category mistake. Most agents treat lead response like email — get to it when you can. The Lead Response Management Study (Oldroyd, MIT/InsideSales — 15,000 leads, 100,000 call attempts — full PDF) measured what actually happens when you wait. Reply in 5 minutes versus 30 and you're 100x more likely to make contact and 21x more likely to qualify the lead. The industry average response time is 42 hours.
Forty-two hours. On a 21x clock.
The bleed nobody costs out
A REALTOR at Compass Hendersonville buys Zillow Premier Agent leads in 37075. The form fires at 11:47 PM Tuesday. She's at a showing the next afternoon. Soccer practice after that. The lead sits in her inbox until Wednesday at 9:14 AM. That's 33 hours. The lead already filled out three other agent forms. They toured a house with the second one Tuesday morning. She sends a polite text. Gets ghosted. Calls it a bad lead.
It wasn't a bad lead. It was a 33-hour response on a 21x clock. She walked past the deal before she opened the email.
This is where automation actually pays — but only past gate 3 from the pillar's three-gate framework. Below gate 3, paid leads are under 30% of pipeline. Your sphere prompt does more work than any first-touch automation. Past gate 3, manual response can't hit the 5-min window. Not because you're lazy. Because you're at a showing.
I'll show you the DIY first-touch stack — Twilio plus a foundation model plus iPhone Shortcuts. 60 minutes of setup. $30/mo all-in. Hits the 5-min window 24/7.
What the stack costs vs the vendor route
The vendor answer here is Structurely. They'll text and call the lead inside the 5-min window. $179/mo Starter, $299 Growth, $499 Build, up to $1,499/mo top-tier per Unify Real Estate 2026. It works. It also runs on their model, their voice, their script tree. You don't see the prompt. You can't tune it. You can't make it sound like you.
The DIY stack: Twilio plus a foundation model API plus an iPhone Shortcut. Twilio per 2026 pricing — ~$1.15/mo per number, ~$0.0083 per SMS. Call it $5–$10/mo at typical volumes. Claude Sonnet API runs ~$3 per million input tokens, $15 per million output. GPT-4 class is similar. A first-touch reply is ~2,000 tokens. Pennies. Call the API line $15–$20/mo at heavy volume. iPhone Shortcuts is free.
Total: ~$30/mo all-in versus $179–$1,499/mo. And the prompt is yours.
The honest vendor versus DIY comparison lives on its own page.
The four-step build
Sixty minutes of setup. One Twilio account. One LLM API key. One iPhone. One Context Card with your voice. Here's the build.
Step 1 — Set up the Twilio number to receive lead webhooks
Buy a Twilio phone number (~$1.15/mo). Wire it to receive webhooks from your paid lead sources — Zillow Flex, Realtor.com, Compass One. If your CRM doesn't send webhooks the way you need, drop a Make.com scenario in front (Core plan $9/mo per 2026 pricing). Make.com watches your CRM and posts to Twilio with the lead's name, phone, source, and inquiry text.
The webhook fires the moment the lead lands. Not when you check the inbox. Not when the kid's in the car. The moment.
Setup time: 15 minutes.
Step 2 — Wire the foundation model to draft the first-touch reply
The webhook hits a serverless function (Twilio Functions, Vercel, or a Make.com module — pick what you have). The function loads your Context Card plus the lead payload into a foundation model API.
The Context Card is the load-bearing piece. Same four-layer card from the LLM agent framework — role and persona, voice and tone, do-not-say list, local knowledge. For first-touch you load: your name, brokerage, phone, market specifics (Hendersonville, Old Hickory Lake, 37075 schools, recent comps), and voice rules — short sentences, contractions, no "dream home" copy.
The model gets the lead payload plus your Context Card. It drafts a reply that reads like you wrote it on your couch. Two sentences. Acknowledge the inquiry. Ask one qualifying question. Sign with your name.
This is the pattern Yan and Husain document in What We Learned from a Year of Building with LLMs (O'Reilly, May 2024) — small, deterministic tasks with clear objectives are where foundation models shine. First-touch is exactly that shape. The model's not negotiating contracts. It's writing two sentences off a structured payload.
Setup time: 25 minutes. Most of that is writing the Context Card once.
Step 3 — Send the draft via Twilio SMS within 60 seconds
The draft comes back from the model. The serverless function passes it to Twilio's SMS API and fires it to the lead's phone. Webhook to text in under 60 seconds — most of that is model latency at 2–4 seconds.
You're inside the 5-min window before the lead's closed the browser tab.
The text comes from your Twilio number, so the lead sees a Tennessee area code, not a generic 800. It reads like you. It asks a real qualifying question — price range, timeline, financed or cash, "toured anything in 37075 yet." It doesn't sound like a chatbot.
Setup time: 5 minutes.
Step 4 — Send a copy to your phone via iPhone Shortcuts to verify and follow up
This is the human-in-the-loop step. Simon Willison wrote on his weblog in 2025 that the expensive part of LLM workflows isn't the model. It's the verification. He's right. The model drafts well 95% of the time. The other 5% is your reputation.
So: when Twilio sends the SMS to the lead, it also fires a webhook to an iPhone Shortcut you've configured. The Shortcut pings your phone with the lead's name, the inquiry, and the exact text that went out. You see what shipped. You're verifying after the fact, not before. The lead already got the reply inside the 5-min window. You're queued up to follow with the call.
The model drafts. You read. The model already sent. You follow with the human call.
If the draft was wrong — voice off, fact off, awkward question — correct it in the next text from your phone. You don't lose the 5-min window debugging. You catch the miss on the call.
Setup time: 15 minutes. One Shortcut, one notification action.
The result
Webhook to first text: under 60 seconds, 24/7. You're inside the 5-min window every time, including 11:47 PM Tuesday. Cost: ~$30/mo all-in — Twilio plus the API line plus zero for the iPhone you already carry. The prompt is yours, the voice is yours, the Context Card is yours.
Compare:
- Structurely: $179/mo Starter, $499 Build, up to $1,499/mo top-tier per Unify Real Estate 2026. Vendor-controlled prompt, vendor-controlled script tree, vendor-controlled guardrails. Works. Doesn't tune.
- Manual response: 42-hour industry average. 95% of paid lead spend lit on fire.
- DIY stack: $30/mo, your voice, sub-60-second response, verify-then-follow-up loop.
Run the DIY stack three months and the marginal cost is rounding error against one closed deal in 37075.
Why this works
Two reasons, both from primary sources outside vendor marketing.
One — the task is the right shape. Yan and Husain document the pattern in What We Learned Part I and reinforce it in the Part III strategy piece: foundation models are reliable on small tasks with clear objectives. First-touch lead intake is the cleanest small task in the real-estate workflow. Structured input. Structured output. Bounded scope. The model isn't deciding which house to show. It's writing two sentences in your voice off a deterministic schema. That's where models earn their keep.
Two — verification is the expensive part, and the loop respects that. Simon Willison's 2025 weblog work names it: the model writes fast, you verify slowly, and the verification is the cost. The DIY stack pushes verification after the send so you don't lose the 5-min window. Then you follow on the call inside the same window — that's where the human work moves the deal. AI does 80% (the draft, the speed). Human does 20% (the call, the close).
That's why this works and why off-the-shelf vendors don't match it. They optimize one workflow for everyone. You optimize for your voice, your market, your Context Card, your phone.
When this DOESN'T work
This is the expensive page on the cluster, and it's wrong for most readers.
Below gate 3 — paid leads under 30% of pipeline — this is over-engineered. If you're a sphere-driven 12-deal Hendersonville agent and most of your closings come from referral (NAR 2025 says 66% on average), you don't have a 5-min response problem. You have a sphere-prioritization problem. Run the Tuesday morning prompt. $20/mo, 4 minutes, works against the people already in your phone. First-touch automation on a $200/mo Zillow trickle is over-engineering.
Above gate 3 with very high lead volume, hire an actual ISA. A salaried inside sales agent qualifies warmer than any automation. They hear hesitation. They ask the second question the script missed. They build rapport in 90 seconds. The DIY stack handles "we got your inquiry, here's a real text in 60 seconds." It doesn't handle "qualify a 7-figure buyer relocating from Chicago." If your paid lead volume justifies a $48–$60K/yr ISA, hire the ISA. The automation runs the night shift. The ISA runs the day.
The DIY stack is for the middle band — past gate 3, paid leads 30–60% of pipeline, no salaried ISA yet, agent at showings during business hours. 60 minutes of setup pays for itself in one closing.
The Context Card is the load-bearing piece
If the four steps feel like infrastructure and the reply text feels like an afterthought, you've got it backwards. Twilio is plumbing. The model is plumbing. The Shortcut is plumbing. The Context Card is the only piece that makes the reply sound like you instead of like a vendor.
The card is four layers — role and persona, voice and tone, do-not-say list, local knowledge. It's what separates "I'd love to help you find your dream home" from "Saw you pulled the listing on Old Hickory — is timing flexible or are you tied to a school year." First one closes nothing. Second one is what an actual Compass agent would text.
For first-touch, your card needs: name and brokerage, three or four sub-markets with current comp ranges, your real first-question patterns, your do-not-say list. Build it once. Reuse it on the Tuesday morning sphere prompt, the listing description draft, every prompt you'll run. That's the lever.
The clean version
Past gate 3. 60 minutes of setup. $30/mo all-in. Webhook to text in under 60 seconds, 24/7. Your voice, your Context Card, your phone in the loop. The model handles the small task. You handle the call.
Below the gates? Run the Tuesday morning prompt. Considering the vendor option? The comparison page does the math.
Cleared the gates, shipped the DIY stack, ready to operationalize the rest of the AI layer? That's what The Listing Machine is built for. Four-week beta cohort, AI-Enhanced Realtor credential, prompt stack tuned to your voice. We work against your real listings — Old Hickory Lake, Cool Springs — not a hypothetical.
Up to the pillar: the three gates decide whether this stack is the right shape at all.
Sources
Primary data:
Independent creators:
Practitioner-aggregator vendor reviews:
Internal: AI Acceleration Section 5 Context Card framework, Make.com pricing 2026, Anthropic Claude API pricing 2026, OpenAI API pricing 2026.
Last updated 2026-04-29.
JSON-LD schema