AI lead scoring for real estate — owned data wins
The bleed isn't lead volume. It's prioritization on the data you already paid for.
The actual problem
A working REALTOR at Compass Hendersonville has 487 contacts in Follow Up Boss. She'll make five calls today between showings. The CRM sorts alphabetically. The dashboard sorts by last login. Neither tells her who's warm. She calls who she remembers, misses who she should've called, and burns the slot on the wrong human.
Per NAR's 2025 Member Profile, the median REALTOR closed 10 sides at $58,100 gross last year. 66% of sellers found their agent via referral or prior transaction. 20% of a typical Realtor's income comes from past clients. The signal is in the sphere — but only if you rank it.
Layer on the Lead Response Management Study (Oldroyd, MIT/InsideSales — 15,000 leads, six companies): replying inside 5 minutes makes a rep 100x more likely to make contact and 21x more likely to qualify versus 30 minutes. Industry average is 42 hours. The same logic governs sphere calls — right person at the right window beats a hundred wrong people.
Salesforce's 2024 State of Sales report puts the labor side in numbers: reps spend 28% of the workweek selling. The 12-deal agent doesn't have a lead problem. She has a prioritization problem dressed up as a lead problem.
Two definitions of "AI lead scoring"
The phrase points at two different products. Different money, different problems.
The vendor version. Predictive seller scoring. SmartZip, Offrs, Catalyze AI, Likely.ai. The vendor builds a model on tax records, deed transfers, life-event signals, MLS history. Outputs a score for every household in your ZIP. Pitches that score as a list of likely sellers in the next 6–18 months. Pricing verified against vendor sites and 2026 reviews:
- SmartZip: $500–$1,200/mo, 12-month contract, $2,000 early-cancel fee in some accounts (HousingWire, The Close, Hooquest).
- Offrs: $300/mo shared ZIP (same list goes to 3+ other agents), $600/mo exclusive. 2.4 stars on Sitejabber across 160 reviews.
- Catalyze AI: ~$360/mo for ~30 inherited-property leads in a 50-mile radius, vendor-claimed 4-of-10 precision.
- Likely.ai: doesn't publish pricing. Gates behind a sales call. That's a tell.
Vendor-stated accuracy lands around 70–72% for SmartZip and Offrs. No independent academic or NAR-funded study verifies any of those numbers. Every cited figure traces to vendor marketing or affiliated reviewers.
The owned-data version. A foundation model — ChatGPT Plus, Claude Pro, Gemini Advanced — reading your sphere CSV. Same exercise the vendor performs, except the model scores the people you already know against the signal that's already in your notes. $20/mo, no contract, cancel anytime.
That's what predictive seller scoring looks like when it's run on owned data. Every agent doing 5–25 deals a year already has the inputs. Most don't run the workflow because nobody's told them it's a four-minute Tuesday job.
The vendor version's three failure modes
Three structural problems break the vendor version. The pillar walks the full math.
Non-exclusive lists. The Offrs $300 shared plan emails the same list to three other agents in your ZIP. First-call wins, and the agent with a salaried inside-sales rep on the phone in 5 minutes wins more often than you do. Practitioner reviews on Sitejabber and Hooquest document this pattern repeatedly — "the same prospect already has three of us calling them."
No labor to action the list. Per the LRM Study, the 21x qualification jump only works if a body picks up the phone inside 5 minutes. A sole proprietor with a showing on the calendar and a kid at soccer practice fails this gate by definition. The predictive tool tells you who. Someone still has to call.
No farm concentration. Predictive farming only ROIs when you can door-knock, direct-mail, and door-hang the same households for 12+ months. If you closed 4 transactions in 37075, 3 in Brentwood, and 5 scattered across Williamson County last year, you don't have a farm — you have a service area, and the data turns over before you do.
Below all three gates, the vendor version is a 12-month subscription against a list that gets shared, doesn't get called, and runs in the wrong ZIPs. That's the pattern documented across BiggerPockets practitioner threads — "12-month auto-renewal, zero closings."
The owned-data version's unfair advantage
You have notes the vendor doesn't.
The vendor's score is built from public records — tax assessor data, deed transfers, permit pulls, MLS history, demographic overlays. Powerful inputs in aggregate, and structurally blind to the line in your CRM notes that says "Joel mentioned the kid is graduating from MTSU in May, downsize talk."
Karpathy framed the why on X in 2025 as Software 3.0: English is the new programming language, the model is the runtime, you don't structure the data first and query it after — you hand it the mess and ask the question. A vendor's model only sees what got typed into a structured field upstream. If the signal isn't in the schema, the dashboard can't see it.
A foundation model reading the actual prose of your notes is the opposite. You hand it 487 contacts of half-thoughts — "kid's graduating in May," "mentioned divorce on the phone in October," "moved her aunt to Westmoreland" — and it surfaces the weak signals the vendor structurally can't. That's the Karpathy weak-signal frame applied to your own data.
This isn't fringe. Yan, Husain et al. point to the same shape in What We Learned from a Year of Building with LLMs (O'Reilly, May 2024): foundation models earn their keep on small tasks with clear objectives, meaning structured input, bounded scope, defined output. Ranking 487 contacts against four named criteria is exactly that shape. The LLM-on-CSV pattern is also documented as a working revops playbook in Census's roundup of LLM prompts for revops and marketing teams. The same pattern that ranks your sphere already runs pipeline scoring inside operating B2B teams.
The line: AI on your sphere CSV beats AI on a list of strangers. The CRM you already paid for has more signal than the predictive model that costs ten times as much.
The four-minute Tuesday morning workflow
Tuesday, 7:42 AM. Same Hendersonville agent. She exports her sphere as CSV — name, email, phone, last contact date, notes field, transaction history. Drops it into ChatGPT and pastes this:
Here's my sphere. Rank the top 10 to call this week.
Score on:
- last contact date
- life-stage triggers in notes (kids, divorce, retirement, job change, moving company referral, college graduation)
- prior transaction with me
- referral source signal
Tell me what to say in the first 30 seconds for each.
Skip anyone I closed in the last 18 months unless they referred someone since.
Output: ranked top 10, with the opener for each + the why-now reason.
90 seconds later: ranked list, 30-second opener for each contact, why-now reason for each call. She calls #1 at 8:05 AM driving the kids to school.
Cost: $20/mo. Time invested: 4 minutes including the export. SmartZip alternative: $700/mo, 12-month contract, for a list of strangers. On a shared plan like the Offrs $300 tier, that same list also goes to three other agents in your ZIP.
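The skip rule in the prompt (leave out anyone closed in the last 18 months unless they referred someone since) can also be applied before the paste, so the model only sees contacts worth ranking. A minimal sketch, assuming hypothetical field names (`last_closed`, `referred_since_close`) that your CRM export almost certainly spells differently:

```python
from datetime import date

# Hypothetical rows; real column names depend on your CRM export.
contacts = [
    {"name": "A", "last_closed": date(2025, 6, 1), "referred_since_close": False},
    {"name": "B", "last_closed": date(2025, 6, 1), "referred_since_close": True},
    {"name": "C", "last_closed": date(2022, 3, 15), "referred_since_close": False},
    {"name": "D", "last_closed": None, "referred_since_close": False},
]

def months_between(earlier, later):
    """Whole calendar months from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months

def keep_for_ranking(contact, today, window_months=18):
    """Drop recent closings unless the contact has referred someone since."""
    closed = contact["last_closed"]
    if closed is None:
        return True  # never closed with you: always eligible
    if months_between(closed, today) >= window_months:
        return True  # closed long enough ago
    return contact["referred_since_close"]

today = date(2026, 4, 28)
shortlist = [c["name"] for c in contacts if keep_for_ranking(c, today)]
print(shortlist)  # ['B', 'C', 'D']
```

Pre-filtering is optional. The prompt handles the same rule in plain English. The sketch only shows that the rule is mechanical enough to check by hand.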
The full walkthrough — advanced second-pass prompt for weak signals, privacy toggle settings, the weekly routine — lives at /how-to/rank-your-sphere-with-chatgpt. Bookmark it and run it weekly. That's the system at the median deal volume.
When vendor scoring earns its slot — the three gates
The owned-data version doesn't beat the vendor version at every volume. There's a real workflow where predictive infrastructure pays — and the line is mechanical, not opinion.
Three gates. Pass all three and the paid tool earns its slot. Pass two — marginal. Pass one or zero — foundation model plus your sphere CSV wins.
Volume. 30+ closed deals per year, or $300K+ gross commission income. NAR median is 10 deals at $58K gross. At that median, a $6K–$12K-a-year predictive tool eats roughly 10–21% of gross before a single deal closes. SmartZip alone, at $700/mo, runs about 14.5% of NAR-median gross.
Labor. A salaried ISA, a second licensed agent, or a transaction coordinator answering inside the 5-minute LRM window. The predictive output is worthless without a body to action it.
Geographic concentration. 8+ transactions per year in a single ZIP or sub-market. Predictive farming only ROIs when you door-knock and direct-mail the same households for 12+ months.
Same three-gate framework as the pillar. Below all three, run the Tuesday-morning prompt. Above all three, layer the vendor tool under a real ISA — see the SmartZip vs foundation model comparison for cost-to-break-even math, and /examples/predictive-analytics-real-estate for the side-by-side vendor breakdown.
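The three gates are mechanical enough to write down. A rough sketch of the decision rule as stated above, using the thresholds from this page. The `Practice` fields and the example agent's ZIP count are illustrative assumptions, not data:

```python
from dataclasses import dataclass

@dataclass
class Practice:
    closed_deals: int            # closed deals per year
    gross_commission: float      # gross commission income, USD
    has_fast_responder: bool     # salaried ISA, second agent, or TC inside 5 minutes
    max_deals_in_one_zip: int    # most transactions in a single ZIP or sub-market

def gates_passed(p):
    """Count how many of the three gates this practice clears."""
    volume = p.closed_deals >= 30 or p.gross_commission >= 300_000
    labor = p.has_fast_responder
    concentration = p.max_deals_in_one_zip >= 8
    return sum([volume, labor, concentration])

def recommendation(p):
    n = gates_passed(p)
    if n == 3:
        return "vendor tool earns its slot"
    if n == 2:
        return "marginal"
    return "foundation model plus sphere CSV"

# NAR-median volume and gross; the ZIP count of 4 is a made-up example.
median_agent = Practice(10, 58_100, False, 4)
print(recommendation(median_agent))  # foundation model plus sphere CSV
```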
What to score on (the owned-data version)
Four criteria do most of the work. They map to the prompt above and they're the same fields a vendor would charge $700 a month to score against — except you already have them in your CRM notes for free.
Last contact date. The dumbest signal that consistently beats every other ranking. Anyone you haven't talked to in 6–12 months whose notes contain a life-stage trigger ranks above anyone you talked to last week. The CRM sorts by recency of activity. The model sorts by staleness against signal — which is the actual question.
Life-stage triggers in notes. Kids graduating, divorce, retirement, job change, moving company referral, parent moving in or out. Per NAR, 66% of sellers found their agent via referral or prior transaction, which is why a trigger inside your sphere matters more than one on a stranger's tax record. They land in the notes field as half-sentences. The model reads them. The vendor's score doesn't see them.
Prior transaction with you. Closed buyer 3+ years ago, closed seller anywhere, listing represented anywhere — these contacts convert at multiples of cold data. Per NAR, 20% of a typical Realtor's income comes from past clients. Score weight should reflect that.
Referral source signal. Did this contact come from another past client? Did they refer someone since? Two-way referral activity is the strongest weak signal in the sphere — it predicts the next referral, not just the next transaction. The model sees this in notes. The dashboard doesn't.
Four inputs. One paste. One ranked list every Tuesday.
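To make "staleness against signal" concrete, here is a crude stand-in for how the four criteria could combine. It is keyword matching with arbitrary weights, both of them assumptions for illustration. It is not what the foundation model does with your prose, and it will miss exactly the half-sentences the model catches:

```python
from datetime import date

# Hypothetical trigger fragments; a real notes field is messier than this.
TRIGGERS = ("graduat", "divorce", "retire", "new job", "downsiz", "moving")

def score(contact, today):
    """Illustrative score only; the weights are arbitrary assumptions."""
    days_stale = (today - contact["last_contact"]).days
    has_trigger = any(t in contact["notes"].lower() for t in TRIGGERS)
    points = 0
    if has_trigger:
        points += 3
        if days_stale >= 180:
            points += 2  # staleness only counts when there is a signal
    if contact["prior_transaction"]:
        points += 2
    if contact["referred_someone"]:
        points += 2
    return points

contacts = [
    {"name": "Recent", "last_contact": date(2026, 4, 21),
     "notes": "Quick check-in.",
     "prior_transaction": True, "referred_someone": False},
    {"name": "Joel", "last_contact": date(2025, 9, 1),
     "notes": "Kid is graduating from MTSU in May, downsize talk.",
     "prior_transaction": True, "referred_someone": False},
]

today = date(2026, 4, 28)
ranked = sorted(contacts, key=lambda c: score(c, today), reverse=True)
print([c["name"] for c in ranked])  # ['Joel', 'Recent']
```

The stale contact with a trigger outranks the one you spoke to last week, which is the ordering the criteria describe.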
FAQ
What's the difference between AI lead scoring and predictive seller scoring?
Overlapping terms, not synonyms. Predictive seller scoring is what the vendors (SmartZip, Offrs, Catalyze AI, Likely.ai) call their product. AI lead scoring is the broader category: it includes both vendor-built scores on purchased data and foundation-model scores on owned data. See /glossary/lead-scoring and /glossary/predictive-seller-scoring for full definitions.
How accurate is AI lead scoring for real estate?
Vendor accuracy lands around 70–72% for SmartZip and Offrs. None of those figures are independently verified — every cited number traces to vendor marketing or affiliated reviewers. Practitioner aggregator data tells a different story: Offrs sits at 2.4 stars on Sitejabber across 160 reviews, with complaints about stale data and "zero closings after 12 months." For the owned-data version, accuracy is bounded by the quality of your CRM notes — the agent who actually writes things down gets dramatically better rankings than the one whose notes field is empty.
Do I need predictive analytics at 10–15 deals a year?
No. NAR-median gross commission income is $58,100. A $700-a-month predictive tool eats 14% of gross before a single deal closes. The tool only works if a salaried ISA actions the list inside a 5-minute response window — at 12 deals a year as a sole proprietor, you fail the labor gate by definition. Run the Tuesday-morning sphere prompt for $20/mo instead. It outperforms the vendor tool on owned data, costs 35× less, no contract.
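The arithmetic behind those two figures, worked out from the numbers on this page ($700/mo tool, $20/mo model subscription, $58,100 NAR-median gross). Swap in your own gross to see where you land:

```python
monthly_tool = 700      # mid-range SmartZip price used on this page
monthly_model = 20      # foundation-model subscription
median_gross = 58_100   # NAR-median gross commission income

annual_tool = monthly_tool * 12
share_of_gross = annual_tool / median_gross
cost_ratio = monthly_tool / monthly_model

print(f"{annual_tool} {share_of_gross:.1%} {cost_ratio:.0f}x")  # 8400 14.5% 35x
```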
Does ChatGPT actually beat SmartZip on real lead scoring?
For the median sphere-driven agent below the three gates, yes — the reason is the data source, not the model. SmartZip scores public records. ChatGPT scores the prose of your own notes. The notes contain weak signals no public-records model can see. Above all three gates (30+ deals, salaried ISA, 8+ in one ZIP), layer SmartZip under the foundation model — different tools, different jobs.
Is my CRM data safe in ChatGPT?
Paid tiers (ChatGPT Plus, Claude Pro, Gemini Advanced) don't train on your data by default. Confirm the toggle in settings is off. If your brokerage compliance team has a stricter rule, run the prompt against an anonymized export — strip names and contact info, keep the notes column, rejoin the ranking against the master CSV after.
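The anonymized-export route can be sketched in a few lines: replace identifying columns with an ID, send only the safe rows, then map the ranked IDs back to the master file. The column names and sample rows below are hypothetical. Note the sketch does not scrub names that appear inside the notes text itself, so check that column before pasting:

```python
import csv
import io

# Hypothetical master export; real column names depend on your CRM.
MASTER_CSV = """name,email,phone,last_contact,notes
Joel Example,joel@example.com,555-0100,2025-09-01,Kid graduating in May; downsize talk
Pat Example,pat@example.com,555-0101,2026-04-21,Quick check-in
"""

PRIVATE_COLUMNS = {"name", "email", "phone"}

def anonymize(rows):
    """Return (safe_rows, key); key maps each contact_id back to its full row."""
    safe_rows, key = [], {}
    for i, row in enumerate(rows, start=1):
        contact_id = f"C{i:04d}"
        key[contact_id] = row
        safe = {"contact_id": contact_id}
        safe.update({k: v for k, v in row.items() if k not in PRIVATE_COLUMNS})
        safe_rows.append(safe)
    return safe_rows, key

def rejoin(ranked_ids, key):
    """Turn the model's ranked IDs back into names from the master file."""
    return [key[cid]["name"] for cid in ranked_ids]

rows = list(csv.DictReader(io.StringIO(MASTER_CSV)))
safe_rows, key = anonymize(rows)

# Stand-in for the order the model returns.
ranked_ids = ["C0001", "C0002"]
print(safe_rows[0])
print(rejoin(ranked_ids, key))  # ['Joel Example', 'Pat Example']
```

The key never leaves your machine. Only `safe_rows` goes into the chat.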
What to do this Tuesday
Block 7:30 to 7:45 AM. Export the CSV. Open ChatGPT. Paste the prompt. First call at 8:05 driving the kids to school. Run it weekly. After four Tuesdays you'll know which triggers in your sphere actually convert — and you can tighten the prompt around those.
That's the system at the median deal volume. Cheap, owned, repeatable, defensible against any vendor model that costs ten times as much.
If you've cleared all three gates — 30+ deals, salaried ISA, 8+ in a single ZIP — and you're staffing up to operationalize the AI layer instead of just ranking your sphere, The Listing Machine is the four-week beta cohort that builds the system around the model. Prompt stack, Context Cards tuned to your voice and your market, AI-Enhanced Realtor credential at the end.
For everyone below the gates: don't buy. Save the $700 a month. Run the Tuesday-morning prompt. Reinvest the difference into actually answering the leads you already have — within 5 minutes.
Last updated 2026-04-29.