AI Systems Instructor • Real Estate Technologist
Quick Answer: Pull your last 30 closings and check what AI score each lead had when they first entered your pipeline. If your CRM's top-scored leads aren't converting at 2-3x the rate of low-scored leads, the scoring model isn't working for your business. Test it with data, not faith.
Every major real estate CRM now sells "AI-powered lead scoring." Most agents turn it on and never look at it again. That's the equivalent of buying a home inspection and never reading the report. The question isn't whether your CRM has AI scoring—it's whether that scoring is actually predicting which leads will close. This guide gives you a concrete framework for evaluating AI scoring models in any CRM, measuring their accuracy against your real conversion data, and deciding whether to trust the score, supplement it, or replace it entirely.
Before you can evaluate accuracy, you need to understand what the AI is actually scoring. Every CRM uses different signals. kvCORE uses behavioral tracking—property searches, listing views, time on site—to generate its Smart CRM scores. Follow Up Boss focuses on engagement signals: call answers, email opens, text responses. Lofty (formerly Chime) combines web activity with communication frequency. Sierra Interactive weights lead source and search behavior heavily. Log into your CRM and find where the AI scoring settings live. Write down: (1) what factors the model uses, (2) whether you can customize the weights, (3) what the score range is, and (4) how often scores update. Most agents skip this step and treat the score as a black box. That's how you end up chasing leads the algorithm likes instead of leads your market rewards.
Tip: Check if your CRM lets you see why a lead received its score. kvCORE and Lofty show contributing factors. If your CRM only shows a number with no explanation, that's a red flag—you can't improve what you can't understand.
Pull your last 30 closed transactions. For each one, go back to the CRM and find the AI score that lead had when they first entered your pipeline—not their score at closing, which is inflated by all the engagement that happened during the transaction. Record the original score, lead source, days from lead entry to closing, and the sale price. This is your ground truth. If you can't find original scores (some CRMs overwrite them), start tracking now and revisit this evaluation in 90 days. Without baseline data, you're guessing. Use Claude to help analyze the data: paste your full lead export, closings and non-closings alike, each with its initial score, and ask it to calculate the correlation between initial AI score and actual conversion. (Closings alone can't show correlation; the model's accuracy lives in the contrast between leads that closed and leads that didn't.) A predictive analytics model is only as good as its predictions—and this step measures exactly that.
Tip: If you have fewer than 30 closings, use 20. Below 20, the sample size is too small to draw conclusions. In that case, pool data with a team member who uses the same CRM.
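If you'd rather run the numbers yourself instead of pasting into Claude, a few lines of Python do the job. This is a minimal sketch, assuming you've exported your leads to a CSV (the file name "leads.csv" and the column names are placeholders) with an initial_score column (0-100) and a closed column marking 1 for closed, 0 for not:

```python
# Minimal sketch: does the initial AI score predict conversion?
# Assumes a hypothetical CSV export "leads.csv" with columns
# "initial_score" (0-100) and "closed" (1 = closed, 0 = not).
import csv
from statistics import correlation  # Python 3.10+

with open("leads.csv", newline="") as f:
    rows = list(csv.DictReader(f))

scores = [float(r["initial_score"]) for r in rows]
closed = [int(r["closed"]) for r in rows]

# Pearson correlation against a 0/1 outcome (point-biserial).
r = correlation(scores, closed)
print(f"Score-to-conversion correlation: {r:.2f} across {len(rows)} leads")
```

Because closed is binary, this Pearson correlation is the point-biserial correlation. A value near zero means the initial score carries little predictive signal; a clearly positive value means the score is at least pointing in the right direction.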
Divide your leads into three tiers based on their AI scores: top third, middle third, bottom third. Now calculate the conversion rate for each tier. The math is simple: closings in that tier divided by total leads in that tier. A working scoring model should show clear separation—top-tier leads converting at 2-3x the rate of bottom-tier leads. If all three tiers convert at roughly the same rate (say 2.1%, 1.8%, 1.9%), the scoring model isn't doing its job. It's assigning numbers without meaning. Here's what good looks like in real estate: top tier at 4-6% conversion, middle tier at 1.5-2.5%, bottom tier under 1%. If your CRM's AI scoring produces that kind of separation, it's earning its keep. If not, you're paying for a random number generator. This is the core of lead scoring evaluation—does the score actually predict outcomes?
Tip: Run this analysis by lead source too. Your CRM's scoring model might work well for Zillow leads but poorly for sphere referrals, or vice versa. Source-specific accuracy matters more than overall accuracy.
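Here's the same tier math as a sketch, reusing the scores and closed lists from the export above. It splits tiers by lead count rather than fixed score ranges; adjust if your CRM defines tiers differently:

```python
# Sketch of the three-tier separation check, reusing the scores and
# closed lists from the export sketch above.
def tier_rates(scores, closed):
    pairs = sorted(zip(scores, closed))  # sort leads by initial score
    cut = len(pairs) // 3
    tiers = {
        "bottom": pairs[:cut],
        "middle": pairs[cut:2 * cut],
        "top":    pairs[2 * cut:],
    }
    rates = {}
    for name, leads in tiers.items():
        closings = sum(flag for _, flag in leads)
        rates[name] = closings / len(leads) * 100
        print(f"{name:>6} tier: {closings}/{len(leads)} leads = {rates[name]:.1f}%")
    if rates["bottom"]:
        # The test from this step: top tier should run 2-3x the bottom tier.
        print(f"top/bottom separation: {rates['top'] / rates['bottom']:.1f}x")
    return rates

tier_rates(scores, closed)
```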
Scoring isn't just about conversion rates—it's about resource allocation. Calculate your average response time for each score tier. If you're responding to low-scored leads in 5 minutes and high-scored leads in 2 hours because the scoring didn't change your behavior, the system isn't integrated into your workflow. Then calculate cost per closing by source: take your total marketing spend for each lead source, divide by closings from that source, and cross-reference with the AI scores those leads received. The goal is to answer one question: are you spending more time and money on the leads the AI says are most likely to close? According to NAR's 2024 Technology Survey, agents who respond to leads within 5 minutes are 21x more likely to qualify them. Your AI scoring should be directing that fast response to the right leads.
Tip: Set up CRM automations that use the AI score to trigger immediate response for high-scored leads. In Follow Up Boss, use Smart Lists filtered by lead score. In kvCORE, set up Behavioral Automations tied to score thresholds.
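To make the resource-allocation question concrete, the sketch below computes cost per closing by source next to each source's average initial score, reusing the rows from the earlier export. The spend figures and the "source" column name are assumptions; substitute your own:

```python
# Sketch of cost per closing by source, cross-referenced with the
# average initial score per source. Spend figures (6-month totals) and
# the "source" column are assumptions; replace them with your numbers.
from collections import defaultdict

spend = {"zillow": 9000, "google_ads": 4800, "sphere": 0, "social": 1800}

by_source = defaultdict(lambda: {"leads": 0, "closings": 0, "score_sum": 0.0})
for r in rows:  # rows from the export sketch above
    s = by_source[r["source"]]
    s["leads"] += 1
    s["closings"] += int(r["closed"])
    s["score_sum"] += float(r["initial_score"])

for source, s in sorted(by_source.items()):
    avg_score = s["score_sum"] / s["leads"]
    if s["closings"]:
        cost = spend.get(source, 0) / s["closings"]
        print(f"{source:>10}: {s['closings']} closings, ${cost:,.0f} per closing, "
              f"avg initial score {avg_score:.0f}")
    else:
        print(f"{source:>10}: no closings yet, avg initial score {avg_score:.0f}")
```

If your cheapest closings come from a source the model scores low, that's the misallocation this step is designed to surface.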
Pull 20 active leads from your CRM. Before looking at their AI scores, rank them yourself from most likely to least likely to close based on your experience. Then compare your ranking to the AI's ranking. Where you agree, the scoring model is capturing what you already know—useful for scale but not adding new insight. Where you disagree, investigate. If the AI scores a lead high that you'd rank low, look at what signals it's using. Maybe it's weighting website visits heavily, but you know that lead is just casually browsing. If you rank a lead high that the AI scores low, the model might be missing context you have—like a personal referral or a conversation that happened off-platform. This exercise reveals the model's blind spots. Every AI CRM has them. Use the 5 Essentials framework here: what Facts is the AI missing, and what Constraints is it not accounting for?
Tip: Document the disagreements. If you find a pattern—like the AI consistently overscoring portal leads and underscoring referrals—that's actionable intelligence you can use to supplement the scoring.
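You can also quantify the gut check with a rank correlation. The sketch below compares your manual ranking against the AI's scores for the same leads; all the numbers are placeholders for your own 20-lead exercise:

```python
# Sketch of the gut check as a rank correlation. Replace the placeholder
# numbers with your own ranking (1 = most likely to close) and the CRM's
# scores for the same leads, in the same order.
from statistics import correlation

def ranks(values):
    # 1 = highest value; ties keep input order, fine for a quick check
    order = sorted(range(len(values)), key=lambda i: -values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

my_rank  = [1, 2, 3, 4, 5, 6, 7, 8]          # your judgment
ai_score = [72, 35, 64, 58, 20, 81, 44, 12]  # CRM scores, same leads

rho = correlation(my_rank, ranks(ai_score))
print(f"agreement (Spearman rho): {rho:.2f}")
# Near +1: the model mostly mirrors your judgment (scale, not new insight).
# Near 0 or negative: dig into which signals drive the disagreements.
```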
Here are the warning signs that your CRM's AI scoring needs intervention:

Red Flag 1: Your highest-converting lead source consistently gets low scores. This means the model is trained on behavior patterns that don't match your best leads.

Red Flag 2: Scores cluster in the middle with very few leads at the extremes. A scoring model that puts 80% of leads between 40-60 isn't making meaningful distinctions.

Red Flag 3: Scores don't change over time even as leads engage or disengage. A static score is a stale score.

Red Flag 4: New leads get high scores before any engagement data exists. The model might be over-weighting lead source at the expense of actual behavior.

Red Flag 5: Your team ignores the scores entirely. If experienced agents don't trust the model, investigate why—they're often right.

According to McKinsey research, AI-based lead scoring improves conversion rates by 15-20% when properly calibrated—but miscalibrated models can actively misdirect resources.
Tip: If you hit 3 or more of these red flags, don't abandon the scoring model. Instead, supplement it with your own criteria using the OODA Loop: observe where the model fails, orient your supplemental scoring around those gaps, decide on adjusted weights, and act by implementing a hybrid approach.
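Some of these red flags are checkable in a few lines. Here's a sketch for Red Flag 2, reusing the scores list from the earlier export; the 40-60 band and 80% threshold mirror the rule of thumb above, not any vendor-published standard:

```python
# Quick check for Red Flag 2, reusing the scores list from the export
# sketch. The 40-60 band and 80% threshold are the rules of thumb from
# this guide, not vendor-published numbers.
mid_band = [s for s in scores if 40 <= s <= 60]
share = len(mid_band) / len(scores) * 100
print(f"{share:.0f}% of leads score in the 40-60 band")
if share > 80:
    print("Red Flag 2: the model is barely distinguishing leads")

# Red Flag 3 needs two snapshots: export scores today and again in 30
# days, then count how many active leads' scores never moved.
```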
Based on your evaluation, you have three paths.

Trust: The scoring model shows clear tier separation (2-3x conversion difference), your costs per lead align with scores, and your team uses the scores to prioritize. Keep it as-is and check quarterly.

Supplement: The model works for some lead sources but not others, or it captures behavior but misses context. Build a secondary scoring layer using Claude—paste your lead data and custom criteria to generate adjusted scores that fill the model's blind spots. Export your CRM data and use AI to re-score leads with your additional context.

Replace: The model shows no meaningful tier separation, the scores don't influence behavior, and your conversion rates are random relative to scores. Switch to a manual scoring system powered by AI (see our guide on building an AI lead scoring system) or evaluate a different CRM with better scoring.

The key metric: if investing time in your top-scored leads doesn't produce measurably better results than random lead selection, the model has failed.
Tip: Before replacing your CRM for scoring alone, check if you can customize the scoring weights. kvCORE, Lofty, and Sierra Interactive all allow some weight adjustment. Sometimes a 10-minute settings change fixes what seems like a broken model.
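If you land on the supplement path, the secondary layer can start as simply as the function below, which echoes the referral boost in the sample analysis further down. Every adjustment value here is an assumption to calibrate against your own tier and source results:

```python
# Illustrative supplemental scoring layer: boost flagged referrals,
# compress a source the model tested poorly on. All adjustment values
# are assumptions to calibrate against your own data.
def adjusted_score(crm_score: float, source: str, is_referral: bool) -> float:
    score = crm_score
    if is_referral:
        score += 20                 # model underweights personal connections
    if source == "zillow":
        score = 0.5 * score + 25    # pull unreliable Zillow scores toward the middle
    return max(0.0, min(100.0, score))

print(adjusted_score(30, "sphere", is_referral=True))    # 50.0
print(adjusted_score(80, "zillow", is_referral=False))   # 65.0
```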
I want to evaluate whether my CRM's AI lead scoring is actually working. Here's my data from the last 6 months:

- CRM: kvCORE
- Total leads: 340
- Closings: 12
- Lead sources: Zillow (140 leads, 3 closings), Google Ads (80 leads, 4 closings), Sphere/Referral (60 leads, 4 closings), Social Media (60 leads, 1 closing)
- Score distribution: Most leads scored 30-60 on the 0-100 scale

For each closing, here are the original AI scores when the lead first entered the system:
[paste your 12 closings with original scores]

Analyze this data and tell me:
1. Is there a meaningful correlation between AI score and conversion?
2. Which lead sources is the scoring model most/least accurate for?
3. What's my conversion rate by score tier (top/middle/bottom third)?
4. Should I trust, supplement, or replace this scoring model?
5. What specific changes would you recommend?
SCORING MODEL EVALUATION: kvCORE - 6 Month Analysis

OVERALL ACCURACY: MODERATE - Supplement Recommended

Conversion by Score Tier:
- Top Third (60-100): 5.3% conversion (6 closings from 113 leads)
- Middle Third (35-59): 2.7% conversion (4 closings from 147 leads)
- Bottom Third (0-34): 2.5% conversion (2 closings from 80 leads)

The top tier converts at 2x the bottom tier, which shows some predictive power, but the middle-to-bottom gap is negligible. The model is better at identifying your best leads than separating mediocre from poor ones.

Source-Specific Accuracy:
- Zillow: POOR. 3 closings spread evenly across all score tiers. The model doesn't differentiate serious Zillow buyers from browsers.
- Google Ads: GOOD. All 4 closings came from top-tier scored leads. The behavioral signals align well with search intent.
- Sphere/Referral: POOR. 2 of 4 closings were from bottom-tier scores. The model underweights personal connections.
- Social Media: N/A. Too few closings to evaluate.

Recommendation: Supplement the kvCORE scoring with a manual referral boost (+20 points for any lead flagged as a personal referral) and reduce reliance on scoring for Zillow leads. Consider implementing source-specific response protocols rather than universal score-based routing.
Re-evaluate your CRM's scoring model every quarter. Lead behavior patterns shift with market conditions—a scoring model calibrated for a seller's market may underperform in a buyer's market.
If your CRM is Chime/Lofty, check the AI Assistant settings separately from lead scoring. Lofty's AI assistant and lead scoring are different systems with different accuracy profiles.
Export your CRM data to a spreadsheet before analysis. Doing the evaluation inside the CRM limits your analytical flexibility. In a spreadsheet (or with Claude), you can run correlation analysis, create score distributions, and compare across time periods.
Ask your CRM vendor for documentation on their scoring methodology. kvCORE publishes theirs in the Knowledge Base. Follow Up Boss explains their scoring in the API documentation. Knowing the methodology helps you understand its biases.
Evaluating scoring accuracy using current scores instead of original entry scores
Fix: Always use the score a lead had when they first entered the pipeline. Current scores reflect engagement that happened during your sales process and inflate the apparent accuracy of the model. You need to know if the model predicted the outcome, not if it noticed you were already working the lead.
Treating all lead sources as one pool when evaluating
Fix: Break your evaluation down by lead source. A scoring model might excel at Zillow leads and fail at referrals. Source-level analysis reveals which segments need supplemental scoring and which you can trust.
Abandoning a scoring model after a short evaluation period
Fix: You need at least 20 closings to evaluate a scoring model with any confidence. For most agents, that means 3-6 months of data. If you have fewer than 20 closings, keep tracking and revisit. Premature conclusions are worse than no conclusions.
Learn the Frameworks
What AI lead scoring is, how it works, and why it matters for real estate conversion rates.
How AI transforms CRM from a contact database into a predictive sales tool.
The technology behind AI scoring models and how predictive analytics applies to lead conversion.
Detailed review of kvCORE's AI features including Smart CRM scoring and behavioral automations.
How Follow Up Boss uses AI for lead routing, scoring, and engagement tracking.
Stop guessing with AI. Join The Architect workshop to master the frameworks behind every guide on this site.