The Article You Wish Existed When You Started Using ChatGPT
You opened ChatGPT for the first time. You typed a prompt. Something came back. It was impressive. You had no idea how it worked.
Then you started hearing terms. LLM. Tokens. Context window. Temperature. Embeddings. Fine-tuning. RLHF. People dropped these words in conversations and on social media like everyone already knew what they meant.
You didn't. And nobody had time to explain.
This is that explanation. Twelve concepts. Each one gets a definition, a real estate analogy, and a "why this matters to you" section. No jargon for the sake of jargon. Every concept connects directly to how you use AI in your business.
Bookmark this. Come back to it when you encounter a term you don't remember. Consider it your Context Card for understanding AI itself.
LLM (Large Language Model)
What it is: LLM stands for Large Language Model. It's the technology behind ChatGPT, Claude, and Gemini. An LLM is a massive neural network trained on billions of text documents — books, websites, code, conversations — that learns patterns in language so well it can generate new text that sounds human-written.
Anthropic (which makes Claude) and OpenAI (which makes ChatGPT) are two of the leading LLM developers. Google builds Gemini. Each model has different strengths, but they all work on the same fundamental principle: predict the most likely next word, one word at a time, again and again, until a coherent response emerges.
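If you're curious what "using an LLM" looks like from the code side, here's a minimal sketch using the OpenAI Python SDK. The model name and prompt are placeholders for illustration, not recommendations:

```python
# A minimal sketch of asking an LLM for text, using the OpenAI Python SDK.
# Assumes the openai package is installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": "You are a real estate marketing assistant."},
        {"role": "user", "content": "Write a two-sentence teaser for a 3-bed craftsman in Portland."},
    ],
)

print(response.choices[0].message.content)
```

The Anthropic and Google SDKs follow the same basic shape: you send a list of messages, the model runs its next-word prediction, and you get text back.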
Real estate analogy: An LLM is like an agent who has read every MLS listing, every real estate blog, every market report, and every email ever written in the industry — and can now write new ones in any style you ask for. It hasn't experienced a single transaction, but it has absorbed the written record of millions of them.
Why this matters to you: When someone says "which LLM should I use?" they're asking which AI model is best for your needs. Claude excels at following detailed instructions and maintaining your voice. ChatGPT has the largest user base and plugin ecosystem. Gemini integrates with Google Workspace. Knowing what an LLM is helps you evaluate which one matches your workflow. Read more in our LLM glossary entry.
Tokens
What it is: A token is the unit of text that an LLM processes. It's not exactly a word — it's a chunk of text, roughly ¾ of a word on average. "Listing" is one token. "Real estate agent" is three tokens. OpenAI's tokenizer tool lets you see exactly how any text gets split into tokens.
Tokens matter because every AI model has limits. When you paste a long document into ChatGPT and it says "message too long," that's a token limit. When pricing says "$0.01 per 1K tokens," that's how API costs work. Claude Opus processes up to 200K tokens in a single conversation — roughly 150,000 words, or a 500-page book.
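If you want to count tokens yourself, OpenAI's open-source tiktoken library does the same job as their web tokenizer. This sketch assumes a hypothetical local text file and uses the $0.01-per-1K price purely as an example figure:

```python
# A rough sketch of counting tokens with OpenAI's open-source tiktoken library.
# The file name and the $0.01-per-1K price are illustrative assumptions.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Listing", "Real estate agent", "Charming 3-bed craftsman near top-rated schools"]:
    tokens = enc.encode(text)
    print(f"{text!r} -> {len(tokens)} tokens")

report = open("inspection_report.txt").read()  # e.g. a long inspection report
n = len(enc.encode(report))
print(f"Report is ~{n} tokens, ~${n / 1000 * 0.01:.2f} at $0.01 per 1K tokens")
```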
Real estate analogy: Tokens are like square footage in a house. Every room (word) takes up space (tokens). A studio apartment (short prompt) uses less space than a mansion (long document). And just like square footage determines what you can fit in a house, token limits determine how much information you can fit in a conversation.
Why this matters to you: Token limits determine how much context you can give the AI. If you're pasting a 50-page property inspection report into Claude, it can handle it (200K token window). If you're using ChatGPT's free tier, you'll hit the ceiling faster. Understanding tokens helps you choose the right model for the right task. See our detailed breakdown at AI Token.
Context Window
What it is: The context window is the total amount of text an AI model can "see" at once — your prompt, the conversation history, and the AI's response, all combined. Think of it as the model's working memory. Claude's context window is 200K tokens. GPT-4o handles 128K tokens. Gemini 1.5 Pro reaches 1 million tokens.
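Here's a toy Python sketch of the desk idea. No vendor manages conversation history exactly this way, and the word-to-token ratio is only a rough approximation, but it shows why the oldest messages are the first to disappear when a conversation outgrows the window:

```python
# A toy sketch of the "desk": keep only as much conversation history as fits
# in a fixed token budget. Token counts are approximated from word counts.
def trim_history(messages, budget_tokens=200_000):
    """Drop the oldest messages until the rough token total fits the window."""
    def rough_tokens(msg):
        return int(len(msg["content"].split()) / 0.75)  # ~3/4 of a word per token

    kept = list(messages)
    while kept and sum(rough_tokens(m) for m in kept) > budget_tokens:
        kept.pop(0)  # the earliest message "falls off the desk" first
    return kept
```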
Real estate analogy: The context window is your desk. Imagine you can only have a certain number of papers on your desk at once. Everything on the desk, you can reference. Anything that falls off the desk, you forget. A bigger desk (larger context window) means you can reference more documents, more conversation history, and more data at once.
Why this matters to you: When you're midway through a long AI conversation and the responses start feeling "off" — like the AI forgot what you discussed earlier — you've probably exceeded the context window. The earliest parts of your conversation fell off the desk. The fix: start a new conversation with a fresh Context Card, or use a model with a larger window. This is explained further in our context window glossary entry.
Embeddings
What it is: Embeddings are how AI models represent the meaning of words and sentences as numbers. Each word or phrase gets converted into a long list of numbers (a vector) that captures its meaning. Words with similar meanings have similar number patterns. "Townhouse," "condo," and "apartment" would have embeddings that are numerically close together. "Townhouse" and "refrigerator" would be far apart.
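Here's a minimal sketch of that "numerically close" idea, using OpenAI's embeddings endpoint and a standard cosine-similarity comparison. The model name is one current option, not the only one:

```python
# A minimal sketch of comparing meanings with embeddings via the OpenAI API.
# Assumes the openai and numpy packages are installed and an API key is set.
import numpy as np
from openai import OpenAI

client = OpenAI()
words = ["townhouse", "condo", "refrigerator"]
resp = client.embeddings.create(model="text-embedding-3-small", input=words)
vectors = [np.array(item.embedding) for item in resp.data]

def cosine(a, b):
    # 1.0 means identical direction (very similar meaning), 0 means unrelated
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print("townhouse vs condo:       ", round(cosine(vectors[0], vectors[1]), 3))
print("townhouse vs refrigerator:", round(cosine(vectors[0], vectors[2]), 3))
```

You should see a noticeably higher score for townhouse vs. condo than for townhouse vs. refrigerator. That's meaning turned into math.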
Google's AI research describes embeddings as "a translation of high-dimensional data into a lower-dimensional space." In plain English: they turn meaning into math.
Real estate analogy: Embeddings are like property comps. When you search for comparable properties, you're matching on attributes: price range, square footage, bedroom count, neighborhood, condition. A comp that matches on 8 out of 10 attributes is a "closer" match than one that matches on 3. Embeddings work the same way — they match on meaning attributes, not just keywords.
Why this matters to you: Embeddings power semantic search — the reason AI can understand "I need a starter home under $300K near good schools" without you using the exact keywords the MLS uses. Every AI-powered property search tool uses embeddings. Understanding this helps you see why AI search is fundamentally different from keyword search. Explore more in our embeddings glossary entry.
Transformers
What it is: The transformer is the neural network architecture that powers every modern LLM. Introduced in Google's 2017 paper "Attention Is All You Need", transformers use a mechanism called "attention" that lets the model weigh the importance of every word in relation to every other word. Before transformers, AI models processed text sequentially, word by word, left to right. Transformers look at all the words at once, which is why they train faster and are far better at connecting ideas that sit pages apart.
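For the curious, here's a toy numpy version of the attention idea. It's a sketch of the math, not production code, and the "words" are just random vectors for illustration:

```python
# A toy numpy version of the "attention" idea from "Attention Is All You Need":
# every word scores its relevance against every other word, all at once.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how relevant is each word to each other word
    weights = softmax(scores)                # turn scores into percentages per word
    return weights @ V                       # blend the words using those percentages

# 4 "words", each represented by an 8-number vector (made-up data for illustration)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(attention(x, x, x).shape)  # (4, 8): every word now carries context from all the others
```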
Real estate analogy: Imagine you're reviewing a purchase agreement. The old way (pre-transformer): you'd read page 1, then page 2, then page 3, hoping you remember the financing terms on page 4 when you reach the contingency clause on page 12. The transformer way: you read the entire contract at once, and every clause is automatically cross-referenced with every other clause. The "attention" mechanism is like having a highlighter that connects related clauses across the entire document instantly.
Why this matters to you: You don't need to understand transformers to use AI. But knowing that the technology processes context holistically (not sequentially) explains why AI is so good at long documents. When you paste a 20-page inspection report into Claude and ask for a summary, the transformer architecture is why it can connect the roof issue on page 3 to the water damage on page 17. Learn more at Transformer.
Parameters and Model Size
What it is: Parameters are the internal settings that an LLM adjusts during training. Think of them as the "knowledge knobs" of the model. GPT-4 has an estimated 1.8 trillion parameters. Claude and Gemini are believed to be in similar ranges (the exact numbers aren't published). More parameters generally means the model can store more patterns and produce more nuanced output.
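A quick bit of back-of-envelope arithmetic shows why parameter count drives cost. This assumes 2 bytes per parameter (16-bit weights), which is a common but simplified assumption:

```python
# Back-of-envelope arithmetic: what a parameter count implies for memory,
# assuming 2 bytes per parameter. The counts below are rough public estimates.
def memory_gb(parameters, bytes_per_param=2):
    return parameters * bytes_per_param / 1e9

print(f"7B-parameter model:      ~{memory_gb(7e9):,.0f} GB of weights")
print(f"1.8T-parameter estimate: ~{memory_gb(1.8e12):,.0f} GB of weights")
```

Every one of those gigabytes has to sit on expensive hardware and be touched for every word the model generates, which is why bigger models cost more to run.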
Real estate analogy: Parameters are like the factors an experienced appraiser considers. A new appraiser might look at 5 factors: price, size, location, condition, and age. A veteran appraiser considers 50: school ratings, flood zone, lot grade, utility access, road noise, future development plans, and dozens more. More factors (parameters) means a more nuanced, accurate appraisal. More parameters means a more nuanced, accurate AI response.
Why this matters to you: This is why model size matters and why newer models generally outperform older ones. It also explains the cost structure — larger models cost more to run (more parameters = more computation). When you see "GPT-4o-mini" vs "GPT-4o," the mini version has fewer parameters, which means it's faster and cheaper but less capable for complex tasks. Read more at Model Parameters.
Temperature and Top-P
What it is: Temperature controls how creative or predictable the AI's output is. A temperature of 0 means the AI always picks the most likely next word — very predictable, very consistent. A temperature of 1 means the AI is more willing to pick less-likely words — more creative, more varied, more prone to surprises. Top-p (also called "nucleus sampling") is a related setting that controls the range of word choices the model considers: at a top-p of 0.9, the model only samples from the smallest set of words whose combined probability adds up to 90%.
Anthropic's Claude API defaults to a temperature of 1. OpenAI recommends adjusting temperature or top-p, but not both at once.
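Here's a toy sketch of what temperature actually does to the model's word choices. The candidate words and scores are invented for illustration:

```python
# A toy sketch of how temperature reshapes the model's word choices.
# The candidate words and raw scores are invented for illustration.
import numpy as np

words  = ["stunning", "spacious", "charming", "nice"]
scores = np.array([2.0, 1.5, 1.2, 0.3])  # the model's raw preference for each next word

def choice_probabilities(scores, temperature):
    scaled = scores / max(temperature, 1e-6)  # low temperature exaggerates the leader
    e = np.exp(scaled - scaled.max())
    return e / e.sum()

for t in (0.1, 0.7, 1.0):
    probs = choice_probabilities(scores, t)
    print(f"temperature {t}: " + ", ".join(f"{w} {p:.0%}" for w, p in zip(words, probs)))
```

At a low temperature the top word wins almost every time; at temperature 1 the alternatives get a real chance, which is where the variety (and the occasional surprise) comes from.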
Real estate analogy: Temperature is like the difference between a luxury listing description and a legal disclosure. The disclosure needs zero creativity — facts only, predictable format, temperature at 0. The luxury listing needs voice, metaphor, and emotional pull — temperature at 0.7-0.9. You wouldn't write a disclosure with poetry. You wouldn't write a luxury listing with only facts.
Why this matters to you: Most of the time, you won't adjust these settings manually (ChatGPT and Claude handle it). But understanding the concept explains why the same prompt sometimes gives different outputs. If you need exact consistency (email templates, disclosures), ask the AI to "be precise and consistent." If you need creative variation (social posts, listing descriptions), ask it to "be creative and varied." You're effectively adjusting the temperature through your instructions. See Temperature and Top-P for more.
Pre-Training and Fine-Tuning
What it is: Pre-training is the initial phase where an LLM reads billions of text documents and learns the patterns of language. This is the expensive part — it costs tens of millions of dollars and takes months on thousands of specialized chips. The result is a general-purpose model that knows a lot about everything but isn't specialized for anything.
Fine-tuning comes after. It's a second round of training on a smaller, specialized dataset. A real estate company could fine-tune an LLM on 100,000 MLS listings to make it better at writing listing descriptions. The model keeps its general knowledge but develops expertise in the specific domain.
OpenAI offers fine-tuning through their API. Google provides tuning for Gemini. Most individual agents won't fine-tune models — it's more relevant for companies building AI products.
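For the technically curious, kicking off a fine-tuning job through OpenAI's API looks roughly like this. The file name, dataset contents, and model name are illustrative assumptions:

```python
# A minimal sketch of starting a fine-tuning job via the OpenAI API.
# listings.jsonl is a hypothetical training file: one example conversation per line, e.g.
# {"messages": [{"role": "user", "content": "Write a listing for ..."},
#               {"role": "assistant", "content": "Sun-drenched corner unit ..."}]}
from openai import OpenAI

client = OpenAI()

training_file = client.files.create(file=open("listings.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # illustrative; use a model OpenAI allows for fine-tuning
)
print(job.id, job.status)
```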
Real estate analogy: Pre-training is like getting your real estate license. You learn general principles: contracts, law, finance, marketing, ethics. You're qualified but not specialized. Fine-tuning is like spending 5 years working a specific farm area. Now you know every street, every builder, every HOA issue, every school boundary. Same foundation, sharper expertise.
Why this matters to you: This explains why a Context Card is so powerful. You can't fine-tune ChatGPT or Claude yourself (that requires engineering resources). But you can achieve similar results by giving the model your context at the start of every conversation. The Context Card is your DIY fine-tuning — it takes a general model and makes it specialized for your business, your market, and your voice. Read more at Pre-Training.
RLHF (Reinforcement Learning from Human Feedback)
What it is: RLHF is the training technique that makes AI models helpful, harmless, and honest. After pre-training, human reviewers rank different model outputs from best to worst. The model learns from these rankings — it gets "reinforced" toward outputs that humans prefer. This is why ChatGPT and Claude give you helpful answers instead of random text.
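Here's a toy sketch of the core idea behind the "reward model" at the heart of RLHF: when a human prefers one response over another, nudge the scores so the preferred one wins more often. All the numbers are invented for illustration:

```python
# A toy sketch of learning from human preference rankings.
# Scores and learning rate are invented; real RLHF trains a full neural reward model.
import math

score_preferred, score_rejected = 0.2, 0.5  # the model currently favors the "wrong" response
learning_rate = 0.5

for step in range(5):
    # Bradley-Terry-style probability that the human-preferred response wins
    p_win = 1 / (1 + math.exp(score_rejected - score_preferred))
    print(f"step {step}: preferred response wins with probability {p_win:.0%}")
    # nudge both scores so the preferred response wins more often next time
    score_preferred += learning_rate * (1 - p_win)
    score_rejected  -= learning_rate * (1 - p_win)
```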
Anthropic's research has been foundational in RLHF and its successor techniques (like RLAIF — RL from AI feedback). OpenAI's InstructGPT paper demonstrated that RLHF dramatically improved user satisfaction even with smaller models.
Real estate analogy: RLHF is like a mentorship program for a new agent. The new agent (pre-trained model) has all the book knowledge. The mentor (human reviewer) watches them handle clients and gives feedback: "That response was too aggressive. This one was perfect. Next time, lead with empathy before presenting the data." After hundreds of feedback sessions, the new agent develops judgment — not just knowledge. RLHF gives AI models judgment, not just knowledge.
Why this matters to you: RLHF is why Claude and ChatGPT "understand" that a listing description should be appealing, not just accurate. It's why the AI won't generate discriminatory content about neighborhoods. It's why the output feels helpful rather than robotic. Understanding RLHF also explains why different models have different "personalities" — they were trained on different human preferences. See RLHF.
Inference
What it is: Inference is the process of using a trained AI model to generate output. When you type a prompt and hit enter, you're running inference. The model takes your input, processes it through its billions of parameters, and produces a response. Training happens once (and costs millions). Inference happens every time you use the model (and costs fractions of a cent per request).
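Inference is also where "streaming" comes from: ChatGPT and Claude appear to type word by word because the model sends tokens back as it generates them. Here's a minimal sketch with the OpenAI SDK, with an illustrative model name and prompt:

```python
# A minimal sketch of streaming inference with the OpenAI Python SDK,
# so the response appears word by word instead of after a long wait.
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Summarize why inspection contingencies matter."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```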
Real estate analogy: Training is like building the house. Inference is like living in it. The construction (training) takes months, costs a fortune, and happens once. Using the house (inference) happens every day, costs relatively little (utilities), and is where all the value gets delivered. You don't rebuild the house every time you turn on the lights.
Why this matters to you: Inference speed determines how fast the AI responds to your prompts. When ChatGPT or Claude feels "slow," that's usually an inference bottleneck — the model is processing your request through its parameters. Smaller models run inference faster (which is why GPT-4o-mini responds quicker than GPT-4o). Understanding this helps you pick the right model: use the fast model for simple tasks, the powerful model for complex ones. See Inference.
Putting It All Together: How These Concepts Connect
Here's the full picture in one paragraph. An LLM is a transformer-based model with billions of parameters, pre-trained on internet-scale text data, then refined through RLHF to be helpful. When you type a prompt, the model runs inference — processing your input as tokens within its context window, using embeddings to understand meaning, and generating output with creativity controlled by temperature.
You don't need to understand the math. But understanding the concepts makes you a better user. You'll know why long conversations lose context (context window limits). You'll know why the same prompt gives different outputs (temperature). You'll know why Claude follows your voice better with a Context Card (it's like DIY fine-tuning). You'll know why some models are faster and some are smarter (parameter count vs. inference speed).
That knowledge separates agents who use AI from agents who use AI well. And that gap is where the competitive advantage lives.