
    How Does AI Get Its Basic Education Before It Meets You?

    Pre-training is the process of teaching an AI model general knowledge from a massive dataset before teaching it any specific job. It is the heavy lifting that turns a blank computer brain into a knowledgeable generalist.

    Hey Common Folks!

    Yesterday we covered tokens — how AI breaks your prompt into the small pieces it can actually read. Now we answer the question that opens up right underneath that one: where did AI learn what those tokens actually mean in the first place?

    The answer is Pre-training. It is the “P” in GPT (Generative Pre-trained Transformer). It is the phase where a Foundation Model learns 99% of everything it knows, and it is the reason today’s AI is so shockingly capable, and occasionally so shockingly confident about things it shouldn’t be.


    What is Pre-training?

    Think of it as General Education. Before you become a doctor, an accountant, or a coder, you first have to learn the alphabet, how to read, how to do basic math, and how the world works. You don’t start kindergarten by performing brain surgery.

    Pre-training is the AI version of that long, broad foundation phase. The model spends an enormous amount of time learning general knowledge before anyone ever asks it to do a specific job.


    The Problem: Learning from Scratch is Hard

    Why do we even need pre-training? Why can’t we just teach an AI to answer customer support emails directly?

    Because deep learning models are data hungry. If you want to train a model from scratch to recognize a cat, you need thousands of photos of cats. But not just photos — you need humans to manually label them: “This is a cat,” “This is a dog.” That labeling process is slow, expensive, and tedious.

    And if you tried to teach a computer to understand English by only showing it customer support emails, it would fail. It wouldn’t know what a verb is, what “angry” sounds like, or even how to structure a sentence.

    You can’t shortcut the foundation. You have to build it.


    The Solution: Predict What Comes Next

    Pre-training flips the problem on its head. Instead of teaching the AI a specific job immediately, we let it loose on a massive amount of data — the internet, books, papers, code — with one beautifully simple goal: predict what comes next.

    The AI reads billions of sentences and tries to guess the next word over and over. After “Hello,” the next word is often “World” or “There.” After “Once upon a,” it is almost always “time.” Through trillions of these tiny prediction games, the model gradually picks up grammar, language patterns, factual knowledge, and slang — the broad shape of how the world is described in text.

    The beautiful part: nobody has to label anything. The next word in the sentence is itself the answer.
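
    If you’re curious what “the data labels itself” looks like in practice, here is a tiny Python sketch (a toy illustration, not a real training run). Every position in a sentence hands the model one free training example:

        # Toy illustration: self-supervised labels fall out of raw text.
        # Every position in a sentence gives one (context, next_word)
        # training pair, with no human labeling anywhere.
        text = "once upon a time there lived a clever fox"
        words = text.split()

        pairs = [(" ".join(words[:i]), words[i]) for i in range(1, len(words))]

        for context, target in pairs:
            print(f"{context!r:45} -> {target!r}")

    Run it and a single nine-word sentence hands you eight ready-made practice questions, with no human labeler in sight.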

    It learns the general features of the world first.

    • In images: it learns what edges, circles, and shapes look like before it learns what a “face” is.

    • In text: it learns how language works before it learns how to write a poem about your dog.

    The whole philosophy comes from a simple idea in transfer learning: don’t reinvent the wheel. If a model already exists that understands the basics, build on top of that knowledge instead of starting from zero.


    The Analogy: The Medical Student

    To understand pre-training versus what comes after it, think of a medical student.

    1. Pre-training (Medical School): The student spends years reading textbooks and learning anatomy, biology, and chemistry. They aren’t treating patients yet. They are just building a massive foundation of general knowledge. They know a little bit about everything.

    2. Specialty Training (Residency): Now that student goes to a hospital to specialize, maybe in cardiology, maybe in surgery. They take all that general knowledge and focus it on one specific task.

    A pre-trained model like GPT-5, Claude, or Gemini is the medical school graduate. It has read the library. It is smart and broad, but it hasn’t specialized in your company’s data, your customers’ tone, or your industry’s jargon yet. That comes later.


    Why is This a Game Changer?

    Before pre-training became the standard, if you wanted to build an AI to translate languages, you needed a massive labeled dataset of English-to-French sentences. If you didn’t have that data, you were stuck. Every new task meant starting from zero.

    With pre-training:

    1. You need less data later. Because the model already knows English, you only need to show it a handful of examples of the specific task you want done before it catches on.

    2. You skip the hardest part. A pre-trained foundation already exists. You just point it at the specific job you care about.

    A note on scale, because this matters for understanding why pre-training is such a big deal in 2026: back when this technique first took off around 2018–2020, pre-training a small language model was a days-or-weeks project on a modest cluster. Today’s frontier models — GPT-5, Claude, Gemini — take months to pre-train, run on tens of thousands of GPUs, and cost hundreds of millions of dollars in compute alone. Pre-training has gotten bigger, not smaller, as AI has scaled.


    The Takeaway

    Pre-training is the heavy lifting. It is the process of creating a Foundation Model — a knowledgeable generalist that can later be pointed at almost any task.

    • It is the bridge between a blank computer brain and one that has read the library.

    • It is the difference between teaching a baby to write a novel (impossible) and asking a college graduate to write a novel (possible).

    • It is the layer underneath every modern AI you use — ChatGPT, Claude, Gemini, Copilot — built long before you ever typed a single prompt.

    One honest caveat for 2026: pre-training is the foundation, but it is not the whole story anymore. Modern AI also goes through additional specialty training on top of pre-training, and that is where today’s models learn to be helpful, careful, and reasoning-capable. We’ll dig into that in the upcoming articles.

    When you use ChatGPT or Claude, you are talking to a model that has already finished its general education. It has read the library. Now it is ready to work for you.


    Coming Up

    You now know what pre-training is — the general-education phase where AI learns how the world is described in text. But how does the AI actually do the learning? What does the day-to-day classroom look like? Next, we’ll walk through the study, practice, and test loop — the actual mechanics of how a blank-slate model goes from random guesses to meaningful answers.


    AI for Common Folks – Making AI understandable, one concept at a time.



    What Are Tokens and Why Does AI Count Words Differently?

    A Token is the smallest unit of text an AI processes — a whole word, part of a word, or even a single character. It is the bridge between human language and the numbers AI actually understands.

    Hey Common Folks!

    Over the last two articles, we covered what a prompt is and how to write a good one. That was about how to talk to AI. Now we look at the other side of that conversation: how AI actually reads what you typed.

    If you have ever looked at the pricing page for ChatGPT or Claude, or seen an error message saying “Token limit exceeded,” you have probably scratched your head. Why don’t they just count words? Why this fancy term “Token”?

    It turns out, computers don’t read the way we do. To an AI, a Token is the fundamental unit of reality.


    What is a Token?

    A token is a chunk of text. It can be a whole word (“apple”), part of a word (“smart” + “phones”), a piece of punctuation, or even a single character.

    Think of tokens as the atoms of language. Just as a molecule of water is made up of atoms (Hydrogen and Oxygen), a sentence is made up of tokens.


    The Analogy: The Lego Castle

    Imagine you have a beautiful Lego castle (a sentence).

    • For Humans: We look at the castle and say, “That’s a castle.” We read the whole word.

    • For AI: The AI looks at the individual plastic bricks used to build it.

    Sometimes, a single brick is a whole window (a whole word like “apple”). Other times, to build a long wall (a complex word like “smartphones”), you need two bricks: “smart” and “phones.”

    The AI doesn’t see the castle; it sees a pile of bricks. It processes those bricks one by one to understand the structure.


    Why Not Just Use Words?

    A fair question: “Why break ‘smartphones’ into two tokens? Why not just treat it as one word?”

    Because computers don’t understand English. They understand numbers.

    To teach an AI language, every piece of text first has to be converted into a list of numbers (called token IDs), which the AI later turns into rich mathematical representations called embeddings.

    • If we assigned a unique number to every single word in the English language, the list would be infinite and unmanageable. Every name, every typo, every made-up word would need its own slot.

    • By breaking complex words into smaller chunks (tokens), the AI can understand words it has never seen before by recognizing their parts.

    If the AI knows “smart” and it knows “phones,” it can understand “smartphones” without needing a separate definition for it.


    How Does It Work? (The Tokenization Process)

    Before your prompt hits the AI, a process called Tokenization happens.

    1. Input: You type “I love AI.”

    2. Chopping: The tokenizer chops this up. In a modern GPT-style tokenizer, it looks roughly like: ["I", " love", " AI", "."] — four tokens, including the leading spaces. (Real tokenizers are picky like that, and they bake the spacing into the tokens themselves.)

    3. Numbering: The system assigns a specific ID number to each chunk. Modern tokenizers have a vocabulary of tens of thousands to a few hundred thousand possible tokens, so real IDs are usually large numbers. We’ll use small ones below for illustration: [40, 1842, 16124, 13].

    4. Processing: The AI receives that list of numbers and asks itself the one question it knows how to answer — what number should come next? It picks the most likely next number based on the patterns it learned during training (we walked through that pattern-prediction in How AI Actually Learns). That predicted number is converted back into a piece of text, and then the AI does it again — one token at a time — until your full answer is built.

    Want to see this live? Free tools like OpenAI’s tokenizer page or Anthropic’s token-counting API let you paste any text and watch how it gets split. It’s worth doing once — you will never look at a long prompt the same way again.
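
    And if you’re comfortable with a few lines of Python, you can do the same thing in code. This sketch uses OpenAI’s open-source tiktoken library (install it first with pip install tiktoken); the exact splits and IDs depend on which encoding you load, so treat the output as illustrative:

        # Requires: pip install tiktoken  (OpenAI's open-source tokenizer)
        import tiktoken

        enc = tiktoken.get_encoding("cl100k_base")  # one of several GPT-style encodings

        for text in ["I love AI.", "smartphones"]:
            ids = enc.encode(text)
            pieces = [enc.decode([i]) for i in ids]
            print(f"{text!r} -> {len(ids)} tokens: {pieces}  (ids: {ids})")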


    The “Currency” of AI

    Why should you care about tokens? Because in the world of Generative AI,
    Tokens = Money.

    When companies like OpenAI or Anthropic charge you, they don’t charge per question. They charge per token.

    • Input Tokens: what you type, paste, or upload into the chat.

    • Output Tokens: what the AI writes back.

    A useful 2026 reality check: output tokens almost always cost more than input tokens — typically 3 to 5 times more — because generating a careful answer is harder for the model than reading one. So a long, rambling AI response costs you more than a short, precise one. Brevity in your prompt and in the format you ask for is a real lever on cost.

    Roughly speaking, 1,000 tokens is about 750 words. So if you ask the AI to summarize a 50-page document, you are “spending” tokens to feed that document into the model, and spending more tokens to get the summary back out.
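
    Here is that back-of-envelope math as a quick sketch. The prices are made-up placeholders, not real rates; check your provider’s pricing page before trusting the dollar figure:

        # Rough token-cost estimate. ALL PRICES ARE MADE-UP PLACEHOLDERS.
        WORDS_PER_1K_TOKENS = 750      # rule of thumb: 1,000 tokens ~ 750 words

        INPUT_PRICE_PER_1K = 0.003     # dollars per 1K input tokens (assumed)
        OUTPUT_PRICE_PER_1K = 0.015    # dollars per 1K output tokens (assumed, ~5x input)

        doc_words = 50 * 500           # a 50-page document at ~500 words/page (assumed)
        summary_words = 400            # the summary you ask for

        input_tokens = doc_words * 1000 / WORDS_PER_1K_TOKENS
        output_tokens = summary_words * 1000 / WORDS_PER_1K_TOKENS

        cost = (input_tokens / 1000) * INPUT_PRICE_PER_1K \
             + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

        print(f"~{input_tokens:,.0f} tokens in, ~{output_tokens:,.0f} tokens out")
        print(f"Estimated cost: ${cost:.2f}")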

    The good news for 2026: frontier models like Claude and Gemini now support context windows in the hundreds of thousands to over a million tokens. A 500-page novel can fit in a single prompt today — something that was impossible just two years ago. Tokens still cost money, but the ceiling on how much you can hand the AI in one shot has gone way up.


    The Takeaway

    A Token is simply a chunk of text.

    • It is the bridge between human language and machine numbers.

    • It is the unit used to measure the size of the AI’s memory (the context window).

    • It is the unit used to calculate the cost of using the AI.

    Understanding tokens explains why your long prompt sometimes gets cut off, why running a massive analysis costs a few dollars instead of a few cents, and why brevity in your prompts is a quietly powerful skill.


    Coming Up

    You now know how AI breaks your prompt into tokens. But where did it learn what those tokens actually mean in the first place? Next, we’ll dig into how AI gets its basic education — the massive pre-training phase where a blank-slate model reads a huge slice of the internet and starts to make sense of the world.


    AI for Common Folks – Making AI understandable, one concept at a time.



    How Do You Get Better Answers from AI?

    Prompt Engineering is the skill of crafting inputs to guide AI models toward the specific output you want — less about coding, more about clear communication with machines.

    Hey Common Folks!

    In our last article, we covered what a prompt is — the bridge between human intent and machine output. Before that, we met the Large Language Model — the engine inside ChatGPT, Claude, and Gemini — and a few articles back we built up to it through Neural Networks.

    You now know what a prompt is. This article is about the part that actually changes your results: how to write a good one.

    Have you ever asked ChatGPT a question, gotten a generic, unhelpful answer, then rephrased it and suddenly gotten something brilliant? That’s not luck. That’s the difference between a bad prompt and a good one.

    The skill of crafting better prompts has a name: Prompt Engineering.


    Why “Engineering”?

    Don’t let the word scare you. This isn’t about writing code or building bridges.

    Prompt Engineering is about communication skills — talking to a machine in a way it understands best. It’s the art of being specific, structured, and strategic with your instructions.


    The Analogy: Back to Our New Hire

    Remember the new hire we’ve been working with? The one who read the entire internet before day one?

    They have all the knowledge in the world. But they are extremely literal and have zero context about your specific life or business.

    Bad Manager (You):
    “Write an email to the client.”

    The new hire (The AI):
    Panic. Which client? Good news or bad? Formal or casual? They guess and write something generic and robotic.

    Prompt Engineer (You):
    “Act as a senior sales manager. Write a polite but firm email to ‘Client X’ regarding their overdue payment of $500. Keep it under 100 words. Don’t use emojis.”

    The new hire (The AI):
    Understood. They write exactly what you need because you gave them the Role, the Context, and the Constraints.

    Prompt Engineering is just being a really good manager to your AI new hire.


    Why Does This Work? (The Technical Bit)

    Remember how LLMs work? They predict the next word based on patterns they’ve seen.

    When you write a detailed prompt, you’re filling the AI’s Context Window with specific patterns. The AI looks at those patterns and generates text that fits that specific context.

    When you go a step further and include examples of the input-output pattern you want, you’re using something researchers call In-Context Learning — the AI picks up the pattern from the examples in your prompt without any retraining. We’ll see this in action in technique #3.

    Vague stage = vague performance.
    Specific stage = specific performance.


    The Five Techniques That Actually Work

    You don’t need a degree for this. Master these five approaches.

    1. Role Prompting (The “Act As” Hack)

    Tell the AI who it’s supposed to be.

    • Instead of: “Explain quantum physics.”

    • Try: “You are a kindergarten teacher. Explain quantum physics to a 5-year-old using only examples they’d understand.”

    This sets the tone, complexity, and approach immediately.

    2. Be Specific About Output

    Don’t leave format to chance.

    • Instead of: “Give me marketing ideas.”

    • Try: “Give me 5 marketing ideas for a local bakery. Format as a numbered list. Each idea should be under 20 words.”

    The more specific your constraints, the more useful the output.

    3. Few-Shot Prompting (Show, Don’t Just Tell)

    Sometimes instructions aren’t enough. Show examples.

    If you want the AI to convert slang to formal English, demonstrate:

    • Input: “Sup bro?” → Output: “Hello, how are you?”

    • Input: “Gotta run.” → Output: “I must leave now.”

    • Input: “No way!” → Output:

    The AI sees the pattern and continues it perfectly. That’s In-Context Learning at work.
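
    If you’re wondering how little scaffolding this takes, here is a minimal sketch that assembles a few-shot prompt as plain text (the function name and layout are just one way to do it; the examples themselves carry the instruction):

        # Building a few-shot prompt as plain text. The examples ARE the
        # instruction: the model infers the pattern and keeps it going.
        EXAMPLES = [
            ("Sup bro?", "Hello, how are you?"),
            ("Gotta run.", "I must leave now."),
        ]

        def few_shot_prompt(new_input: str) -> str:
            lines = ["Convert casual slang to formal English.", ""]
            for slang, formal in EXAMPLES:
                lines += [f"Input: {slang}", f"Output: {formal}", ""]
            lines += [f"Input: {new_input}", "Output:"]  # the model completes from here
            return "\n".join(lines)

        print(few_shot_prompt("No way!"))

    Notice there is no “explain the rules” paragraph. Two demonstrations are the whole lesson.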

    4. Chain of Thought (Let’s Think Step-by-Step)

    For complex problems, add: “Let’s think step-by-step.”

    This forces the AI to show its reasoning before giving the final answer. Accuracy on math and logic problems goes up because the AI can’t just guess — it has to work through the problem.

    A note for 2026: this was the original trick that started the whole “reasoning model” era. Today’s reasoning-tier models from OpenAI, Anthropic, and Google already do this internally before they answer you. So the phrase matters less for the latest models, but it still helps with smaller or older ones, and it is the foundation everything else is built on.

    5. Give Context and Background

    The AI doesn’t know your situation. Tell it.

    • Instead of: “Write a resignation letter.”

    • Try: “I’ve worked at this company for 3 years. My boss has been supportive. I’m leaving for a better opportunity, not because I’m unhappy. Write a professional resignation letter that maintains the relationship.”

    Context changes everything.


    Want to Practice This?

    A friend of mine, Don Barger, built a free tool at ripen.donbarger.com around a clean little framework he calls RIPEN:

    • Role — who or what should the AI act as?

    • Input — what information are you giving it?

    • Process — what steps should it take to get to the answer?

    • Example — show it what good output looks like.

    • Notes — tone, constraints, guidelines, anything else.

    It’s the same territory we just walked through, repackaged as a five-letter mnemonic that’s easy to recall the moment you’re actually typing a prompt. If you want a structured place to drill these techniques into muscle memory — whether you’re writing a one-off prompt or building a chatbot’s personality — ripen.donbarger.com is a clean starting point.


    Common Mistakes to Avoid

    Being too vague: “Help me with my project” tells the AI nothing.

    Mashing unrelated tasks into one prompt: Back when ChatGPT first launched in 2022, even simple multi-task prompts like “write me an email, also summarize this, and list action items” would come out confused or messy. Today’s models handle that combo easily — so this advice has evolved, not disappeared. The underlying principle still holds in two specific cases: when the tasks have conflicting tones or audiences (a casual Slack message and a formal client email about the same news), and when you want to iterate on one piece of the answer without regenerating the rest. For serious work, separating prompts still gives you cleaner output and tighter control.

    Not iterating: Your first prompt rarely gives perfect results. Treat it as a conversation — refine and improve.

    Forgetting constraints: Without limits, AI tends toward verbose, generic responses. Add word counts, formats, and restrictions.


    Simple Examples to Start With

    Writing

    • Before: “Write a blog post about productivity.”

    • After: “Write a 500-word blog post about productivity for remote workers. Use a conversational tone. Include 3 actionable tips. Start with a relatable scenario.”

    Research

    • Before: “Tell me about climate change.”

    • After: “Summarize the top 3 causes of climate change in simple terms a high school student would understand. Use bullet points. Keep it under 200 words.”

    Code

    • Before: “Write Python code.”

    • After: “Write a Python function that takes a list of numbers and returns the average. Include comments explaining each step. Handle the case of an empty list.”


    The Limitations (Keeping It Real)

    Prompt Engineering has limits.

    It can’t fix bad models: If the underlying AI is weak, no prompt will save it.

    It’s not magic: Some tasks are genuinely hard for AI. Better prompts help, but they don’t make AI capable of everything.

    It takes practice: You’ll write bad prompts before you write good ones. That’s normal.


    The Takeaway

    Prompt Engineering isn’t a “technical” skill — it’s a clarity skill.

    • Vague prompts = average results.

    • Specific, structured prompts with examples = excellent results.

    Writing code is still a real, valuable craft. The engineers who can also articulate clearly, in plain language, with the right context, are the ones getting the most out of AI right now. Clarity is no longer a soft skill. It is a multiplier on top of every other skill you already have.


    Coming Up

    You now know how to ask. But before the AI can even read your prompt, it has to break it into pieces. Next, we’ll explore Tokens — why AI counts your words differently than you do, why a four-letter word can sometimes count as two tokens, and why that quietly affects what AI costs you and what it remembers.


    AI for Common Folks – Making AI understandable, one concept at a time.



    What Is a Prompt and Why Does It Matter So Much?

    A Prompt is the input you provide to an AI model — text, image, voice, or document — to get a specific response. It’s the bridge between human intent and machine output.

    Hey Common Folks!

    In our last article, we met the Large Language Model — the engine inside ChatGPT, Claude, and Gemini. Before that, we decoded GPT, and before that we introduced Foundation Models as the general-purpose AI brains powering today’s tools.

    We know these AI models are incredibly powerful. But a Ferrari is useless if you don’t know how to drive it.

    How do you actually talk to this super-brain? You don’t use Python code or binary zeros and ones. You use a Prompt.

    And here’s the part most people miss: the same AI can give you a brilliant answer or a useless one based on nothing but how you asked. Same model. Same knowledge. Completely different output. That gap is what this article is about.


    What is a Prompt?

    In simple terms, a prompt is whatever you give the AI to get it to do something.

    • When you type a question into ChatGPT, that text is the prompt.

    • When you upload a screenshot of a spreadsheet and ask for key insights, that combination is the prompt.

    • When you paste a 50-page PDF and ask for a summary, the document plus the instruction is the prompt.

    • When you speak to AI through your phone, your voice is the prompt.

    A prompt is the bridge between human intent and machine output. Everything you want, you have to express through it. The AI cannot read your mind. It can only work with what you give it.


    The Analogy: Back to Our New Hire

    Remember the new hire we met last article? The one who read the entire internet before day one?

    They have all the knowledge in the world. But they don’t know you yet. They don’t know your style, your boss, your deadlines, your preferences. On day one, they need instructions for every single task, and the quality of your instructions determines the quality of their work.

    • Bad Prompt: “Write an email.”

      • The new hire thinks: To whom? About what? Angry tone? Professional? Long? Short?

      • The result: A generic, useless draft.

    • Good Prompt: “Write a polite email to my boss asking for two days of sick leave next week. Keep it under 50 words and don’t sound demanding.”

      • The new hire thinks: Got it. Topic, tone, recipient, length — all clear.

      • The result: A perfect, ready-to-send email.

    The AI is that new hire. It has the capability to do almost anything, but it relies heavily on your instructions to know what to do, how to do it, and who it’s doing it for.


    Why One Word Can Change Everything

    You might have heard the term Prompt Engineering. It sounds fancy, but it just means “the art of asking correctly.”

    A fair question to raise: in 2026, with ChatGPT, Claude, and Gemini so much more capable than they were a few years ago, does this skill still matter? The honest answer is yes, and the bar has moved. Early AI models needed near-magical phrasing to produce anything useful at all. Today’s models are far more forgiving — they’ll give you something even from a sloppy prompt. But the gap between something useful and exactly what you need still comes down to how you asked.

    LLMs are sensitive to wording. Changing a single word in your prompt can completely change the answer you get back. Three quick examples:

    One word changes the audience.

    • “Explain gravity.” → a physics textbook definition.

    • “Explain gravity to a 5-year-old.” → a story about falling apples.

    One word changes the tone.

    • “Rewrite this email.” → the AI picks a tone for you, maybe the wrong one.

    • “Rewrite this email more politely.” → it keeps your meaning but softens the edges.

    One word changes the format.

    • “Summarize this article.” → a paragraph.

    • “Summarize this article in bullet points.” → a scannable list.

    Same AI. Same underlying knowledge. Completely different outputs. From one word of difference.

    Why does this happen? Remember from the last article that the LLM predicts one word at a time, always choosing the most likely continuation of what came before. Your prompt is the “what came before.” When you change a word, you change the entire downstream probability of what word should come next. The AI isn’t being tricky. It’s responding exactly as designed to the pattern you gave it.

    Same engine, same knowledge, completely different output. All from how you asked.


    The Anatomy of a Prompt

    Every strong prompt has three basic parts:

    1. The Persona: Tell the AI who it is.

      • Example: “You are an expert travel guide” or “You are a Python coding tutor.”

    2. The Task: Tell it what to do.

      • Example: “Plan a 3-day trip to Goa.”

    3. The Constraints and Format: Tell it how you want the answer.

      • Example: “Give me the answer as a bulleted list” or “Keep it under 100 words.”

    Stack all three together and the new hire knows exactly who they are, what they’re doing, and how the answer should look.

    Most prompts people type skip one or two of these parts. That’s why most AI answers feel generic.
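
    If it helps to see the three parts laid out explicitly, here is a minimal sketch of a prompt template in Python (the function and argument names are ours, purely for illustration):

        # A minimal prompt template naming the three parts explicitly.
        def build_prompt(persona: str, task: str, constraints: str) -> str:
            return f"{persona}\n\nTask: {task}\n\nConstraints: {constraints}"

        print(build_prompt(
            "You are an expert travel guide.",
            "Plan a 3-day trip to Goa.",
            "Answer as a bulleted list, under 100 words.",
        ))

    Leave any one of the three arguments vague and you can feel the answer drifting back toward generic.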


    Bad Prompts vs Good Prompts: Three Real Examples

    Here’s what that looks like in practice across three everyday tasks.

    Writing

    • Bad: “Write a blog post about productivity.”

    • Good: “Write a 500-word blog post about productivity tips for remote workers. Use a conversational tone. Include three actionable tips. Start with a relatable scenario.”

    Why the second works: The AI now knows the length, audience, tone, structure, and opening style. Five decisions you didn’t have to make yourself.

    Research

    • Bad: “Tell me about climate change.”

    • Good: “Summarize the top three causes of climate change in simple terms a high school student would understand. Use bullet points. Keep it under 200 words.”

    Why the second works: You’ve turned an infinite-scope question into a focused, scannable answer at a specific reading level.

    Code

    • Bad: “Write Python code.”

    • Good: “Write a Python function that takes a list of numbers and returns the average. Include comments explaining each step. Handle the case of an empty list.”

    Why the second works: The AI now has a specification — inputs, outputs, edge cases, and documentation. It produces code you can actually use.

    In all three cases, the AI didn’t get smarter. The prompt did.


    The Four Most Common Prompting Mistakes

    Before we close, here are the four patterns that produce most of the frustrating AI experiences. Name them once, and you’ll start catching yourself doing them.

    1. Being too vague. “Help me with my project” tells the AI nothing. No topic, no format, no outcome.

    2. Not giving the AI the context it needs. “Fix this bug” without sharing the code, the error message, and what the code is supposed to do leaves the AI guessing. Being clear about what you want is not the same as giving the AI the raw material to do the job. Modern models are great, but they still can’t read your screen or your mind.

    3. Giving no constraints. Without a word limit, audience, or format, the AI defaults to verbose and generic.

    4. Expecting perfection on the first try for complex tasks. For simple asks, modern models often nail it immediately. But for anything nuanced — a layered analysis, a specific tone, a tricky coding problem — iteration is part of the skill, not a sign you’re doing it wrong.

    The good news: each of these has a clean fix, and there are specific techniques that turn vague prompts into surgically precise ones. We’ll walk through all of them in the next article.


    The Takeaway

    A prompt is how we program modern computers using natural language instead of code.

    • It is the steering wheel of the AI.

    • It is the instructions you hand the new hire.

    • It is the difference between a useless generic reply and a precise, personalized answer.

    The better you get at prompting, the smarter the AI seems to become. Not because the model changes, but because your instructions unlock more of what was always there.

    Coming Up

    Now you know what a prompt is and why it matters. But knowing what a steering wheel is doesn’t make you a driver. How do you actually get good at this? What separates the people who get brilliant AI answers from the ones who give up after a vague first try? Next, we’ll unpack the five techniques that turn any prompt into a great one — from role-playing to step-by-step thinking. That’s how you graduate from knowing about AI to actually getting what you want from it.


    AI for Common Folks – Making AI understandable, one concept at a time.



    What Is the Engine Behind ChatGPT, Claude, and Gemini?

    A Large Language Model (LLM) is an AI system trained on massive amounts of text to understand and generate human language. Think of it as the world’s most over-prepared new hire, one who read every document on the internet before their first day.

    Hey Common Folks!

    In our last two articles, we covered Foundation Models, the massive general-purpose AI brains, and GPT, the most famous family in that category. We talked about the Swiss Army Knives of AI and the three-letter recipe (Generative, Pre-trained, Transformer) that cracked modern language AI.

    But GPT is just one example of a broader category. Claude, Gemini, Llama, DeepSeek — these are all in the same family. That family is called Large Language Models, or LLMs.

    And LLMs are the specific technology powering every AI chatbot you’ve ever used.

    The best way to understand one is to think of it as a new employee at your company. A very unusual one.


    Meet the New Hire

    Imagine your company just hired someone. Before their first day, they did something no human could do: they read every email, every Slack message, every report, every meeting note, every document your company has ever produced. Not just your company, actually. Every company. Every book. Every website. Every Wikipedia article. Every Reddit thread. Every piece of code on GitHub.

    They didn’t understand all of it the way you would. They didn’t form opinions or have experiences. But they noticed patterns. They noticed that after “Dear” people usually write a name. That after “quarterly revenue increased” people usually write “by” followed by a percentage. That when someone asks “how do I” the next words are usually a task, followed by step-by-step instructions.

    This new hire didn’t memorize facts like a textbook. They memorized how language flows. They can finish anyone’s sentence, in any department, on any topic, because they’ve seen millions of similar sentences before.

    That’s an LLM. That’s the whole idea.


    The World’s Best Sentence Finisher

    At its core, an LLM does one thing: predict the next word. (Technically, it predicts the next token — a small chunk of text that’s usually a word or part of a word. We’ll cover tokens in a future article. For now, “word” is close enough.)

    You actually do this too. If I say: “The capital of India is New…”

    Your brain instantly completes: “Delhi.”

    You didn’t look it up. You’ve seen those words together enough times that the completion is automatic.

    Your phone does this too. You type “I am on my…” and your keyboard suggests “way.” Your phone learned this pattern from your text messages.

    Now scale that up dramatically.

    Your phone looks at the last 3 words to guess the next one. An LLM looks at the last 300,000 words. Your phone learned from your texts. An LLM learned from the entire internet.

    Back to our new hire analogy: imagine asking them to finish this sentence: “Based on our Q3 projections and the current market conditions, the board recommends that we…”

    Because they’ve read millions of similar corporate emails, they know what typically comes next. Not because they understand finance. Because they’ve seen this pattern thousands of times. They’re pattern-matching at a scale no human could match.

    That’s how ChatGPT writes entire paragraphs. One word at a time, each chosen because it’s the most likely continuation of everything before it. Like a new hire who’s so well-read that they can finish any sentence in any department.
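
    For the curious, here is a toy “sentence finisher” in Python. The hand-written probability table stands in for the trillions of learned parameters in a real LLM, but the loop itself (pick a likely next word, append it, repeat) is the real mechanic:

        # A toy "sentence finisher". The probability table is hand-made
        # for the demo; a real LLM learns these probabilities from data.
        import random

        NEXT_WORD_PROBS = {
            "the capital of india is": {"new": 0.9, "a": 0.1},
            "the capital of india is new": {"delhi": 0.95, "york": 0.05},
        }

        def generate(prompt: str, max_steps: int = 5) -> str:
            text = prompt
            for _ in range(max_steps):
                options = NEXT_WORD_PROBS.get(text.lower())
                if not options:
                    break  # our tiny table has run out of patterns
                words, weights = zip(*options.items())
                text += " " + random.choices(words, weights=weights)[0]
            return text

        print(generate("The capital of India is"))  # usually ends in "new delhi"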


    How the New Hire Follows Conversations

    Here’s where it gets interesting. Early AI systems were terrible at long sentences. Tell them a long story and by the end, they’d forgotten the beginning. Like a new hire who nods along in a meeting but can’t connect what was said in minute one to what’s being discussed in minute thirty.

    Then in 2017, researchers at Google published a breakthrough called the Transformer. (The “T” in GPT stands for Transformer. That’s how fundamental this is.)

    Transformers gave LLMs a superpower called self-attention. Here’s what that means.

    Consider this sentence: “The animal didn’t cross the street because it was too tired.”

    What does “it” refer to? The animal or the street?

    You know “it” means the animal because the animal is “tired.” Streets don’t get tired.

    Before transformers, AI would struggle with this. It read words one by one, left to right, and by the time it got to “it,” the word “animal” was already fading from memory.

    Self-attention changed that. Now the LLM looks at all the words in the sentence at once and draws connections between them. When it hits the word “it,” it checks: what does “it” connect to? It sees “tired” and traces back to “animal,” not “street.” It understands the relationship.

    Back to our new hire: imagine they’re reading a 50-page email thread where someone says “she approved the budget.” Self-attention is how the new hire traces “she” back to the CFO mentioned 30 emails ago, not the intern mentioned 2 emails ago. They can follow references across long, messy conversations.

    This is what lets LLMs understand context, answer follow-up questions, get jokes, and write code that actually makes sense across hundreds of lines. Before transformers, AI was like a new hire reading one word at a time and forgetting the beginning of the email by the end. After transformers, they can hold the entire conversation in their head at once.
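
    Here is a stripped-down taste of the math, as a sketch. The two-number “word vectors” below are hand-picked for the demo (real models learn vectors with thousands of dimensions), but the mechanic is the real one: score the pronoun against every candidate word at once, then turn the scores into attention weights:

        # A stripped-down flavor of self-attention, standard library only.
        # The two-number "word vectors" are hand-picked for this demo.
        import math

        CANDIDATES = {
            "animal": (1.0, 0.9),    # living thing, can be tired
            "street": (0.1, -0.8),   # object, cannot be tired
            "tired":  (0.9, 1.0),
        }
        IT = (0.8, 0.8)              # the pronoun we want to resolve

        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))

        scores = {word: dot(IT, vec) for word, vec in CANDIDATES.items()}
        total = sum(math.exp(s) for s in scores.values())  # softmax denominator

        for word, score in scores.items():
            print(f"'it' attends to '{word}' with weight {math.exp(score) / total:.2f}")

    Run it and “street” gets almost no attention, exactly the call your brain made instantly.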


    Training the New Hire: Three Stages

    Building an LLM like ChatGPT or Claude isn’t one step. It’s an onboarding process with three stages. Just like any new hire goes through orientation before they’re ready to talk to customers.

    Stage 1: The Reading Phase (Pre-Training)

    This is where the new hire reads everything. Terabytes of text. Books, websites, Wikipedia, code, academic papers.

    During this phase, the LLM plays a game: we hide a word in a sentence and ask it to guess. If it guesses wrong, it adjusts its internal settings. (Remember the chai analogy? Same loop. Predict, check the error, adjust, repeat. Millions of times.)

    Those “internal settings” are called parameters. Think of them as tiny dials. Modern frontier models (GPT-5, Claude 4, Gemini 2) are believed to have trillions of them, though the exact numbers are kept secret. Each dial is like one of the chai recipe settings from our previous article: a small adjustment that slightly changes the output. Together, trillions of tiny dials produce language that sounds remarkably human.

    That’s what “Large” means in Large Language Model. Large = trillions of adjustable dials, trained on a massive amount of text.

    After pre-training, the new hire knows grammar, facts, writing patterns, coding conventions, and the general structure of human communication. But they’re not helpful yet. Ask them a question and they’ll just keep writing, trying to complete the sentence rather than answer you. They’re like a new hire who’s read the entire company wiki but doesn’t know how to have a normal conversation.

    Stage 2: Job Training (Fine-Tuning)

    Now we teach the new hire how to actually do their job.

    We show them thousands of examples of good conversations:

    • Customer asks: “How do I reset my password?”

    • Good response: “Here are the steps: go to Settings, click Security…”

    • Bad response: “…and also how to reset your username and your profile picture and your billing information and…”

    The new hire learns the format: when someone asks a question, give a direct, helpful answer. Don’t ramble. Don’t go off on tangents.

    This is fine-tuning. Same new hire, same knowledge from the reading phase, but now they know how to channel it into a helpful conversation instead of an endless monologue.
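
    For a feel of what those “thousands of examples of good conversations” look like on disk, here is a sketch of a single training example, in a JSONL shape modeled on OpenAI’s fine-tuning convention (exact schemas vary by provider):

        # One fine-tuning example on disk. The JSONL shape is modeled on
        # OpenAI's fine-tuning convention; details vary by provider.
        import json

        example = {"messages": [
            {"role": "user", "content": "How do I reset my password?"},
            {"role": "assistant", "content": "Here are the steps: go to Settings, "
                                             "click Security, then choose Reset password."},
        ]}

        with open("finetune_data.jsonl", "w") as f:
            f.write(json.dumps(example) + "\n")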

    Stage 3: Performance Reviews (Human Feedback)

    The new hire is now having real conversations. But sometimes they’re rude. Sometimes they make things up. Sometimes they give dangerous advice.

    So we bring in human reviewers. They chat with the LLM and rate the responses. Helpful and accurate? Thumbs up. Rude, wrong, or harmful? Thumbs down.

    The model learns: “Humans prefer it when I’m clear, honest, and careful. They don’t like it when I make things up or lecture them.”

    Think of it as ongoing performance reviews. The new hire adjusts their behavior based on what gets positive feedback and what gets complaints.

    This last stage is why ChatGPT and Claude feel different from each other even though they’re both LLMs. Different companies hire different reviewers with different values. Same new hire, different management styles, different workplace cultures.
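
    Under the hood, those thumbs-up and thumbs-down ratings are often stored as preference pairs. Here is a sketch of one such record; the field names are illustrative, since exact schemas differ from lab to lab:

        # One human-feedback record as a preference pair. Field names are
        # illustrative; real schemas differ from lab to lab.
        preference_example = {
            "prompt": "My package never arrived. What should I do?",
            "chosen": "I'm sorry to hear that. First, check your tracking link; "
                      "if it shows delivered, contact support with your order ID.",
            "rejected": "Packages get lost all the time. Not much you can do.",
        }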


    When the New Hire Makes Things Up

    Here’s the catch with our new hire. They’re so well-read and so good at sounding confident that sometimes they make things up. And they deliver the fiction with the exact same confidence as the facts.

    This is called hallucination.

    Remember: the LLM predicts the next most likely word. It doesn’t have a database of facts. It doesn’t “look things up.” It generates text that sounds right based on patterns.

    Imagine asking the new hire: “Who designed the Golden Gate Bridge?”

    They’ve read enough about bridges and famous people that they might say: “The Golden Gate Bridge was designed by Thomas Edison in 1932.” That sentence is completely wrong. But it sounds like a fact. It has the right structure, the right confidence, the right rhythm of a true statement.

    The new hire isn’t lying on purpose. They’re doing what they always do: predicting what the most likely next words would be. And sometimes the most likely-sounding answer isn’t the true answer.

    This is the single most important thing to understand about LLMs: they are designed to sound right. Not to be right.

    Often they are right, because patterns in language usually reflect reality. But not always. And they’ll never pause and say “Actually, I’m not sure about this.” They’ll just keep predicting the next most confident-sounding word.


    Where You Already Use LLMs

    You interact with this new hire more than you realize. They’ve been placed in departments all across your digital life:

    • ChatGPT, Claude, Gemini: The obvious ones. Every conversation is an LLM predicting one word at a time.

    • Email: Gmail’s “Help me write” and Outlook’s Copilot. The new hire is drafting your emails.

    • Code: GitHub Copilot suggests code as developers type. The new hire sits next to every programmer.

    • Search: Google and Bing now use LLMs to summarize search results instead of just showing links. The new hire reads all the results and writes you a summary.

    • Customer service: Many companies have replaced scripted chatbots with LLM-powered support. The new hire handles your complaints now.


    The New Hire’s Limitations (Keeping It Real)

    Our new hire is impressive. But they have real weaknesses you should know about:

    They stopped reading on a specific date. Every LLM has a knowledge cutoff. Ask about yesterday’s news and they genuinely don’t know. It’s like the new hire read everything up to their start date but hasn’t checked the news since. (Some systems work around this by connecting to the internet, but the core model itself is frozen in time.)

    They don’t truly understand. They’re the world’s best pattern matcher, not a thinker. They can sound confident while being completely wrong. They don’t “know” anything the way you know your own name. They know what words usually follow other words. That’s it.

    They’re expensive to keep around. Every response costs computing power. That’s why advanced AI access isn’t free. Running a trillion dials for every single word in every single response adds up fast.

    Their memory has limits. They can only hold so much of the conversation at once. This is called the context window. It’s like the new hire can remember the last hour of conversation clearly but starts forgetting what was said this morning. Long conversations can feel like the AI forgot what you told them earlier, because in a real sense, it did.
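
    Here is a toy sketch of that forgetting in action. Word counts stand in for real token counts, and the window is comically small, but the mechanic is the same: the newest messages always fit, and the oldest fall out first:

        # Toy context window: only the newest messages that fit the budget
        # are "visible" to the model. Word counts stand in for tokens.
        CONTEXT_WINDOW = 12  # comically small, for illustration

        conversation = [
            "My name is Priya.",
            "I run a bakery in Pune.",
            "Suggest three cake flavors.",
            "Now write a tagline for my shop.",
        ]

        def visible_messages(messages, budget=CONTEXT_WINDOW):
            kept, used = [], 0
            for msg in reversed(messages):  # walk from newest to oldest
                cost = len(msg.split())
                if used + cost > budget:
                    break  # everything older falls out of the window
                kept.insert(0, msg)
                used += cost
            return kept

        print(visible_messages(conversation))  # Priya's name is already gone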


    The Takeaway

    A Large Language Model is the engine powering the AI revolution you’re living through right now.

    It’s a new hire who read the entire internet before day one. They predict the next word, one word at a time, with a confidence that makes it look like understanding. They went through reading (pre-training), job training (fine-tuning), and performance reviews (human feedback) to become the helpful assistant you chat with today.

    They’re extraordinary at sounding human. They’re terrible at knowing when they’re wrong. And they’re sitting in more of your apps than you probably realized.

    Under the hood, it’s the same loop you learned about in our How AI Actually Learns article. Predict, check, adjust, repeat. Just with trillions of dials instead of four chai settings.

    Coming Up

    Now you know what the engine is. But here’s a subtle truth: the same LLM can give you a brilliant answer or a useless one depending entirely on how you ask. That little box where you type your question? It has a name — the prompt — and the words you put in it are the steering wheel of the entire engine. Next, we’ll break down what a prompt actually is and why it matters more than most people realize.


    AI for Common Folks – Making AI understandable, one concept at a time.



    OpenAI’s $20B Cerebras Deal, GPT-Rosalind, Robot Learns on Its Own

    Good morning! OpenAI just doubled down on Cerebras with a chip deal that could reach $30 billion. The company also launched an AI model designed specifically for biology and drug discovery, and a robotics startup showed a robot brain that can figure out tasks nobody ever taught it. Here’s what happened 👇


    1. OpenAI Doubles Its Cerebras Chip Deal to Over $20 Billion

    OpenAI has agreed to pay chip startup Cerebras more than $20 billion over the next three years for servers powered by Cerebras chips, according to The Information. That is double the $10 billion commitment the two companies announced in January. The deal also includes warrants that could give OpenAI up to a 10% equity stake in Cerebras as spending increases, plus $1 billion from OpenAI to help fund Cerebras data centers. Total spending over three years could reach $30 billion.

    Cerebras, which makes wafer-scale engine chips that compete with Nvidia’s GPUs, is preparing an IPO in the second quarter at a valuation of roughly $35 billion. OpenAI CEO Sam Altman is an early investor. The deal is the clearest signal yet that OpenAI is building a chip supply chain that does not depend entirely on Nvidia.

    Why it matters: Every AI company on Earth is fighting for access to the same pool of Nvidia chips. By locking in $20 billion or more with Cerebras, OpenAI is hedging that dependence and, through its equity stake, turning a supplier relationship into a strategic investment. If Cerebras succeeds, OpenAI owns a piece of the alternative chip ecosystem. If you use ChatGPT, the speed and cost of every answer you get is shaped by which chips are running it. We broke down foundation models, the brains that run on these chips, in our AI Explained series → What Are Foundation Models?

    Source: Reuters


    2. OpenAI Launches GPT-Rosalind, a Biology-Tuned AI for Drug Discovery

    OpenAI introduced GPT-Rosalind on Thursday, an AI model built specifically for life sciences research. Named after Rosalind Franklin, the scientist whose X-ray crystallography work was central to discovering DNA’s structure, the model is designed to help researchers with evidence synthesis, hypothesis generation, experimental planning, and other multi-step research tasks. It can query databases, read the latest scientific papers, suggest new experiments, and connect to over 50 scientific tools through a free Codex plugin.

    OpenAI said it is already working with Amgen, Moderna, and Thermo Fisher Scientific to apply GPT-Rosalind across their workflows. The model is available as a research preview through OpenAI’s trusted access deployment structure.

    Why it matters: Drug discovery typically takes over a decade and costs billions. If an AI model can meaningfully accelerate the early stages of research, even by months, the downstream impact on which drugs reach your pharmacy shelves is enormous. This is also OpenAI’s second specialized model in one week, after GPT-5.4-Cyber for cybersecurity. The company is clearly betting that the future of AI is not one model that does everything, but specialized models tuned for high-stakes fields.

    Source: Reuters


    3. This Robot Brain Can Figure Out Tasks Nobody Taught It

    Physical Intelligence, a San Francisco robotics startup valued at $5.6 billion, published research on Thursday showing that its latest model can direct robots to perform tasks they were never explicitly trained on. The model, called π0.7, demonstrated what researchers call “compositional generalization,” the ability to combine skills learned in different contexts to solve new problems. In one test, the robot figured out how to use an air fryer despite having only two barely relevant examples in its entire training dataset. With verbal coaching from a human walking it through the steps, it succeeded.

    The π0.7 model matched the performance of purpose-built specialist models across complex tasks including making coffee, folding laundry, and assembling boxes. The company is reportedly in talks to raise at an $11 billion valuation.

    Why it matters: Until now, training a robot meant collecting data on each specific task and building a model for that task alone. If robots can start remixing skills the way language models remix words, it changes the economics of automation entirely. A warehouse, a hospital, or a restaurant would not need a different robot for every job. They would need one that can be coached. We explained how AI systems learn from data, including the foundations that make this kind of generalization possible, in our AI Explained series → How AI Actually Learns

    Source: TechCrunch


    4. The White House Plans to Give Federal Agencies Access to Anthropic’s Mythos

    The U.S. government is preparing to make a version of Anthropic’s Mythos model available to major federal agencies, Bloomberg News reported. Gregory Barbaccia, the federal chief information officer, emailed Cabinet department officials on Tuesday that the Office of Management and Budget was setting up protections to allow agencies to begin using the model. “We’re working closely with model providers, other industry partners, and the intelligence community to ensure the appropriate guardrails and safeguards are in place,” Barbaccia said.

    Separately, Anthropic CEO Dario Amodei is scheduled to meet White House chief of staff Susie Wiles on Friday, Axios reported, signaling a possible breakthrough in Anthropic’s ongoing dispute with the Pentagon.

    Why it matters: The same model that five major financial regulators spent the past two weeks scrutinizing for cybersecurity risk is now being prepared for use by the very government agencies responsible for protecting critical infrastructure. That is not a contradiction. It is the same logic that drives every advanced weapons system: if something is this powerful, you want your own people to have it first. The Mythos saga is becoming the clearest real-world test case for how governments handle AI models that are simultaneously a defensive tool and a potential threat.

    Source: Reuters


    Quick Hits

    • AI traffic to US retail websites jumped 393% in Q1, and shoppers arriving via AI now convert 42% better than non-AI visitors, according to Adobe data. A year ago, AI traffic converted 38% worse. The turnaround is massive. Source: TechCrunch

    • Anthropic’s chief product officer left Figma’s board after reports that Anthropic plans to offer a competing design product. Source: TechCrunch

    • Mozilla launched Thunderbolt, a new AI client focused on self-hosted infrastructure, built on the open-source Haystack framework as a step toward what it calls a “decentralized open source AI ecosystem.” Source: Ars Technica


    That’s it for today. OpenAI is spending like a company that believes compute will be the oil of the next decade, and it is not just buying chips but buying into the companies that make them. Meanwhile, the race to put AI into biology labs, robot arms, and government agencies is accelerating at a pace that makes last year’s “will AI be useful?” debate feel like ancient history.

    Forward this to someone who needs to stay in the loop.



    What Does GPT Actually Stand For and How Does It Work?

    GPT stands for Generative Pre-trained Transformer — a family of AI models built by OpenAI that powers ChatGPT and defined the modern era of AI.


    Hey Common Folks!

    In our last article on Foundation Models, we talked about the general-purpose brains that power modern AI — the Swiss Army Knives trained to do everything from writing code to drafting emails. Before that, we explored Generative AI, the broad category of AI that creates new content.

    Now let’s zoom in on the most famous Foundation Model family of them all: GPT.

    You see it everywhere. GPT-4, GPT-5, ChatGPT. But what do those three letters actually stand for? Is it a robot? A company? A magic spell?

    Here’s the real story: GPT is not just an acronym. It is three separate breakthroughs in AI that had never been combined at massive scale. OpenAI put them together, and that combination is why modern AI works.

    Let’s unpack each one.

    What is GPT?

    GPT stands for Generative Pre-trained Transformer.

    It is a specific type of Large Language Model (LLM) developed by OpenAI. If AI is the broad industry, GPT is a specific product line, like the “iPhone” of AI models.

    But here is the part nobody tells you: each of those three words (Generative, Pre-trained, Transformer) represents a problem that AI researchers had been stuck on for decades. GPT is the name for what happened when all three got solved at the same time.

    Before GPT: Three Problems AI Couldn’t Crack

    To understand why GPT matters, you have to understand what AI looked like before it existed.

    For most of AI’s history (roughly the 1950s through the 2010s), researchers were stuck on three problems simultaneously:

    1. AI could classify, but it couldn’t create. It could tell you if an email was spam, but it couldn’t write an email.

    2. AI had to be trained from scratch for every task. Want translation? Build a translation model. Want summarization? Build a summarization model. One model, one job, always starting from zero.

    3. AI could only read one word at a time. The dominant technology of the day (called RNNs and LSTMs) processed text sequentially, like reading a book strictly left to right. It was slow, and by the end of a long sentence, it had often forgotten the beginning.

    Every single letter in “GPT” was an answer to one of these problems. Let’s take them one by one.

    1. G is for Generative: The Shift from “Classify” to “Create”

    This is the easy part to say, but the hardest to appreciate.

    What it means: GPT can create new content. Essays, code, poetry, emails. It generates output that didn’t exist before.

    Why it’s a big deal: For decades, AI was a world of “yes/no” answers. Is this spam? Is this a cat or a dog? Does this customer churn? These are classification tasks. AI looks at something and puts it in a bucket.

    Creating something new from scratch (a paragraph, a story, a working function of code) was considered nearly impossible. Language is infinite. There are more possible sentences than atoms in the universe. How would an AI pick a good one?

    The Generative approach said: don’t pick the “right” sentence. Generate it word by word, always predicting the most likely next word given what came before. Do that billions of times, and coherent writing emerges.

    That sounds simple. It is also the shift that took AI from “recognizing patterns in data” to “creating patterns that look human.”

    2. P is for Pre-trained: The Free Labels Trick

    This one is the real genius, and most explanations skip it.

    What it means: Before GPT is ever asked to do anything useful, it has already read a massive amount of text. Books, Wikipedia, websites, articles, code. That’s the “pre” in pre-trained.

    Why it’s a big deal: Traditional AI needed labeled data. To teach AI to spot spam, humans had to label millions of emails as “spam” or “not spam.” To teach it to tell cats from dogs, humans had to label millions of photos. Labeled data is expensive, slow, and limited.

    Pre-training flipped the entire problem on its head with one insight:

    If the task is “predict the next word,” the internet is already labeled. The label is just the next word.

    Read “The cat sat on the ___” and the correct answer is whatever word came next in the original sentence. No humans needed. The data labels itself. And the internet has trillions of words.

    Suddenly, AI had unlimited training data. GPT-3 was trained on roughly 570 GB of filtered text, pulled from an even larger 45 TB of raw internet data. Later models like GPT-4 and GPT-5 used dramatically more. That scale would have been unimaginable with human-labeled data.

    Think of Pre-training as a student reading every book in the library to learn general knowledge. Later, this student can be Fine-tuned (specialized training for a specific job) to become a doctor, a coder, or a chatbot. But the broad education comes first, and it comes from the text itself.

    3. T is for Transformer: Seeing All the Words at Once

    What it means: The Transformer is a specific type of Neural Network architecture introduced by Google researchers in 2017, in a paper famously titled “Attention Is All You Need.”

    Why it’s a big deal: Before Transformers, AI read sentences one word at a time, sequentially. This was slow, and the model often forgot the beginning of a long sentence by the time it reached the end. It also meant you couldn’t spread the work across thousands of chips in parallel, which put a hard ceiling on how big these models could get.

    Transformers introduced two superpowers:

    1. Parallel Processing: They look at all the words in a sentence simultaneously, rather than one by one. This makes them dramatically faster and, critically, scalable to billions of parameters. Without Transformers, no amount of compute could have produced GPT-3 or GPT-4.

    2. Self-Attention: They figure out which words in a sentence relate to each other. In “The bank of the river,” the Transformer pays attention to “river” to know that “bank” means land. In “The bank approved my loan,” it pays attention to “loan” to know bank means the financial kind. Same word, different meaning, figured out from context.

    Self-attention is what gave AI something that looks like understanding context. It is the single architectural idea that made modern AI possible.
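
    If you want to see what self-attention actually computes, here is a stripped-down sketch in Python with NumPy. It leaves out everything a real Transformer adds (learned query/key/value projections, multiple heads, masking) and keeps only the core move: every word’s vector is compared to every other word’s vector in one shot, and each position becomes a weighted blend of the whole sentence.

    ```python
    import numpy as np

    def self_attention(X: np.ndarray) -> np.ndarray:
        """Single-head self-attention with no learned weights (illustration only).
        Each row of X is one word's vector; each output row is a mix of all
        rows, weighted by how strongly the words relate to each other."""
        d = X.shape[1]
        scores = X @ X.T / np.sqrt(d)                  # word-to-word relatedness
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row sums to 1
        return weights @ X                             # blend context into every position

    # 4 "words", each a 3-dimensional vector (random stand-ins for embeddings)
    X = np.random.rand(4, 3)
    print(self_attention(X).shape)   # (4, 3): one context-aware vector per word
    ```

    Notice that all four words are handled in a couple of matrix multiplications. That is the parallelism superpower from point 1: GPUs eat matrix multiplications for breakfast.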

    Why the Combination Changed Everything

    Here’s the thing nobody emphasizes enough: each of these three ideas existed on its own before GPT.

    • Researchers had built generative models before.

    • Unsupervised pre-training had been explored in smaller forms.

    • The Transformer paper was published by Google, not OpenAI.

    What OpenAI did was combine all three at massive scale. GPT-1 in 2018 showed the recipe could work. GPT-2 in 2019 showed it could write coherently. GPT-3 in 2020 was the moment the world saw what happens when you push this recipe to billions of parameters: the model started doing things it was never explicitly trained to do. Reasoning. Translation. Summarization. Rudimentary code generation. Researchers call these emergent abilities: capabilities that appear, seemingly out of nowhere, once the model gets big enough.

    ChatGPT in late 2022 was when the public caught on.

    So when someone says “GPT changed AI,” they are not being dramatic. The specific combination of Generative + Pre-trained + Transformer at scale is the recipe that broke a decades-long logjam.

    GPT vs. ChatGPT

    Are they the same thing? No.

    Here is the best analogy to understand the difference:

    Think of a Laptop.

    • GPT is the Processor (like Intel or Apple Silicon): It is the raw brainpower and technology that does the thinking.

    • ChatGPT is the Laptop (like a MacBook or Dell XPS): It is the product wrapped around that processor with a screen and keyboard (an interface) that allows you to interact with it easily.

    GPT is the model; ChatGPT is the application built using the GPT model.

    The “Decoder” Secret

    If you want to sound extra smart, know this: the Transformer architecture originally came with two parts, an Encoder (to understand input) and a Decoder (to generate output).

    GPT models are actually Decoder-only models. They dropped the Encoder entirely. They are specialists in generating text: predict the next token, then the next, then the next, until they have built a whole sentence.

    Different AI systems use different slices of the Transformer architecture. Google’s original BERT was Encoder-only (great for understanding and search). GPT is Decoder-only (great for generating). That single design choice is a big part of why GPT models feel so fluent when they write.
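
    Mechanically, “Decoder-only” comes down to one constraint called a causal mask: when predicting the next token, each position may only look at earlier positions, never future ones. A tiny NumPy sketch of that mask (an illustration of the standard technique, not OpenAI’s code):

    ```python
    import numpy as np

    # Causal mask for a 5-token sequence: position i may attend to
    # positions 0..i only, so the model can never peek at future tokens.
    n = 5
    mask = np.tril(np.ones((n, n)))   # lower-triangular matrix of ones
    print(mask)
    # Row 4 reads [1 1 1 1 0]: the 4th token sees tokens 1 through 4,
    # but not token 5. In attention, the zeroed spots are pushed to
    # -infinity before the softmax, so future words get exactly zero weight.
    ```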

    The Takeaway

    You didn’t just learn what an acronym stands for. You learned the three ingredients that made modern AI possible:

    • Generative: AI stopped classifying and started creating.

    • Pre-trained: The internet itself became the training data, no humans needed to label it.

    • Transformer: AI stopped reading one word at a time and started seeing the whole picture at once.

    Each of these had been tried separately. Combining them at scale, between 2018 and 2020, is what OpenAI did. And it is the reason “GPT” became shorthand for modern AI.

    The next time someone says “we’re in the GPT era,” you’ll know they don’t mean an acronym. They mean a recipe.

    Coming Up

    You now know what GPT stands for. But here is a subtle point we glossed over: GPT is just one example of a broader category called Large Language Models (LLMs). Claude, Gemini, Llama, and DeepSeek are LLMs too. So what exactly is an LLM, and why is it the engine behind every chatbot you use? In our next article, we’ll answer exactly that and show you why LLMs are the defining technology of this decade.


    AI for Common Folks – Making AI understandable, one concept at a time.


  • What Are Foundation Models and Why Does Everyone Talk About Them?

    What Are Foundation Models and Why Does Everyone Talk About Them?

    A Foundation Model is a massive AI model trained on a vast amount of data — text, images, code — that can be adapted to perform a wide range of tasks, from writing emails to diagnosing diseases.

    Hey Common Folks!

    In our last article, we explored Generative AI — AI that can create brand new content like essays, images, music, and code. We saw it transforming customer support, education, content creation, and software development.

    But here’s a question that article left open: what’s actually powering all of that?

    When ChatGPT writes your email, when Claude explains your kid’s homework, when Gemini summarizes a research paper — they’re all running on the same type of technology underneath. That technology is called a Foundation Model.

    The Old Way vs. The New Way

    In traditional AI (the old way), if you wanted to translate English to French, you built a “Translation Model.” If you wanted to summarize text, you built a “Summarization Model.” If you wanted to detect spam, you built a “Spam Model.”

    One model, one job. Every new task meant starting from scratch.

    Foundation Models changed the game completely. They are like a Swiss Army Knife — one single tool that can perform hundreds of different tasks, from writing code to composing poetry to analyzing legal contracts.

    The Analogy: The “Super-Student” in the Library

    Remember our earlier article on What is a Model, where we compared an AI model to a student who has finished studying and walks into the exam with the knowledge in their head?

    Now imagine two types of students:

    • Traditional AI (The Specialist): This student only studied one book: “How to Repair a Bicycle.” Ask them to fix a bike? Perfect. Ask them to write an essay on history? They have no idea. They are specialized.

    • Foundation Model (The Generalist): This student has read almost every book in the library. History, math, coding, poetry, mechanics, medicine. Because they’ve seen so much, they’ve learned general patterns about how the world works.

    Now, you can ask this “Super-Student” to fix a bike, or write a poem, or solve a math problem, or draft a legal brief. They have a broad “foundation” of knowledge that allows them to adapt to almost any request.

    That’s why they’re called Foundation Models — they serve as the base upon which everything else is built. Just like you lay a concrete foundation before building a house, a hospital, or a skyscraper.

    The Framework: Builders vs. Users

    Understanding Foundation Models becomes much easier when you split the world into two groups: the people who make the engine, and the people who drive the car.

    1. The Builder Perspective (Making the Brain)

    These are the engineers at companies like OpenAI, Anthropic, Google, or Meta. They take massive amounts of raw materials — text from the internet, books, code repositories, images — and use sophisticated processes to train these models.

    • The Goal: To create a model that learns general patterns about language, logic, and the world. It’s like feeding the “student” all those books.

    • The Result: A Foundation Model (like GPT-4o, Claude, Gemini, or Llama).

    2. The User Perspective (Putting the Brain to Work)

    This is where most businesses and developers sit. They don’t need to know how to build the brain from scratch. They just need to know how to use it.

    Think of the Foundation Model as a powerful Engine:

    • One user might take that engine and put it into a Race Car (a chatbot for customer service).

    • Another user might put it into a Truck (a tool to summarize legal documents).

    • Another might put it into a Boat (an app that generates marketing copy).

    • Another might put it into an Ambulance (an AI assistant helping doctors with diagnoses).

    You don’t need to know how to build the engine. You just use its capabilities to solve your specific problem.

    This is exactly what’s happening in the real world right now. Companies like Canva, Notion, Duolingo, and Khan Academy aren’t building their own Foundation Models — they’re plugging into existing ones (GPT, Claude, Gemini) and building products on top.
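
    In practice, “plugging in” is often just a short API call. Here is a minimal sketch using OpenAI’s official Python SDK; the model name and prompts are illustrative placeholders, and Anthropic and Google offer similar SDKs for their models.

    ```python
    from openai import OpenAI

    client = OpenAI()  # reads your OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative: any chat model your account can access
        messages=[
            {"role": "system",
             "content": "You summarize legal documents in plain English."},
            {"role": "user",
             "content": "Summarize this clause: 'The lessee shall indemnify...'"},
        ],
    )
    print(response.choices[0].message.content)
    ```

    Swap the system prompt and the same engine becomes a tutor, a copywriter, or a support agent. That is the whole Builder-versus-User divide in a dozen lines.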

    Types of Foundation Models

    While Large Language Models (LLMs) are the most famous type, Foundation Models are expanding into new territories:

    1. Large Language Models (LLMs): Trained on text. Good for writing, summarizing, reasoning, and coding. Examples: GPT-4o, Claude, Llama, Gemini.

    2. Large Multimodal Models (LMMs): These can understand not just text, but also images, audio, and video. When you upload a photo to ChatGPT and ask “what’s in this image?” — that’s a multimodal model at work. Examples: GPT-4o (handles text + images + audio), Gemini (text + images + video).

    3. Image Generation Models: Trained on images paired with descriptions. They create visuals from text prompts. Examples: DALL-E 3, Midjourney, Stable Diffusion.

    4. Code Models: Specialized for understanding and generating software code. Examples: Claude (Anthropic’s model powers Claude Code), GitHub Copilot (powered by OpenAI models).

    All of these are Foundation Models — massive, general-purpose brains that can be adapted to specific jobs.

    Why This Matters for You

    You might be thinking: “Okay, but I’m not building AI. Why should I care about Foundation Models?”

    Three reasons:

    1. It explains why AI suddenly got good at everything. Before Foundation Models, AI was narrow — good at one thing, useless at the rest. Foundation Models are why your AI assistant can write an email and explain physics and debug code and plan your vacation. One brain, many skills.

    2. It’s why the AI race is so expensive. Training a Foundation Model costs tens of millions to hundreds of millions of dollars. That’s why only a handful of companies (OpenAI, Anthropic, Google, Meta) can afford to build them. Everyone else builds on top of them.

    3. It’s the reason AI will keep getting more useful. As Foundation Models get better, every product built on top of them gets better automatically. When GPT improves, every app using GPT improves. That’s the power of a shared foundation.

    The Takeaway

    A Foundation Model is a general-purpose AI brain:

    • It is the engine inside the car.

    • It is the student who read every book in the library.

    • It is the Swiss Army Knife of the digital world.

    It shifted AI from “specialized tools that do one thing” to “general intelligence that can be adapted to almost anything.”

    And here’s the exciting part: we’re still early. The Foundation Models of 2026 are dramatically more capable than those of 2023. The ones coming in 2027 and beyond will make today’s look primitive. Understanding this technology now means you won’t be caught off guard as it transforms more of the world around you.

    Coming Up

    Now that you understand what a Foundation Model is — this massive, general-purpose brain — you’ve probably noticed we keep mentioning one name more than any other: GPT. GPT-3, GPT-4, ChatGPT. But what do those three letters actually stand for? And why did this particular approach become the dominant one? In our next article, we’ll decode the most famous acronym in AI and break down exactly how GPT works, letter by letter.


    AI for Common Folks — Making AI understandable, one concept at a time.


  • Anthropic at $800B, GPT-5.4-Cyber Debuts, Uber’s $10B Robotaxi Bet

    Anthropic at $800B, GPT-5.4-Cyber Debuts, Uber’s $10B Robotaxi Bet

    Good morning. Investors are lining up to value Anthropic at $800 billion while OpenAI’s own backers question its $852 billion price tag. OpenAI just launched a cybersecurity-focused model to answer Anthropic’s Mythos, and Uber committed $10 billion to robotaxis in its biggest strategic pivot ever. Here’s what happened 👇


    1. Anthropic’s Valuation Could Double to $800 Billion as OpenAI Faces Investor Doubt

    Venture capital firms have approached Anthropic with offers to invest at valuations as high as $800 billion, more than double the $380 billion valuation at which it raised in February. Anthropic has so far resisted these overtures, according to Bloomberg. The company’s run-rate revenue now surpasses $30 billion, up from roughly $9 billion at the end of 2025, driven by surging demand for Claude and the buzz around its frontier Mythos model.

    Meanwhile, the Financial Times reported that some of OpenAI’s own backers are questioning its $852 billion valuation as the company shifts its strategy toward the enterprise market to compete with Anthropic. The contrast is striking: one company’s investors are racing to get in. The other company’s investors are asking if they overpaid.

    Why it matters: Six months ago, OpenAI was the undisputed leader in AI. Now Anthropic is growing revenue at a pace that could close the gap faster than anyone expected. If you use ChatGPT or Claude at work, you are watching a competitive shift that will directly affect the tools, pricing, and features available to you.

    Source: Reuters


    2. OpenAI Launches GPT-5.4-Cyber to Counter Anthropic’s Mythos

    OpenAI unveiled GPT-5.4-Cyber, a variant of its latest flagship model fine-tuned specifically for defensive cybersecurity work. The release comes exactly one week after Anthropic announced Mythos, which has already found “thousands” of major vulnerabilities in operating systems, browsers, and other software. GPT-5.4-Cyber will initially be available only to vetted security vendors, organizations, and researchers through OpenAI’s expanded Trusted Access for Cyber (TAC) program.

    The highest-tier TAC users will get access to the model with fewer restrictions on sensitive cybersecurity tasks like vulnerability research and analysis. OpenAI is also opening the program to thousands of individual defenders and hundreds of security teams.

    Why it matters: The cybersecurity AI race is now a two-horse competition. Both OpenAI and Anthropic are building models specifically designed to find software vulnerabilities before attackers do. For companies, this means AI-powered security tools are about to get significantly more capable. For everyone else, it means the software you use every day is about to get tested by AI systems that can find flaws humans have missed for years.

    Source: Reuters


    3. Uber Commits $10 Billion to Robotaxis, Breaks Its Own Business Model

    Uber has committed more than $10 billion to buying thousands of autonomous vehicles and taking equity stakes in their developers, according to the Financial Times. This is a fundamental break from the “gig economy” model that built the company. Uber is positioning itself as a marketplace for multiple robotaxi operators, partnering with Baidu, Rivian, and Lucid, and plans to launch robotaxi services in at least 28 cities by 2028.

    The deals include roughly $2.5 billion in equity stakes and over $7.5 billion in fleet purchases over the next few years, contingent on partners hitting deployment milestones.

    Why it matters: The company that defined ride-sharing is betting its future on removing drivers entirely. If you use Uber, you could be hailing a driverless car within two years depending on your city. This is also a signal that the robotaxi market has crossed from “maybe someday” to “we need to own this now” for the biggest players in transportation.

    Source: Reuters


    4. Snap Cuts 1,000 Jobs, Says AI Now Writes 65% of Its Code

    Snapchat’s parent company is laying off about 1,000 employees, 16% of its full-time staff, and closing over 300 open positions. The company said AI is now generating more than 65% of new code at Snap, enabling it to operate with smaller teams. CEO Evan Spiegel expects the cuts to save more than $500 million in annualized expenses by the second half of 2026.

    Snap is not alone. More than 80 tech companies have cut roughly 71,440 jobs so far this year, according to Layoffs.fyi, as AI adoption accelerates across the industry.

    Why it matters: The stat to pay attention to is not the layoff count. It is the 65% number. When a major tech company publicly says AI is writing two-thirds of its new code, that is a signal about where software development is heading across every industry. The question is no longer whether AI will change the job market. It is how fast.

    Source: Reuters


    Quick Hits

    • Maine became the first US state to pass a moratorium on large data centers, freezing approvals for facilities requiring more than 20 megawatts of power until October 2027. Eleven other states are weighing similar legislation. Source: Reuters

    • Federal agencies are quietly sidestepping Trump’s ban on Anthropic to test its Mythos model. The Commerce Department’s Center for AI Standards is actively testing Mythos’ capabilities, and staff on at least three congressional committees have held or requested briefings. Source: Reuters

    • Jane Street signed a $6 billion AI cloud computing deal with CoreWeave and boosted its equity stake in the company, one of the largest single cloud contracts announced this year. Source: Reuters


    That’s it for today. The AI industry is splitting into two clear lanes: one where the biggest companies race to build the most powerful models, and another where everyone else figures out what those models mean for their workers, their cities, and their energy bills.

    Forward this to someone who needs to stay in the loop.


  • What Is Generative AI and How Does It Create New Things?

    What Is Generative AI and How Does It Create New Things?

    Generative AI is artificial intelligence that creates brand-new content that didn’t exist before, like an essay, a picture, a piece of music, or working computer code. It learns patterns from mountains of existing data and produces something new that feels like a human made it.

    Hey Common Folks!

    We break down AI so it makes sense to real people. If you’ve been following along, our last article explored Predictive Modeling: how AI uses historical data to make educated guesses about the future. That’s AI looking backward to predict forward.

    Today we’re flipping the script. What if AI could look at everything it has learned and create something entirely new?

    That’s Generative AI. And it’s the reason AI went from a behind-the-scenes tool to something your parents, your boss, and your neighbor are all talking about.

    Why This Feels Different from Everything Before

    Think about what makes humans special. Our ability to create new things, right? For decades, people said this was the one thing AI would never be able to do.

    Before Generative AI, artificial intelligence was mainly used for prediction and classification:

    • Will this customer buy our product? (prediction)

    • Is this email spam or not? (classification)

    • Which movies should we recommend? (recommendation)

    Useful, but limited. Now, Generative AI can write a poem, design a logo, generate a 3D model, compose music, or write working software. The creative barrier has been broken.

    In fact, the images you see in this article, and even much of the writing itself, were created with the help of AI tools. We’re using the very technology we’re explaining.

    Four Ways Generative AI Is Already Changing the World

    1. Customer Support That Doesn’t Make You Pull Your Hair Out

    Remember the last time you needed help and had to wade through an endless phone tree or wait hours for a response?

    Companies used to need huge call centers with dozens of employees handling problems. Expensive and frustrating for everyone.

    Now, AI-powered chatbots handle the first level of support. When you contact customer service for a food delivery app or your internet provider, your initial questions are likely answered by an AI that understands what you’re asking and gives helpful responses.

    The result? Companies reduce their support staff from 10 people to 2-3, while actually improving response times. The AI handles common questions. Human agents focus on the complex issues that really need their attention.

    2. Content Creation That’s Indistinguishable from Human Work

    “AI can’t be creative” was the mantra for years. That ship has sailed.

    Today, if you read an article online, you often cannot tell if a human or an AI wrote it. The quality has improved that dramatically.

    This extends beyond writing to image creation, video editing, music composition, and more. Artists and creators aren’t being replaced, but they’re increasingly working alongside AI tools that speed up their workflow.

    For everyday people, this means you can generate professional-quality content without years of specialized training. Need a business proposal? A birthday card design? A custom bedtime story for your kids? Generative AI can help create it in minutes.

    3. Education That Adapts to How You Learn

    Generative AI is transforming education by providing 24/7 personalized learning support. Stuck on a complex topic? AI can explain concepts in multiple ways until you understand.

    Tools like ChatGPT, Claude, and Gemini are already being used by students worldwide as study partners. Universities like Northeastern and the London School of Economics have integrated AI tutoring into their programs. Some of these tools use Socratic questioning, asking you guiding questions instead of handing you the answer, helping you actually think instead of just copy.

    This newsletter itself is an example. We use AI tools every day to research, draft, and refine explanations so that complex topics reach you in plain language.

    4. Software Development That’s Accessible to Almost Everyone

    Coding used to be a highly specialized skill requiring years of training. Generative AI has dramatically lowered the barriers.

    Today’s AI tools like Claude Code, Cursor, and GitHub Copilot can:

    • Write functional code from a simple description

    • Explain complex code in plain language

    • Debug and fix errors

    • Build entire applications with minimal guidance

    Tasks that once required a team of five programmers might now be accomplished by two or three with AI assistance. More importantly, people who never thought they could create software are now building tools to solve their own problems.

    The “no-code” and “low-code” movements are being supercharged by Generative AI, making software creation accessible to people without technical backgrounds.

    Is Generative AI Just Another Tech Bubble?

    Before investing your time in learning any technology, it’s smart to ask whether it has staying power. We evaluated Generative AI against five questions:

    1. Does it solve real-world problems?
    Yes. As we just saw, it’s already making a real difference in customer service, education, content creation, and software development. These aren’t trivial applications.

    2. Is it useful in everyday life?
    Yes. Unlike some technologies that only benefit specialized industries, Generative AI tools are immediately useful to almost everyone. Writing emails, learning new skills, creating content, understanding complex information. Daily value.

    3. Is it creating economic impact?
    Absolutely. Private AI investment hit $285 billion in the US alone in 2025, growing over 127% in a single year. Generative AI captures nearly half of all private AI funding globally. Major companies are restructuring entire divisions around AI capabilities.

    4. Is it creating new job opportunities?
    Yes. While there are legitimate concerns about job displacement, Generative AI is also creating entirely new career paths. The role of “AI Engineer” barely existed a few years ago. Now it’s one of the fastest-growing job categories.

    New roles are emerging in prompt engineering, AI ethics, AI training and education, and specialized AI application development across industries.

    5. Is it accessible to ordinary people?
    Yes. This might be the most important part. Unlike previous waves of technology that required specialized knowledge, today’s Generative AI tools are designed to be used through natural language.

    You don’t need to code or understand complex math. You simply talk to them in English, Hindi, Urdu, or whatever language you prefer. The technology is accessible to almost everyone, regardless of technical background.

    All five answers came back yes. Generative AI is following the trajectory of truly transformative technologies like the internet, not temporary hype cycles.

    Why This Matters Now

    The Generative AI shift isn’t coming. It’s already here. But we’re still in the early days, comparable to where the internet was in the mid-90s. The most significant impacts and opportunities are still ahead.

    By understanding these tools now, you position yourself to benefit from them rather than being caught off guard as they transform more aspects of work and life.

    The good news? These tools are designed to be intuitive. You don’t need to understand everything about how they work to start benefiting from them today.

    Coming Up

    Now that you know what Generative AI is and why it matters, the natural next question is: what’s actually powering these tools? In our next article, we’ll break down Foundation Models, the massive pre-trained systems that make ChatGPT, Claude, and Gemini possible. If you’ve ever wondered why some AI tools are smarter than others, that one’s for you.


    Inspired in part by CampusX’s Hindi-language AI education content.

    AI for Common Folks — Making AI understandable, one concept at a time.
