Got the code? Welcome, builder!
You're heading into Invent the Future — instructors hand out the access code on day one.
An LLM (Large Language Model) is a program trained on an enormous amount of text — roughly the public internet plus a large corpus of books — to predict what comes next in any piece of writing.
Fill in the blank:
"The cat sat on the ___."
You said "mat." The model does this billions of times, with far more context, predicting the most plausible next chunk of text given everything that came before it.
Strictly speaking, the model doesn't predict word by word. It predicts tokens. A token is a small chunk of text: often a whole word, sometimes only part of one. "Running" might be one token or two: "run" + "ning." As a rough rule of thumb for English, a token averages about three-quarters of a word.
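To make the idea of a token concrete, here is a minimal sketch of a toy tokenizer. It is not how any real model tokenizes: real tokenizers use large vocabularies learned from data, while this one uses a tiny hand-written vocabulary (`TOY_VOCAB`, a hypothetical name, as is `toy_tokenize`) and simply takes the longest matching piece at each step.

```python
# Toy vocabulary: hypothetical, hand-picked only for this illustration.
TOY_VOCAB = {"the", "cat", "sat", "on", "mat", "run", "ning", " "}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into tokens by greedy longest match against vocab.

    Characters that match no vocabulary entry become one-character tokens.
    """
    text = text.lower()
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary starts here: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


print(toy_tokenize("Running"))  # ['run', 'ning']
print(toy_tokenize("the cat"))  # ['the', ' ', 'cat']
```

With this vocabulary, "Running" comes out as two tokens because "running" itself is not in the list but "run" and "ning" are. Add "running" to the vocabulary and it becomes one token, which is why the same word can be one token or two.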
The model writes its response one token at a time — picks a token, picks the next based on everything so far, until it decides to stop. That's the whole engine.
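The loop described above can be sketched in a few lines. Everything here is a stand-in: `toy_next_token` is a hypothetical hand-written rule set playing the role of the model, and `FOLLOW`, `STOP`, and `generate` are names made up for this example. A real model computes its choice from billions of learned parameters, but the outer loop has the same shape.

```python
STOP = "<stop>"

# Hypothetical hand-written rules standing in for a trained model.
FOLLOW = {"cat": "sat", "sat": "on", "on": "the", "mat": "."}


def toy_next_token(context):
    """Return the next token given everything generated so far."""
    if not context:
        return "the"
    last = context[-1]
    if last == "the":
        # The choice depends on the whole context, not only the last token.
        return "mat" if "on" in context else "cat"
    return FOLLOW.get(last, STOP)


def generate(prompt, max_tokens=20):
    """Append one token at a time until the stand-in model says stop."""
    tokens = list(prompt)
    for _ in range(max_tokens):
        next_token = toy_next_token(tokens)
        if next_token == STOP:
            break
        tokens.append(next_token)
    return tokens


print(generate(["the"]))  # ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']
```

Notice that the second "the" is followed by "mat" rather than "cat". The rule looked at the earlier tokens, which is a tiny version of "based on everything so far."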
Good at: summarizing long text, rewriting something in a different tone or reading level, brainstorming, explaining concepts at any depth, writing and debugging code.
Bad at: exact facts (dates, statistics, recent events), anything after the training cutoff, math beyond simple arithmetic without careful prompting.
A hallucination is when a model confidently states something false.
It's not lying: the model doesn't know it's wrong. It's predicting a plausible-sounding answer from patterns in its training data, and sometimes those patterns point in the wrong direction.
Ask a model who won the women's 100m sprint at the 2028 Olympics. That event hasn't happened. The model might refuse, admit uncertainty, or confidently invent a name. All three are possible. The confident wrong answer is the one that bites you when you're not paying attention. For anything factual, verify the output independently.
The same prompt can produce different outputs each time. At every step the model samples from a probability distribution over likely next tokens, so identical inputs often produce non-identical results. This is by design.
Run the same prompt twice in a new conversation. Sometimes both answers are good. Sometimes one is noticeably better. Re-running is always worth trying before you conclude a prompt doesn't work.
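Here is a minimal sketch of why re-running changes the answer. The probabilities in `NEXT_TOKEN_WEIGHTS` are invented for illustration, and the function names are hypothetical. The example uses only Python's standard `random` module. Sampling picks a token in proportion to its probability, while always taking the top token (shown for contrast) would give the same answer every time.

```python
import random

# Invented probabilities for the token after "The cat sat on the".
NEXT_TOKEN_WEIGHTS = {"mat": 0.6, "sofa": 0.25, "roof": 0.1, "keyboard": 0.05}


def sample_next_token(weights, rng):
    """Pick one token at random, in proportion to its weight."""
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


def most_likely_token(weights):
    """Always pick the single highest-weight token (no randomness)."""
    return max(weights, key=weights.get)


rng = random.Random()  # unseeded, so each run of the script can differ
print([sample_next_token(NEXT_TOKEN_WEIGHTS, rng) for _ in range(5)])
print(most_likely_token(NEXT_TOKEN_WEIGHTS))  # mat
```

Run the script a few times: the first line changes, the second never does. Most samples will still be "mat", but the less likely tokens show up now and then, and in a long response those small differences compound.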
Writing prompts that reliably get you the output you want is called prompt engineering.
The model didn't get smarter between a vague prompt and a specific one. The prompt got more specific. That's the whole gap.
"Write a story." → unpredictable, probably generic
"Write a 4-paragraph story for a 12-year-old reader. The protagonist gets bullied but finds an unexpected way to stand up for themselves without fighting. Include one moment where they doubt themselves. End hopeful but not corny." → something you can actually work with
The rest is learning which techniques to apply when the default output isn't what you need.