Over the last couple of years I've built and shipped a small pile of AI products. A suite of consumer apps β study help, a legal advisor, a content creator, a farming assistant in Hindi and Punjabi β plus a phone-based assistant for a company that manages tens of thousands of buildings, and a Slack bot that keeps my team in sync.
Along the way I kept bumping into the same handful of concepts. They have intimidating names and most explanations make them sound harder than they are. So here's my version: each idea, the story of the app that taught it to me, and what it actually means β no jargon, I promise.
1. Sometimes you want a robot. Sometimes you want a poet.
I built a feature in my legal app that pulls the exact section of a law and explains it. It has to be exact. No creativity, no "interpretation," no surprises β just the right section, the same way, every time.
Then I built a content tool that writes social posts and marketing copy. That one needs the opposite: personality, flair, word choices you didn't see coming.
Same underlying model. Opposite needs. And early on, the legal one kept "getting creative" with the wording while the content one kept writing safe, boring copy. The fix was one single dial: temperature.
What temperature actually does
Low temperature β the robot
Predictable, repeatable, plays it safe. Perfect for quoting a law, pulling a number off an invoice, or anything where the same input should always give the same output.
High temperature β the poet
Varied, surprising, willing to take a swing. Perfect for marketing copy, brainstorming, captions β anywhere a fresh answer beats the obvious one.
So, what is temperature? It's a single setting that controls how safe versus creative an AI's answers are. Turn it down and the model becomes a reliable robot. Turn it up and it becomes a poet. Most people never touch it β and then wonder why their app feels either dull or unhinged.
2. The agent has the memory of a goldfish
My lecture app takes a 90-minute class and turns it into clean notes, a summary, and exam-style questions. The first time I tried it, I did the obvious thing: shove the entire transcript at the model and ask for notes.
It choked. Long recordings simply didn't fit, and when they almost fit, the model would "forget" the beginning of the lecture by the time it reached the end β like a student who slept through the first hour.
So, what is a context window? It's the model's short-term memory β the total amount of text it can hold in its head at one time. Go past it and the earliest stuff falls out the back. The real skill isn't dumping everything in; it's deciding what the model actually needs to see for the question at hand. For a long lecture that meant breaking it into chunks, summarising each, then summarising the summaries.
3. You're not paying per question. You're paying per word.
The first month one of my consumer apps got popular, I felt great β right up until I saw the bill. AI isn't like normal software, where one more user costs almost nothing. Every single answer costs real money, and the meter runs on tokens.
So, what is a token? Roughly, a chunk of a word β the unit a model reads and writes in. "Hello" is one token; a long word might be two or three. And here's the part that bites you: you pay for everything you send in (the user's question, the document, your instructions) and everything the model sends back. A chatty assistant answering a long question is quietly expensive.
This is why my apps run on yearly plans with sensible limits, and why I instrument cost per request from day one. Going "viral" while every free user racks up an unbounded bill isn't a success story β it's an invoice you can't pay.
The mental model that saved me money
What I assumed
"One question = one fixed cost." So I let prompts and answers sprawl, padded them with extra instructions, and never measured.
What's actually true
You pay for every word in and every word out. Tighter prompts, shorter answers and hard limits aren't stinginess β they're the business model.
4. Don't make it memorise. Give it the textbook.
The hardest thing I've built is a voice assistant that takes work orders over the phone. A caller half-remembers a building's name on a noisy line, and the agent has to pin down the right one out of tens of thousands. My first version asked the model to just "know" the answer. It confidently picked the wrong building constantly.
The fix wasn't a smarter model. It was giving the model the textbook. Before it answers, I search a proper index for the buildings that actually match, hand the model just those few, and ask it to choose from them. Accuracy shot up overnight.
So, what is RAG? It's an ugly acronym (retrieval-augmented generation) for a simple idea: look it up first, then answer. Instead of trusting the model's memory, you fetch the relevant facts from a source you trust and feed them in. It's the difference between an open-book exam and a from-memory one β and it's the single biggest weapon against made-up answers.
5. The job description you write before anyone says a word
The same model powers my farming assistant and my legal advisor, but they behave like completely different people. The farming one talks like a helpful neighbour, in simple Hindi or Punjabi, patient with someone who's never used an app. The legal one is precise, careful, and never guesses about the law.
I didn't fine-tune two different models for that. I just wrote two different system prompts.
So, what is a system prompt? It's the standing instruction you give the model before the user ever speaks β its job description, its tone, its rules, the things it must never do. Get it right and the same engine becomes a calm teacher, a cautious lawyer, or a punchy copywriter. Get it vague and you get a generic chatbot that sounds like every other one.
6. When you want an answer, not an essay
I built a tool that reads a messy PDF invoice and pulls out the numbers β vendor, date, total, line items. The trouble is, models love to chat. Ask for the total and you'd get "Sure! The total on this invoice appears to be βΉ12,400, though you may want to double-checkβ¦" β lovely for a human, useless for code that just needs a number.
So, what is structured output? It's telling the model to reply in a strict, machine-readable shape β usually JSON, basically a tidy labelled box β instead of prose. "Give me only this: vendor, date, total, items." Now my code gets clean fields it can drop straight into a database, with no fishing the number out of a paragraph. The moment an AI feature has to talk to the rest of your software, this is the concept that makes it reliable.
7. Teaching it to press buttons, not just talk
My team's Slack bot can answer "what did everyone ship last week?" That sounds like a chat question, but the model can't answer it from thin air β the information lives in a database, not in its training. So I let it use tools: when it sees that question, it doesn't make something up, it calls a function that actually queries our data, gets real numbers back, and then explains them.
So, what is function calling (or "tool use")? It's giving the model a set of buttons it's allowed to press β look up the weather, search a database, send a message β and letting it decide when to press them. This is the leap from a chatbot that talks to an agent that does things. My farming app uses it to fetch live weather; the Slack bot uses it to pull real activity. The model stays the brain; the tools are its hands.
8. It will lie to you with a completely straight face
There's a specific kind of dread in watching an AI tell a paying user something that's flatly wrong β with total confidence. In a legal app, that's not just embarrassing, it's dangerous. Early on I treated this as a "later" problem. By the time I noticed I needed safeguards, the agent had already said things it shouldn't have.
So, what is hallucination? A model is built to produce text that sounds right, not to know when it's actually wrong. So sometimes it just⦠makes things up, fluently. There's no built-in alarm bell.
The answer isn't a magic setting β it's guardrails you build yourself: give it real sources to quote from (back to retrieval), check what it's allowed to claim, and let it say "I'm not sure β let me get you a human" instead of inventing an answer. Your users can't tell confidence from correctness. Your guardrails have to.
The whole thing in one breath
If you remember nothing else: temperature sets robot-vs-poet. Tokens are what you pay for, in and out. The context window is its short-term memory. RAG means look it up before you answer. The system prompt is its job description. Structured output is for when code needs an answer, not an essay. Function calling turns a talker into a doer. And hallucination is why you build guardrails before you need them.
None of this requires a PhD. It requires building something real, watching it fail in an interesting new way, and learning the name of the thing that just bit you. Eight ideas. That's the whole vocabulary behind every AI agent I've shipped.
Frequently Asked Questions
What does temperature mean in an AI model?
What is a token and why does it cost money?
What is a context window?
What is RAG (retrieval-augmented generation)?
What is a system prompt?
Why do AI models hallucinate, and what can you do about it?
Built by people who ship AI, not just talk about it
Every concept here came from a real app in real users' hands β a study tool, a legal advisor, a farming assistant, a content creator and more, all powered by the same eight ideas. Take a look at what we've built.
β‘ Explore Advanced AI Apps