Foundations 5 min
Tokens, models, and what it actually costs
How AI is priced, where your money goes, and how to cap your spend before it surprises you.
AI assistants are not free. Chat apps come with monthly subscriptions; APIs and agents charge per use. Three concepts let you predict and control your bill.
Tokens — what you're actually paying for
Models don't think in words; they think in tokens. A token is roughly a word or a chunk of one. "Hello" is one token. "Onomatopoeia" is four. English text is ~0.75 words per token. Code averages ~3 characters per token.
Every request charges you for two things:
- Input tokens. Your prompt + any history + system prompt + files.
- Output tokens. The reply the model writes back.
Output is usually 4–5× more expensive than input per token.
The three pricing tiers
| Tier | Use for | Rough cost per 1M tokens (in / out) |
|---|---|---|
| Cheap & fast Haiku, GPT-mini, Gemini Flash | Routine work, classification, simple lookups, summarisation | $0.25–1 / $1–4 |
| Balanced Sonnet, GPT, Gemini Pro | Default for coding CLIs, agents, writing | $3–5 / $15–20 |
| Heavy reasoning Opus, GPT-large, Gemini Ultra | Architecture, research, gnarly debugging | $15 / $75 |
Pricing moves fast — these are 2026 ballparks. Check each provider's pricing page for current numbers.
What workflows actually cost
One chat reply
~500 tokens in, ~500 out
≈ $0.001–0.01
A coding-CLI session
30 min, lots of file reads
≈ $0.50–3
Cloud agent (one PR)
Reads repo, edits files, runs tests
≈ $1–10
Background daemon (1 day)
Scheduled tasks every hour
≈ $5–50
Three ways to spend less
- Prompt caching. If you send the same context (system prompt, project files) repeatedly, modern APIs cache it — sometimes 80%+ off on repeated input. Coding CLIs do this automatically; verify by checking your usage dashboard.
- Model routing. Use the cheap tier for routine work, escalate to balanced for hard problems, reserve heavy reasoning for the rare gnarly thing. A good tool routes for you; otherwise pick deliberately.
- Hard spend caps. Every provider's dashboard lets you set a monthly limit. Set one before you start. Treat it as a smoke alarm.
Where to check your spend
| Provider | Where |
|---|---|
| Anthropic | console.anthropic.com → Usage |
| OpenAI | platform.openai.com → Usage |
| aistudio.google.com → Billing | |
| OpenRouter | openrouter.ai/activity — per model, per request |