Tokens & Context Windows
Learn about tokens, context windows, and why they determine what AI can and cannot do.
AI models don't read words; they read tokens. A token is a chunk of text, typically 3-4 characters in English. For example, a tokenizer might split the word "hamburger" into three tokens ("ham", "bur", "ger"), while a common word like "the" is a single token. The exact split depends on the model's tokenizer. Understanding tokens matters because cost, speed, and the amount of text a model can process are all measured in them.
- English averages about 1 token per 0.75 words (or ~4 characters per token)
- A typical page of text (roughly 250-300 words) is about 300-400 tokens
- Code is more token-dense — a line of code may use 10-20 tokens
- Non-English languages often use more tokens per word
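The rules of thumb above can be turned into a quick estimator. This is a minimal sketch that only applies the 4-characters-per-token and 0.75-words-per-token averages from the list; it is not a real tokenizer, and the function names are made up for illustration. Actual counts depend on the model's tokenizer and will differ, especially for code and non-English text.

```python
import math

# Rules of thumb from the text above; these are averages for English,
# not constants of any particular tokenizer.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_chars(text: str) -> int:
    """Rough token estimate based on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(text: str) -> int:
    """Rough token estimate based on whitespace-separated word count."""
    return math.ceil(len(text.split()) / WORDS_PER_TOKEN)


sample = "The context window is the total amount of text the model can see at once."
print("by characters:", estimate_tokens_from_chars(sample))
print("by words:", estimate_tokens_from_words(sample))
```

The two estimates will usually disagree slightly. For billing or hard limits, use the token count reported by the model provider rather than an estimate like this.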
The context window is the total amount of text the model can "see" at once — your prompt plus its response. Think of it as the model's working memory. Once you exceed the context window, the model literally cannot see the earlier parts of your conversation.
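One common way applications cope with this limit is to drop the oldest messages so the rest still fits. The sketch below illustrates that idea under stated assumptions: `trim_history` and `estimate_tokens` are hypothetical helpers, token counts are approximated at 4 characters per token, and real chat applications may use other strategies such as summarizing older turns.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count at ~4 characters per token (rounded up)."""
    return -(-len(text) // 4)


def trim_history(messages: list[str], max_tokens: int, reserved_for_reply: int) -> list[str]:
    """Keep the most recent messages that fit in the window.

    The window must hold both the prompt and the reply, so part of it
    is reserved for the model's response.
    """
    budget = max_tokens - reserved_for_reply
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest message and stop at the first one
    # that would overflow the budget.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # about 10 tokens each
print(len(trim_history(history, max_tokens=30, reserved_for_reply=10)))  # prints 2
```

Whatever gets trimmed is gone from the model's point of view, which is why the earliest parts of a long conversation are the first to be forgotten.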
Context window sizes vary dramatically across models:
- GPT-4o: 128K tokens (~96,000 words, roughly a full novel)
- Claude 3.5 Sonnet: 200K tokens (~150,000 words)
- Gemini 1.5 Pro: 1M tokens (~750,000 words)
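To make these sizes concrete, here is a small sketch that checks which of the models listed above could hold a document of a given length. It uses only the window sizes quoted in this section and the 0.75-words-per-token rule of thumb. The function name `models_that_fit` and the default reply allowance are assumptions for illustration, and published limits change over time.

```python
# Window sizes as quoted in this section (in tokens).
CONTEXT_WINDOWS = {
    "GPT-4o": 128_000,
    "Claude 3.5 Sonnet": 200_000,
    "Gemini 1.5 Pro": 1_000_000,
}


def models_that_fit(word_count: int, reply_tokens: int = 1_000) -> list[str]:
    """Return the models whose window can hold the document plus a reply.

    The word-to-token conversion is the rough 0.75 words-per-token
    average, so treat the result as an estimate.
    """
    prompt_tokens = round(word_count / 0.75)
    needed = prompt_tokens + reply_tokens
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]


print(models_that_fit(120_000))  # prints ['Claude 3.5 Sonnet', 'Gemini 1.5 Pro']
```

Note that the reply counts against the same window, so a document that fits exactly leaves no room for an answer.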
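Every token costs money when using AI APIs, and longer prompts take longer to process. Being concise isn't just about clarity; it's about cost and speed. A prompt that uses 500 tokens instead of 2,000 costs about a quarter as much in input tokens and is usually faster to process. Output tokens are billed as well, so the total saving also depends on the length of the response.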
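The arithmetic behind the 500 versus 2,000 token comparison can be sketched as follows. The price used here is a hypothetical placeholder, not any provider's real rate, and only input tokens are counted; check your provider's pricing page for actual figures.

```python
# Hypothetical price for illustration only; real prices vary by model
# and provider, and output tokens are usually priced separately.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # US dollars


def input_cost(prompt_tokens: int, calls: int = 1) -> float:
    """Input-token cost in dollars for a number of identical calls."""
    return prompt_tokens * calls * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


long_prompt = input_cost(2_000, calls=10_000)
short_prompt = input_cost(500, calls=10_000)
print(f"${long_prompt:.2f} vs ${short_prompt:.2f}")  # prints $60.00 vs $15.00
```

The ratio stays at 4x whatever the price is; the absolute difference only becomes significant at volume.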
- Put the most important information at the beginning and end of your prompt
- Remove filler words and redundant instructions
- For long documents, summarize or extract key sections before sending to the model
- Track your token usage — most API dashboards show this
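As one possible way to "extract key sections before sending", the sketch below keeps only the paragraphs that mention given keywords and stops once a token budget is reached. The function `extract_relevant_paragraphs` is hypothetical, keyword matching is a deliberately crude stand-in for real relevance ranking, and token counts are approximated at 4 characters per token.

```python
def extract_relevant_paragraphs(document: str, keywords: list[str], max_tokens: int) -> str:
    """Keep paragraphs containing any keyword, up to a rough token budget."""
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    lowered = [k.lower() for k in keywords]
    selected: list[str] = []
    used = 0
    for paragraph in paragraphs:
        # Skip paragraphs that mention none of the keywords.
        if not any(k in paragraph.lower() for k in lowered):
            continue
        cost = -(-len(paragraph) // 4)  # ~4 characters per token, rounded up
        if used + cost > max_tokens:
            break  # budget exhausted; stop adding paragraphs
        selected.append(paragraph)
        used += cost
    return "\n\n".join(selected)


doc = (
    "Intro text.\n\n"
    "Pricing is per token.\n\n"
    "History of the company.\n\n"
    "Token limits apply."
)
print(extract_relevant_paragraphs(doc, ["token"], max_tokens=100))
```

The extracted text can then be pasted into a prompt in place of the full document.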
Prompt Templates
Document Summarizer (Token-Efficient)
Extracts key information while keeping token usage low.
Summarize this document in under 200 words, focusing on: [SPECIFIC ASPECT]. Use bullet points for key facts. Skip background information I already know. Document: [PASTE TEXT]
Long Document Analyzer
Efficiently analyzes specific parts of long documents.
I'm going to give you a long document. Focus on these sections specifically:
1. [SECTION/TOPIC 1]
2. [SECTION/TOPIC 2]
For each, extract: the main claim, supporting evidence, and any caveats. Ignore everything else.
Document: [PASTE TEXT]
Test Your Knowledge
Approximately how many tokens does the average English word use?
Key Takeaways
- ✓ Tokens are the fundamental unit of AI text processing, typically 3-4 characters each
- ✓ The context window is the model's total working memory for your conversation
- ✓ Place critical information at the beginning and end of long prompts
- ✓ Concise prompts are cheaper, faster, and often more effective
- ✓ Different models have vastly different context window sizes