What Are AI Tokens? How GPT-4o, Gemini, and Claude Count Them

Tokens are the unit AI models use to read and charge for text. Learn what they are, how GPT-4o and Gemini tokenize differently, and how to avoid context window errors.

Try it yourself

Use our free AI Token Counter — no sign-up, runs in your browser.

Open tool →

You paste a prompt into ChatGPT or send a request to the OpenAI API, and somewhere behind the scenes a number gets calculated: how many tokens is this? That number determines whether your request succeeds, how fast the response comes back, and exactly how much you’re charged.

Most people ignore tokens until something breaks - a 400 error saying the context length was exceeded, or an API bill that’s higher than expected. This post explains what tokens actually are, how different models count them differently, and what you need to know to avoid those problems.

Use the AI Token Counter to check your text against GPT-4o or any Gemini model without leaving your browser.


What Is a Token?

A token is the basic unit of text that a language model processes. It’s not a word, and it’s not a character - it’s something in between, determined by the model’s tokenizer.

A tokenizer splits text into chunks using a vocabulary of tens of thousands of entries built during training. Common words become single tokens. Rare words get split into multiple tokens. Punctuation, spaces, and numbers have their own rules.

Some examples using GPT-4o’s tokenizer:

Text                        Tokens
hello                       1
tokenization                1 (common enough to be a single token)
ChatGPT                     2
https://example.com/path    7
{"key": "value"}            6

The rough rule of thumb: 1 token ≈ 4 characters, or about ¾ of a word in English. So 1,000 words is approximately 1,300–1,500 tokens. But this varies significantly - code tokenizes to more tokens per line, and non-Latin languages (Chinese, Arabic, Japanese) can use 2–3× more tokens per character than English.


Why Tokens Matter

Every AI API interaction has two token-related limits you need to understand:

1. Context Window

The context window is the maximum number of tokens a model can process in a single request - including your prompt, any conversation history, and the model’s response. If you exceed it, the API returns an error.

Model             Context Window
GPT-4o            128,000 tokens
GPT-4o mini       128,000 tokens
Gemini 3.1 Pro    1,000,000 tokens
Gemini 3 Flash    1,000,000 tokens
Gemini 2.5 Pro    1,000,000 tokens

A 128k context window can hold roughly 100,000 words - about the length of a novel. Gemini’s 1M context window holds roughly 750,000 words. In practice, you’re more likely to hit limits from long conversation histories or large documents being passed as context.
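A quick pre-flight check can catch context overflows before you pay for a failed request. The sketch below uses the rough 4-characters-per-token heuristic from earlier, so treat it as an estimate (the model names and function names here are illustrative, not any SDK's API):

```python
# Rough pre-flight check that a request fits a model's context window.
# Uses the ~4 characters per token rule of thumb, so this is an estimate,
# not an exact count; leave generous headroom for the response.

CONTEXT_WINDOWS = {  # tokens, per the table above
    "gpt-4o": 128_000,
    "gpt-4o-mini": 128_000,
}

def estimate_tokens(text: str) -> int:
    """Heuristic: roughly 1 token per 4 characters of English text."""
    return max(1, len(text) // 4) if text else 0

def fits_context(model: str, prompt: str, max_response_tokens: int) -> bool:
    """True if the estimated prompt plus the reserved response budget fits."""
    budget = CONTEXT_WINDOWS[model]
    return estimate_tokens(prompt) + max_response_tokens <= budget

print(fits_context("gpt-4o", "Summarize this report.", 1_000))  # True
```

For production use, swap the heuristic for an exact tokenizer count.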

2. Cost

APIs charge per token - typically separately for input (your prompt) and output (the model’s response). Output tokens are usually 3–5× more expensive than input tokens.

Model             Input (per 1M tokens)
GPT-4o            $2.50
GPT-4o mini       $0.15
Gemini 3.1 Pro    $2.00
Gemini 3 Flash    $0.50
Gemini 2.5 Pro    $1.25

Knowing your token count before running a batch job or building a system prompt can prevent bill surprises.
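A back-of-the-envelope cost estimate is a few lines of arithmetic. The sketch below uses the input rates from the table above; since this post doesn't list output rates, the 4x output multiplier is an assumption based on the typical 3–5x spread mentioned earlier:

```python
# Estimate request cost from token counts and per-1M-token prices.
# Input rates are from the table above; the output multiplier is an
# illustrative assumption (output tokens typically cost 3-5x more).

INPUT_PRICE_PER_M = {  # USD per 1M input tokens
    "gpt-4o": 2.50,
    "gpt-4o-mini": 0.15,
    "gemini-3-flash": 0.50,
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  output_multiplier: float = 4.0) -> float:
    """Rough USD cost; output priced at a multiple of the input rate."""
    rate = INPUT_PRICE_PER_M[model] / 1_000_000
    return input_tokens * rate + output_tokens * rate * output_multiplier

# A batch of 1,000 requests, each ~2,000 input and ~500 output tokens:
per_request = estimate_cost("gpt-4o", 2_000, 500)
print(f"${per_request * 1000:.2f}")  # roughly $10.00 for the whole batch
```

Running the numbers like this before a batch job is exactly the "bill surprise" prevention described above.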


Why GPT-4o and Gemini Count Tokens Differently

Different models use different tokenizers, which means the same text can produce a different token count depending on which model you’re using. This isn’t just a technical detail - it affects both your context window budget and your costs.

GPT-4o uses the o200k_base tokenizer - a vocabulary of 200,000 tokens optimized for modern text, code, and multilingual content. It’s more efficient than the older cl100k_base used by GPT-4 and GPT-3.5.

Gemini models use Google’s SentencePiece tokenizer, which is not publicly available as a standalone library. For most English text, the counts are close to GPT-4o’s - usually within 5–10%. For code or non-Latin scripts, the gap can be larger.

Claude (Anthropic) uses a proprietary BPE tokenizer that also isn’t released publicly. The story is similar: counts are close to GPT-4o’s for English prose but can diverge on code and other languages.

This is why most token counter tools - including ours - use GPT-4o’s tokenizer as a universal approximation, and offer exact counts via the model’s own API for cases where precision matters.


Common Token Problems

“This model’s maximum context length is X tokens”

You’ve exceeded the context window. The fix is to reduce your input: shorten your system prompt, summarize older conversation history, or split the task into smaller chunks.
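The "split into smaller chunks" fix can be sketched with the 4-characters-per-token heuristic. This is a simplified illustration (the function name is made up for this post); a real pipeline would count with an exact tokenizer and split on sentence boundaries:

```python
# Split a long document into chunks that each fit a token budget,
# using the rough 4-characters-per-token heuristic.

def split_into_chunks(text: str, max_tokens: int = 4_000) -> list[str]:
    """Greedily pack paragraphs into chunks of roughly max_tokens each."""
    max_chars = max_tokens * 4
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = (current + "\n\n" + para) if current else para
        if len(candidate) > max_chars and current:
            chunks.append(current)   # current chunk is full; start a new one
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

doc = "\n\n".join(f"Paragraph {i}. " + "word " * 200 for i in range(50))
print(len(split_into_chunks(doc, max_tokens=1_000)))
```

Each chunk can then be processed in its own request, well under the context window.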

Unexpectedly high API costs

A few common causes:

  • Long system prompts sent on every request add up fast in high-volume applications
  • Returning large structured outputs (JSON with many fields) uses more output tokens than plain prose
  • Storing full conversation history - the entire history is resent on every turn, so a 20-message conversation costs significantly more per turn than a 2-message one
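The last point is worth making concrete: because every turn resends the whole history, cumulative input tokens grow quadratically with conversation length. A small sketch of the arithmetic:

```python
# Why long chat histories get expensive: each new turn resends all
# previous messages, so cumulative input tokens grow quadratically.

def cumulative_input_tokens(tokens_per_message: int, num_messages: int) -> int:
    """Total input tokens billed across a conversation where turn n
    resends all n messages so far (assumes equal-sized messages)."""
    return sum(tokens_per_message * n for n in range(1, num_messages + 1))

print(cumulative_input_tokens(100, 2))   # 2 messages:  300 tokens
print(cumulative_input_tokens(100, 20))  # 20 messages: 21,000 tokens
```

This is why summarizing or truncating older history is the standard mitigation for long-running conversations.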

Truncated responses

If the model stops mid-sentence, you’ve likely hit the max_tokens limit on the response side, not the context window. Increase max_tokens or break the task into smaller pieces.


How to Check Your Token Count

The fastest way: paste your text into the AI Token Counter and select your model. GPT-4o counts are instant and exact in the browser. For Gemini, click Get token count to fetch the precise number via Google’s API.

This is useful before:

  • Sending a long document as context
  • Deploying a system prompt you’ve been iterating on
  • Running a batch job where per-request cost matters
  • Building a RAG pipeline where chunk size affects retrieval quality

Quick Reference

  • 1,000 tokens ≈ 750 words (English prose)
  • 1,000 tokens ≈ 500–700 words (code)
  • Context window = max tokens for the entire request (prompt + history + response)
  • Exceeding context window → API error, not truncation
  • GPT-4o tokenizer (o200k_base) is the best universal approximation for other models
  • Gemini and Claude have proprietary tokenizers - use their APIs for exact counts

Ready to try it?

Free, client-side AI Token Counter — nothing sent to a server.

Open AI Token Counter →