Managing the Human Context Window
When I was a kid, my dad used to joke that he had a great memory — it was just short. The line landed because it was honest, and it added levity to those "I told you about that yesterday" type conversations.
I've thought about that line a lot lately, because working with AI has always been about managing the context window, and that work has changed significantly even in the last few months. As context windows grow, it's less and less about managing how the LLM handles context and more about managing the human context window.
A Quick Primer on the AI Context Window
When a large language model "reads" something (your prompt, an attached document, the previous turns of a conversation), that text gets chopped into tokens. A token is a fragment of language, usually somewhere between a syllable and a word. The rule of thumb most builders use is that 100 tokens equals roughly 75 words, or about one short paragraph. So when you hear someone talk about a "200,000-token context window," translate that to roughly 150,000 words, or about two full novels.
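For readers who want to check the math, the rule of thumb can be sketched in a few lines of Python. This is rough arithmetic, not a tokenizer: the 0.75 words-per-token ratio comes from the rule of thumb above, while the 75,000-words-per-novel figure and the function names are assumptions picked for illustration.

```python
# Rough token-to-word arithmetic based on the rule of thumb above.
# WORDS_PER_TOKEN comes from "100 tokens equals roughly 75 words".
# WORDS_PER_NOVEL is an assumed round number, used only for illustration.
WORDS_PER_TOKEN = 0.75
WORDS_PER_NOVEL = 75_000


def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def tokens_to_novels(tokens: int) -> float:
    """Estimate how many novel-length texts fit in a given number of tokens."""
    return tokens_to_words(tokens) / WORDS_PER_NOVEL


if __name__ == "__main__":
    for window in (2_048, 200_000):
        words = tokens_to_words(window)
        novels = tokens_to_novels(window)
        print(f"{window:>9,} tokens ~ {words:>7,} words ~ {novels:.2f} novels")
```

Real token counts vary by model and by language, so treat these numbers as a ballpark rather than a measurement.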
The context window is how many of those tokens the model can hold in active working memory at one time. Anything that fits inside the window can influence the response. Anything that doesn't is, for that turn, forgotten.
The progression over the last few years has been staggering. GPT-3, when it launched in 2020, had a 2,048-token context window: call it three pages of text. GPT-3.5 doubled that to 4,096. GPT-4 launched with 8K and 32K variants, and GPT-4 Turbo stretched that to 128K. Claude pushed to 100K, then 200K, then a 1M-token tier for select workloads. Gemini 1.5 Pro shipped with a 1M-token window and later expanded to 2M. Research and frontier systems have now demonstrated context windows of 10 million tokens, which is roughly 7.5 million words, or about a hundred novels held in mind simultaneously.
In the span of a few years, the AI's "memory" went from a few pages to an entire library.
We Solved the Model's Context Problem. We Didn't Solve Ours.
The early days of generative AI were defined by working around the context window. Prompts had to be terse. Documents had to be chunked. Conversations had to be summarized before they could continue. Builders developed elaborate retrieval systems just to feed the model enough information to be useful.
That bottleneck is essentially gone for most practical work. Modern systems can hold an entire codebase, a year of email, or a full client portfolio in a single window. And when they can't, sub-agents do the work for them. A sub-agent is exactly what it sounds like: a smaller, specialized AI process that gets sent to do focused work — read a document, search a folder, draft a section — and reports back a clean summary. The orchestrating model never has to load the raw material into its own context. It just receives the distilled answer.
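The orchestrator and sub-agent pattern described above can be sketched in a few lines. This is a toy illustration under loud assumptions: `run_sub_agent` and `orchestrate` are hypothetical names, and the "summary" is just the first few words of each document standing in for what a real model call would return. The point is only the shape: raw material stays with the sub-agent, and the orchestrator sees the distilled report.

```python
from dataclasses import dataclass


@dataclass
class SubAgentReport:
    source: str   # which document the sub-agent read
    summary: str  # the distilled answer passed back up


def run_sub_agent(source: str, raw_text: str, max_words: int = 12) -> SubAgentReport:
    """Hypothetical sub-agent: reads the raw material, returns a short summary.

    A real sub-agent would call a language model here. This stand-in keeps
    only the first `max_words` words so the example runs on its own.
    """
    words = raw_text.split()
    summary = " ".join(words[:max_words])
    return SubAgentReport(source=source, summary=summary)


def orchestrate(documents: dict[str, str]) -> str:
    """Hypothetical orchestrator: works only from summaries, never the raw text."""
    reports = [run_sub_agent(name, text) for name, text in documents.items()]
    return "\n".join(f"{r.source}: {r.summary}" for r in reports)


if __name__ == "__main__":
    docs = {
        "research.txt": "finding " * 5_000,  # thousands of words of raw material
        "emails.txt": "message " * 8_000,
    }
    briefing = orchestrate(docs)
    print(briefing)
    print(len(briefing.split()), "words reached the orchestrator")
```

However large the input documents get, the briefing that reaches the orchestrator stays a few dozen words long.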
Agents have changed the way they work to get around the limits of the context window. We humans have not changed the way we work nearly as much. That means a big shift is taking place for all of us.
The Human Context Window is Small — and AI Keeps Filling It
Humans have a context window, too. Cognitive scientists peg working memory at roughly four "chunks" of information at any given moment — far less than what the LLMs can manage. We compensate with notes, calendars, sticky tabs, and the patient kindness of the people around us. We are remarkably effective despite this constraint.
But here's the new problem: AI now generates content faster than any human can meaningfully process it. A model can produce twenty pages of strategy in the time it takes to refill a coffee. A sub-agent can quietly read and distill hundreds of pages of research in seconds. The volume isn't the issue; the volume is the gift. The issue is that somebody still has to understand what was produced, because words matter and details matter, and at the end of the day, a human is going to be responsible for it.
At Five Q, "human-first" isn't a slogan about workflow ergonomics. It's a commitment that every piece of content has a human who owns it — not in the legal sense, but in the deeper sense of "I have read this, I stand behind this, I know what every paragraph is doing." Whether a draft was hand-written or AI-generated is irrelevant to that commitment. Ownership doesn't transfer just because a machine helped.
That commitment runs straight into the context window problem. If AI is producing more content than a human can actually absorb, the content starts to pile up, and all the value you get from AI's speed is for naught. To solve this, working the way we always have just won't cut it.
A Working Tactic: Invert the Format
One of the practices we've been leaning on at Five Q is inverting the order in which AI-assisted work happens.
The default workflow is to use AI to write the long-form content first — the report, the proposal, the policy document — and then ask a human to review it. That's the shape that comes most naturally, because long-form is what AI is so impressive at producing, and AI can create the first draft of these documents as fast as we can think of them. But it's the wrong shape for human comprehension. You hand someone twenty pages and ask them to spot what's wrong, what's missing, what's off-brand, and you've designed a process that quietly rewards skim-and-approve.
The shape that fits a human context window better is the opposite. Produce the simplified version first — the outline, the bullet structure, the one-paragraph thesis, the decision tree. Let the human chew on that until the shape and the substance are right. Only then expand into the long-form draft. And when reviewing the long-form, do it through Q&A rather than re-reading. Ask specific questions: "Why did we phrase the recommendation this way?" "What evidence backs the third paragraph?" "Where does this differ from what we agreed on in the outline?" Q&A is how a small context window interrogates a large one. It is, not coincidentally, the same trick the orchestrator model uses with its sub-agents.
The discipline isn't about producing less. It's about setting up the human as the architect and owner of everything that is created.
The Shift This Represents
Every era of computing has had its scarce resource. In the mainframe years, it was machine time. In the PC era, it was disk space, then memory, then bandwidth. In the early AI era, it was tokens. Each generation of tooling existed to help us route around the constraints of the moment.
The constraint of this moment is human attention and comprehension. It's the only context window left that hasn't gotten dramatically bigger. And unlike every previous bottleneck, we can't brute-force our way through it by adding more RAM. What we can do is what good leaders have always done: design our workflows around the team's actual capacity rather than its imagined one.
The future is not adding AI into our work; it's shaping the way we do work to enable AI to do what it does best.
Josh Kashorek is in charge of AI Operations at Five Q, a trusted digital agency that delivers mission-driven growth for faith-based nonprofits. Connect with him on LinkedIn!
Works Cited
Anthropic. "Context Windows." Claude API Documentation, platform.claude.com/docs/en/build-with-claude/context-windows.
"Anthropic Makes a Pricing Change That Matters for Claude's Longest Prompts." The New Stack, thenewstack.io/claude-million-token-pricing/.
Cowan, Nelson. "The Magical Number 4 in Short-Term Memory: A Reconsideration of Mental Storage Capacity." Behavioral and Brain Sciences, vol. 24, no. 1, 2001, pp. 87–114. PubMed, pubmed.ncbi.nlm.nih.gov/11515286/.
"Gemini 1.5 Pro 2M Context Window, Code Execution Capabilities, and Gemma 2 Are Available Today." Google Developers Blog, Google, developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/.
"Gemini 3 Pro." Vertex AI Documentation, Google Cloud, docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-pro.
"Gemini 3.1 Pro." Vertex AI Documentation, Google Cloud, docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-1-pro.
Google. "Gemini Drops: New Updates to the Gemini App, April 2026." The Keyword, Apr. 2026, blog.google/innovation-and-ai/products/gemini-app/gemini-drop-april-2026/.
Google. "Introducing Gemini 1.5, Google's Next-Generation AI Model." The Keyword, Feb. 2024, blog.google/innovation-and-ai/products/google-gemini-next-generation-model-february-2024/.
"GPT-3." Wikipedia, Wikimedia Foundation, en.wikipedia.org/wiki/GPT-3.
"GPT-3.5 vs. GPT-4: Biggest Differences to Consider." TechTarget, techtarget.com/searchenterpriseai/tip/GPT-35-vs-GPT-4-Biggest-differences-to-consider.
"GPT-4 Turbo Preview: Exploring the 128K Context Window." Povio, povio.com/blog/gpt-4-turbo-preview-exploring-the-128k-context-window.
"New Llama 4 AI Model 10 Million Token Context Window." Geeky Gadgets, geeky-gadgets.com/llama-4-ai-model-long-context-window/.
OpenAI. "New Models and Developer Products Announced at DevDay." OpenAI, Nov. 2023, openai.com/index/new-models-and-developer-products-announced-at-devday/.
OpenAI. "What Are Tokens and How to Count Them?" OpenAI Help Center, help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them.