Venture Crane

96% token reduction in SOD context loading

Feb 12, 2026

Note: Retroactive log - reconstructed from commit history and session notes.

We cut the start-of-day (SOD) context load from 71,000 tokens to roughly 3,000 tokens - a 96% reduction for the primary workspace and 93-94% across the other workspaces.

What We Did

Every agent session begins with an SOD call that loads venture context: documentation index, enterprise notes, active issues, weekly plan status, and session history. The original implementation fetched 23-39 full documents and dumped them into the response. For the primary workspace, this consumed about 71K tokens - roughly 22-35% of the context window - before the agent did any useful work.

The fix had two parts. First, we switched the documentation section from full document contents to an index format - a metadata table listing available docs with their titles, scopes, and staleness status. Agents use a document retrieval tool to fetch specific documents on demand when they actually need them.
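The index format can be sketched roughly as follows. This is a minimal illustration, not the actual implementation; the `DocMeta` fields and `build_doc_index` helper are assumptions based on the description above (titles, scopes, and staleness status).

```python
from dataclasses import dataclass

@dataclass
class DocMeta:
    doc_id: str
    title: str
    scope: str   # e.g. "venture" or "global" (hypothetical values)
    stale: bool

def build_doc_index(docs: list[DocMeta]) -> str:
    """Render a compact metadata table instead of inlining full document bodies."""
    lines = ["id | title | scope | stale"]
    for d in docs:
        lines.append(f"{d.doc_id} | {d.title} | {d.scope} | {'yes' if d.stale else 'no'}")
    return "\n".join(lines)
```

The index costs a few tokens per document regardless of document length; the agent pays the full token cost only for documents it actually retrieves.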

Second, we replaced a flat 2,000-character per-note truncation with a 12KB section budget for enterprise notes. Notes are sorted by relevance in three tiers: current venture first, then other ventures, then global. Notes fit in full when possible; partial-fit fallback kicks in only when the budget overflows. A 50KB total message size warning acts as defense-in-depth.
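The budget-fitting logic can be sketched like this. It is a simplified sketch under the assumptions above (tiers pre-sorted by relevance, character counts as the size proxy); the function name and signature are illustrative, not the real code.

```python
SECTION_BUDGET = 12 * 1024  # 12KB budget for the enterprise-notes section

def fit_notes(notes_by_tier: list[list[str]], budget: int = SECTION_BUDGET) -> list[str]:
    """Fit notes into a byte budget.

    notes_by_tier is ordered by relevance: current venture first,
    then other ventures, then global. Notes are included in full
    while they fit; the last note that overflows gets truncated
    (partial-fit fallback); everything after is dropped.
    """
    out: list[str] = []
    used = 0
    for tier in notes_by_tier:
        for note in tier:
            if used + len(note) <= budget:
                out.append(note)            # fits in full
                used += len(note)
            elif used < budget:
                out.append(note[: budget - used])  # partial-fit fallback
                used = budget
            else:
                return out                  # budget exhausted
    return out
```

The 50KB total-message warning would then be a separate check on the final assembled response, catching any section that slips past its own budget.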

The results:

Workspace   Before         After        Savings
Primary     ~71K tokens    ~3K tokens   96%
Others      ~45-47K tokens ~3K tokens   93-94%

The implementation touched three files: the SOD tool itself (49 lines changed), new test fixtures for API responses (60 lines), and expanded test coverage (114 lines).

What Surprised Us

The SOD output had grown to 298K characters in the worst case before anyone flagged it as a problem. We discovered it when a session started noticeably slowly and a size guard we'd added caught a response exceeding 50KB. The fix was straightforward, but the fact that we'd been burning a third of our context window on startup for weeks - and just absorbed the cost as "normal" latency - was a reminder that performance problems that degrade gradually are the hardest to catch.
