Cloudflare Makes Edge Agents Easier
Cloudflare shipped a single changelog entry on February 13 that quietly lowered the barrier to running AI agents at the edge. Three things landed at once: a new model, a new framework integration, and a provider upgrade. Individually they’re incremental. Together they close gaps that previously required glue code or third-party workarounds.
What Dropped
GLM-4.7-Flash is now available on Workers AI. It’s a multilingual text generation model from Zhipu AI with a 131,072-token context window and multi-turn tool calling. That last part matters - tool calling is what turns a language model into an agent. You can bind it directly to a Worker, hit it through the REST API, or route it through AI Gateway. No external inference provider needed.
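To make the tool-calling point concrete, here's a minimal sketch of the agentic loop it enables. The model call is injected as a function so the loop is provider-agnostic - on Workers AI it would wrap env.AI.run with the GLM model and translate its response shape. The message and tool-call types below are illustrative, not the Workers AI wire format.

```typescript
// Minimal agentic loop: call the model, dispatch any tool calls it
// requests, feed the results back, repeat until it answers in text.
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ModelReply = { text?: string; toolCalls?: ToolCall[] };
type ModelFn = (messages: Message[]) => Promise<ModelReply>;
type Tool = (args: Record<string, unknown>) => Promise<string>;

async function runAgentLoop(
  model: ModelFn,
  tools: Record<string, Tool>,
  messages: Message[],
  maxTurns = 5,
): Promise<string> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await model(messages);
    // No tool calls: the model has produced its final answer.
    if (!reply.toolCalls?.length) return reply.text ?? "";
    // Execute each requested tool and append its result so the model
    // sees it on the next turn (this is the multi-turn part).
    for (const call of reply.toolCalls) {
      const tool = tools[call.name];
      const result = tool
        ? await tool(call.arguments)
        : `unknown tool: ${call.name}`;
      messages.push({ role: "tool", content: result });
    }
  }
  throw new Error("agent loop exceeded maxTurns without a final answer");
}
```

The maxTurns cap is the one piece you don't want to forget: a model that keeps requesting tools will otherwise loop forever on your dime.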
@cloudflare/tanstack-ai (v0.1.1) is a new first-party package that wires Workers AI into TanStack’s AI primitives. It covers chat with streaming, image generation, transcription, text-to-speech, and summarization. It also ships AI Gateway adapters, which means you can route third-party providers - OpenAI, Anthropic, Gemini, Grok, OpenRouter - through Cloudflare’s infrastructure for caching and unified billing. If you’ve been using the Vercel AI SDK because TanStack didn’t have a Cloudflare story, you now have a framework-agnostic alternative.
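The gateway routing is, at its core, a base-URL swap: instead of calling a provider's own host, the client calls a per-account gateway URL and Cloudflare proxies the rest. A sketch of that URL shape - the account ID, gateway ID, and provider slug here are placeholders, and the exact slugs for each provider should be checked against the AI Gateway docs:

```typescript
// AI Gateway fronts third-party providers behind one Cloudflare URL,
// which is what enables shared caching and unified billing. Provider
// paths (e.g. "chat/completions") are appended after the gateway prefix.
const GATEWAY_BASE = "https://gateway.ai.cloudflare.com/v1";

function gatewayUrl(
  accountId: string,
  gatewayId: string,
  provider: string, // e.g. "openai", "anthropic" - see the docs for slugs
  providerPath: string,
): string {
  return `${GATEWAY_BASE}/${accountId}/${gatewayId}/${provider}/${providerPath}`;
}
```

The adapters in @cloudflare/tanstack-ai handle this wiring for you; the point of spelling it out is that nothing about your provider-side request changes except where it's sent.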
workers-ai-provider v3.1.1 fills in the Vercel AI SDK side. Three new capabilities: transcription (speech-to-text via Whisper and Deepgram Nova-3), text-to-speech (Deepgram Aura-1), and document reranking (BGE models for RAG pipelines). The release also fixes streaming so tokens arrive as they're generated rather than being buffered into a single final chunk - a real usability problem for anything interactive.
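Token-by-token delivery only pays off if the consumer renders chunks as they arrive. The AI SDK exposes responses as an async-iterable text stream; a minimal consumer looks like this (onToken stands in for whatever updates your UI):

```typescript
// Consume a text stream chunk by chunk, invoking onToken per chunk.
// With the old buffering behavior, this callback fired once with the
// whole response at the end; with the fix, it fires as tokens arrive
// and the UI can render incrementally.
async function pipeTokens(
  stream: AsyncIterable<string>,
  onToken: (token: string) => void,
): Promise<string> {
  let full = "";
  for await (const token of stream) {
    full += token;
    onToken(token);
  }
  return full;
}
```

The same shape works for any async-iterable source, which is why the fix matters beyond chat: transcription and summarization streams benefit identically.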
Why It Matters
We run our backend stack on Cloudflare Workers. The pattern we’ve settled on - Workers for compute, D1 for storage, KV for caching, R2 for objects - keeps everything at the edge with no cold-start VM to manage. What’s been missing is a clean on-platform path for AI capabilities beyond basic inference.
This changelog starts filling that gap. A model with tool calling means you can build agentic loops without calling out to external APIs. Transcription and TTS mean voice workflows stay on-platform. Reranking means RAG pipelines don’t need a separate service. And streaming that actually works token-by-token means the user experience isn’t a wall of text appearing after a long pause.
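The reranking step in a RAG pipeline is simple enough to sketch. The scorer is injected - on Workers AI it would wrap a call to one of the BGE reranker models; here it's any function that maps a query and candidate passages to relevance scores:

```typescript
// Second-stage RAG retrieval: a first-pass retriever returns candidate
// passages, a reranker scores each against the query, and we keep the
// top k by score for the prompt.
type Scorer = (query: string, docs: string[]) => Promise<number[]>;

async function rerank(
  query: string,
  docs: string[],
  score: Scorer,
  topK = 3,
): Promise<string[]> {
  const scores = await score(query, docs);
  return docs
    .map((doc, i) => ({ doc, score: scores[i] }))
    .sort((a, b) => b.score - a.score) // highest relevance first
    .slice(0, topK)
    .map((entry) => entry.doc);
}
```

Running this on-platform means the retrieve-rerank-generate chain never leaves the edge, which was the separate-service gap before this release.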
None of this replaces the heavy models - Claude, GPT-4, etc. - for complex reasoning. But for the agent scaffolding around those calls - routing, tool dispatch, voice I/O, document retrieval - running it all at the edge on Workers AI eliminates a class of infrastructure complexity.
What We’re Watching
The TanStack AI integration is the most interesting piece strategically. TanStack has been building framework-agnostic primitives for years (Query, Router, Table, Form). An AI primitive that works across React, Vue, Solid, and Angular - backed by edge inference - is a different proposition than being locked into Next.js and the Vercel AI SDK.
The GLM-4.7-Flash model is worth evaluating for lightweight agent tasks where you don’t need frontier-model reasoning but do need fast, cheap tool calling at the edge. 131K context is generous for most agent loops.
We haven’t changed anything in our stack yet. This is a signal post, not a migration plan.