Cloudflare Agents is a developer platform for building AI agents on Cloudflare's global network. Powered by Durable Objects for stateful execution, Workers AI for serverless inference, and flexible pay-as-you-go pricing, developers can build intelligent agents with persistent state and real-time capabilities.

Building AI agents presents developers with fundamental architectural challenges that traditional serverless platforms struggle to address. Stateless execution models force developers to implement complex state management layers, while WebSocket connections for real-time interactions incur continuous billing even during idle periods. These constraints make it difficult to build responsive, cost-effective agents that maintain conversation context across extended interactions.
Cloudflare Agents emerges as a purpose-built platform for constructing intelligent agents on a global network spanning 330 cities across more than 125 countries. The network processes an average of 93 million HTTP requests per second and proxies approximately 20% of global web traffic, providing the infrastructure backbone that sophisticated AI agents require. This isn't merely a hosting environment; it's an integrated development platform combining durable execution, serverless inference, and consumption-based pricing into a cohesive architecture.
The platform's core value proposition centers on three integrated capabilities. First, Durable Objects provide truly stateful execution where each agent operates as a persistent microserver with automatic state persistence across deployments and hibernation cycles. Second, Workers AI delivers serverless GPU inference for open models such as Llama, Mistral, and DeepSeek, while AI Gateway connects proprietary models like Claude and Gemini, all without managing underlying infrastructure. Third, the pricing model aligns costs directly with actual usage: developers pay only for CPU time consumed, not wall-clock time, and WebSocket connections can hibernate to stop billing while maintaining connectivity.
A concrete example illustrates the platform's capability: Knock leveraged the Cloudflare Agents SDK to construct a remote MCP server, demonstrating how enterprises can rapidly deploy production-ready agents that integrate with their existing tool ecosystems. The platform handles the complexity of stateful execution, real-time communication, and AI inference so developers can focus on agent logic and user experience.
The platform provides a comprehensive toolkit for building sophisticated AI agents, with each feature designed to address specific development challenges while maintaining architectural simplicity.
The Agent SDK implements a TypeScript class-based framework where agents inherit from an Agent base class and expose methods via the @callable() decorator. This approach enables RPC-style method invocation where client applications can trigger server-side logic seamlessly. WebSocket support allows for long-lived connections with hibernation capability—when an agent remains inactive, it pauses billing while preserving the connection for subsequent interactions.
Built-in state management eliminates the need for external databases in most scenarios. Each agent receives an embedded SQLite database and key-value state storage that automatically persists across deployments and hibernation cycles. This durability means conversation history, user preferences, and business state survive infrastructure changes without manual synchronization.
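As a sketch of how this built-in state could be used (the state shape and helper name here are illustrative, and the `setState` surface is modeled structurally so the logic runs outside the platform):

```typescript
// Illustrative sketch: on the platform, this.setState() persists state
// automatically across deployments and hibernation. The VisitState shape
// and recordVisit helper are assumptions for this example.
interface VisitState {
  visits: number;
}

interface StatefulAgent {
  state: VisitState;
  setState(next: VisitState): void; // persisted by the platform in a real agent
}

// Increment and persist a visit counter; on the platform this value would
// survive hibernation and redeploys without any external database.
export function recordVisit(agent: StatefulAgent): number {
  const next = agent.state.visits + 1;
  agent.setState({ visits: next });
  return next;
}
```

In a real agent the same call pattern runs against `this.state` and `this.setState` inside a method on the Agent subclass.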
For developers seeking rapid deployment, AIChatAgent provides an out-of-the-box intelligent chat implementation. It integrates with the ai SDK from Vercel, supports streamText for streaming responses, and offers a React hook (useAgentChat) for seamless UI integration. Message persistence and automatic reconnection after disconnections come built-in, reducing the implementation burden for conversational interfaces.
Model flexibility represents another strength. Workers AI hosts an array of built-in models including Llama 3.1/3.2/3.3, Mistral, DeepSeek R1, Gemma, and Qwen. Beyond these, developers can connect to third-party models through AI Gateway—OpenAI GPT-4, Anthropic Claude, and Google Gemini integrate through standardized APIs, enabling model selection based on capability requirements or cost optimization.
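A minimal sketch of invoking a built-in model follows the documented `env.AI.run()` pattern; the binding is modeled structurally here so the helper can run anywhere, and the helper name is our own:

```typescript
// Sketch of a Workers AI call. The model ID (@cf/meta/llama-3.1-8b-instruct)
// is a real catalog entry; the AiBinding interface mirrors the env.AI binding
// shape for illustration.
interface AiBinding {
  run(
    model: string,
    input: { messages: { role: string; content: string }[] },
  ): Promise<{ response?: string }>;
}

export async function askModel(ai: AiBinding, question: string): Promise<string> {
  const result = await ai.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [{ role: "user", content: question }],
  });
  return result.response ?? "";
}
```

Swapping the model ID is all it takes to trade capability for cost within the Workers AI catalog.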
The tool system extends agent functionality through callable methods that can interact with external APIs, databases, or business logic. MCP (Model Context Protocol) support allows agents to consume tools from external MCP servers or expose their own capabilities as MCP services. This bidirectional integration creates a rich ecosystem where agents can leverage Slack, GitHub, database connectors, and custom services.
Scheduled task execution works through the built-in Scheduler API supporting cron expressions and delayed execution. Agents can register for periodic tasks like daily reports, weekly summaries, or time-triggered workflows. Browser Rendering API enables headless browser automation for web scraping, screenshot capture, and interaction with dynamic web content.
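The Scheduler API accepts a delay in seconds, a Date, or a cron expression. The sketch below models that surface structurally (the callback names and schedules are illustrative) so the registration logic can be exercised outside the platform:

```typescript
// Structural sketch of the this.schedule(...) surface: a delay in seconds,
// a Date, or a cron string, plus the name of the agent method to invoke.
type Schedulable = {
  schedule(when: number | Date | string, callback: string, payload?: unknown): Promise<void>;
};

export async function registerReports(agent: Schedulable): Promise<void> {
  await agent.schedule("0 9 * * 1-5", "sendDailyReport");  // cron: 9:00 on weekdays
  await agent.schedule("0 8 * * 1", "sendWeeklySummary");  // cron: Mondays at 8:00
  await agent.schedule(3600, "followUp", { attempt: 1 });  // one-off, in one hour
}
```

In a real agent these calls run against `this` inside the Agent subclass, and the named methods fire even if the agent hibernates in between.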
The architectural foundation distinguishes Cloudflare Agents from conventional serverless platforms by providing genuinely stateful execution at edge locations worldwide.
Durable Objects serve as the execution runtime for each agent—a paradigm where every agent runs within its own Durable Object, a stateful microserver designed for coordination and data storage. Unlike traditional serverless functions that execute and terminate, Durable Objects persist indefinitely, maintaining in-memory state across invocations. This persistence occurs automatically: when an agent deploys, its state survives; when it hibernates due to inactivity, state remains intact upon reactivation. This eliminates the complexity of external state stores, Redis caches, or database lookups for session continuity.
Workers AI delivers serverless GPU inference at the edge. The service supports major models including Llama 3.1 70B, Mistral 7B, DeepSeek R1, Gemma, and Qwen, with inference executed on distributed GPU infrastructure. The billing model calculates charges based on actual CPU time consumed during inference, not wall-clock elapsed time, resulting in cost efficiency for operations with variable computation requirements. The free tier provides 10,000 Neurons daily, enabling development and testing without initial charges.
Vectorize provides vector database capabilities for semantic search and retrieval-augmented generation (RAG) workloads. Agents can store embeddings and perform similarity searches without deploying separate vector database infrastructure. D1 extends SQL capabilities with serverless SQLite, offering relational storage that integrates natively with the agent execution model.
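A similarity lookup against Vectorize follows the documented `query(vector, { topK })` shape; in the sketch below the binding is modeled structurally (the helper and id values are illustrative) so the retrieval logic is testable without the platform:

```typescript
// Sketch of a Vectorize nearest-neighbor lookup. VectorizeIndex mirrors the
// binding's query() shape; on the platform this would be env.VECTORIZE (or
// whatever the binding is named in wrangler config).
interface VectorMatch { id: string; score: number; }
interface VectorizeIndex {
  query(vector: number[], options: { topK: number }): Promise<{ matches: VectorMatch[] }>;
}

// Return the ids of the topK stored embeddings most similar to `embedding`.
export async function similarIds(
  index: VectorizeIndex,
  embedding: number[],
  topK = 3,
): Promise<string[]> {
  const { matches } = await index.query(embedding, { topK });
  return matches.map((m) => m.id);
}
```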
Workflows orchestrate multi-step business processes with guaranteed execution semantics. When agents require complex sequences, such as data fetching, processing, external API calls, or human approval, the Workflows engine ensures reliability through automatic retry logic, persistent state tracking, and execution guarantees that survive infrastructure failures. Individual invocations remain bounded by CPU time limits of 5 minutes per request (configurable) and up to 15 minutes for scheduled tasks, while overall task duration can extend to days or weeks when necessary.
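The retry behavior that durable steps provide can be illustrated with a small standalone helper. This is a conceptual sketch of the semantics, not the Workflows API itself, and it elides the part where completed step results are persisted and skipped on replay:

```typescript
// Conceptual sketch of durable-step retry semantics: run a named step,
// retrying on failure up to maxRetries additional times. A real workflow
// engine would also checkpoint the result and back off between attempts.
export async function doStep<T>(
  name: string,
  fn: () => Promise<T>,
  maxRetries = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn(); // success: the step's result would be checkpointed here
    } catch (err) {
      lastError = err; // transient failure: loop to retry
    }
  }
  throw new Error(`step ${name} failed after ${maxRetries + 1} attempts: ${lastError}`);
}
```

Chaining several such steps, each retried independently and checkpointed, is what lets a workflow survive infrastructure failures mid-sequence.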
The platform's scalability demonstrates enterprise readiness: architecture supports scaling to tens of millions of simultaneous Durable Object instances. Performance optimization includes GPU utilization efficiency—billing applies only to actual compute time—and WebSocket hibernation that suspends billing during connection idle periods while maintaining active connections.
For conversational agents requiring conversation history persistence, combine Durable Objects (stateful execution) with AIChatAgent (built-in chat UI support). For knowledge-intensive applications needing context retrieval, pair Vectorize with Workers AI embedding models. For long-running business workflows, leverage Workflows with MCP tool integrations. Select larger models like Llama 3.1 70B for complex reasoning; smaller models like Llama 3.2 1B for cost-sensitive, high-volume operations.
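The Vectorize-plus-Workers-AI pairing for RAG can be sketched end to end. All three interfaces below are illustrative stand-ins for the real bindings (an embedding model call, a Vectorize query, and an LLM completion), wired together in the usual embed-retrieve-generate order:

```typescript
// End-to-end RAG sketch under assumed binding shapes: embed the question,
// retrieve nearby chunks, then prompt an LLM with the retrieved context.
// Interface and helper names here are illustrative, not SDK APIs.
interface Embedder { embed(text: string): Promise<number[]>; }
interface Retriever { nearest(vector: number[], topK: number): Promise<string[]>; }
interface Llm { complete(prompt: string): Promise<string>; }

export async function answerWithContext(
  embedder: Embedder,
  retriever: Retriever,
  llm: Llm,
  question: string,
): Promise<string> {
  const vector = await embedder.embed(question);         // Workers AI embedding model
  const chunks = await retriever.nearest(vector, 3);     // Vectorize similarity search
  const prompt = `Context:\n${chunks.join("\n")}\n\nQuestion: ${question}`;
  return llm.complete(prompt);                           // Workers AI or AI Gateway model
}
```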
Developers can begin building agents within minutes using the provided starter templates and command-line tools.
Environment Requirements: Node.js 18 or higher, npm package manager, and a Cloudflare account. The Wrangler CLI (Cloudflare's deployment tool) installs automatically with the starter template.
Installation Steps:

```sh
npm i agents
npx create-cloudflare@latest --template cloudflare/agents-starter
cd agents-starter && npm install
npm run dev
```

The first command installs the Agents SDK into an existing project. The second generates a complete starter project with a pre-configured TypeScript setup, dependencies, and example implementations. The third installs all required packages, and the fourth launches the local development server with hot reloading.
The starter template demonstrates a complete Lunch Agent implementation showcasing key concepts:
```typescript
export class LunchAgent extends Agent<Env, LunchState> {
  @callable()
  async nominateRestaurant(restaurantName: string) {
    // Restaurant nomination logic
  }

  async onStart() {
    // Schedule tasks using cron expressions:
    // choose lunch at 11:30am on weekdays, reset at 5pm daily
    await this.schedule("30 11 * * 1-5", "chooseLunch");
    await this.schedule("0 17 * * *", "resetLunch");
  }
}
```
This minimal example demonstrates agent class definition extending the Agent base class, callable methods decorated with @callable() for RPC exposure, internal state management via the generic type parameter, and scheduled task registration using cron syntax.
Deployment: Once local development validates functionality, production deployment requires a single command:
```sh
npx wrangler deploy
```
This packages the agent, uploads to Cloudflare's global network, and makes it immediately available across all edge locations. The free tier accommodates 100,000 requests daily with 10ms CPU time per request—sufficient for development, testing, and small production workloads.
Use npm run dev for iterative development with live reloading. Leverage Cloudflare's local storage emulation (Miniflare) to test Durable Object persistence and state management without cloud deployment. When ready for production, verify all environment variables and secrets through wrangler secret put before deploying.
Cloudflare Agents pricing follows a consumption model aligned with actual resource utilization, ensuring costs correspond directly to application activity.
Workers Plans Comparison:
| Plan | Requests | CPU Time | Price |
|---|---|---|---|
| Free | 100,000/day | 10ms/request | $0/month |
| Paid | 10 million/month | 30 million CPU ms/month | $5/month |
Overage charges apply at $0.30 per million requests beyond the included allocation, and $0.02 per million CPU milliseconds for compute time exceeding plan limits.
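The overage math above can be made concrete with a small calculator (the helper name is ours; the rates are the figures quoted in this section):

```typescript
// Worked example of the Paid plan billing quoted above: $5 base with
// 10M requests and 30M CPU ms included, then $0.30 per million extra
// requests and $0.02 per million extra CPU ms.
export function workersMonthlyCost(requests: number, cpuMs: number): number {
  const base = 5;
  const requestOverage = (Math.max(0, requests - 10_000_000) / 1_000_000) * 0.30;
  const cpuOverage = (Math.max(0, cpuMs - 30_000_000) / 1_000_000) * 0.02;
  return +(base + requestOverage + cpuOverage).toFixed(2); // round to cents
}
// e.g. 15M requests and 50M CPU ms: $5 + $1.50 + $0.40 = $6.90
```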
Workers AI Pricing (Per Million Tokens):
| Model | Input | Output |
|---|---|---|
| Llama 3.2 1B | $0.027 | $0.201 |
| Llama 3.2 3B | $0.051 | $0.335 |
| Llama 3.1 8B | $0.282 | $0.827 |
| Llama 3.1 70B | $0.293 | $2.253 |
| DeepSeek R1 | $0.497 | $4.881 |
Workers AI uses Neurons as the billing unit, with free allocation of 10,000 Neurons daily. Overage charges apply at $0.011 per thousand Neurons.
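Applying the same arithmetic to Neuron billing (helper name ours; rates as quoted above):

```typescript
// Worked example of Neuron billing: 10,000 Neurons/day free, then
// $0.011 per 1,000 Neurons beyond the free allocation.
export function dailyNeuronCost(neuronsUsed: number): number {
  const billable = Math.max(0, neuronsUsed - 10_000);
  return +((billable / 1_000) * 0.011).toFixed(3);
}
// e.g. 500,000 Neurons in a day: 490 x $0.011 = $5.39
```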
Durable Objects Pricing: Durable Objects usage is billed on requests and active duration (GB-seconds) as part of the Workers Paid plan; hibernated WebSocket connections accrue no duration charges while idle.
Cost Optimization Strategies: The platform charges only for active CPU execution—not I/O wait time during network calls or database queries. WebSocket hibernation suspends duration-based billing during idle periods while maintaining connection state. Storage services like R2 and D1 impose no egress fees, eliminating data transfer costs that typically complicate cloud budgeting.
Begin with the Free plan to validate agent architecture and estimate actual usage patterns. Most development and testing workloads remain within free tier limits. Upgrade to Paid ($5/month) when approaching 100,000 daily requests or requiring higher CPU time limits. Monitor the Cloudflare dashboard for real-time usage metrics and set up alerts for budget thresholds.
The platform integrates within Cloudflare's broader ecosystem while supporting external services through standardized protocols, creating a connected environment for agent development.
MCP Ecosystem: Model Context Protocol support enables seamless tool sharing between agents. The platform can consume tools from external MCP servers—Slack for messaging, GitHub for version control operations, database connectors for data access—and expose its own capabilities as MCP services for consumption by other agents. This bidirectional integration creates composable agent systems where specialized agents handle specific functions (data retrieval, notification dispatch, approval workflows) and coordinate through standardized interfaces.
Third-Party Model Integration: AI Gateway provides unified integration with external AI providers. OpenAI GPT-4, Anthropic Claude, and Google Gemini connect through consistent APIs, enabling model selection based on task requirements, cost considerations, or capability needs without code restructuring.
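Requests reach providers through a per-gateway base URL following the documented pattern `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway}/{provider}`. The sketch below builds that URL; the account and gateway values in the example are placeholders:

```typescript
// Build an AI Gateway base URL from the documented pattern. An
// OpenAI-compatible client can point its baseURL here so requests are
// logged, cached, and rate-limited by the gateway before reaching the
// provider. accountId and gatewayName are per-account values.
export function gatewayBaseUrl(
  accountId: string,
  gatewayName: string,
  provider: "openai" | "anthropic" | "google-ai-studio",
): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayName}/${provider}`;
}
```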
Developer Tools: The ecosystem includes Workers Playground for browser-based experimentation, Wrangler CLI for command-line workflows, and Chrome DevTools integration for debugging. The GitHub repository hosts complete reference implementations including Lunch Agent, Chat Agent, and Slack Agent demonstrating real-world patterns.
Community Resources: The Discord developer community connects thousands of developers building on Cloudflare. Documentation, tutorials, and community forums provide ongoing support. The status page maintains transparency about platform availability.
Enterprise Readiness: SOC 2 certification and GDPR compliance support enterprise adoption requirements. Integration capabilities accommodate existing identity systems, security infrastructure, and compliance frameworks.
For rapid capability expansion, start with MCP integrations—Slack, GitHub, and database connectors provide immediate utility. For knowledge-intensive agents, combine Vectorize with Workers AI embedding models for RAG functionality. Advanced users can leverage Workflows for complex orchestration across multiple agents and external services.
Cloudflare Agents builds on Durable Objects to provide genuinely stateful execution. Each agent operates as a persistent microserver where state automatically persists across deployments and hibernation cycles. This eliminates the need for external state management—the database, cache, and session store requirements that complicate other frameworks.
Run the following three commands to generate a complete starter project:
```sh
npx create-cloudflare@latest --template cloudflare/agents-starter
cd agents-starter && npm install
npm run dev
```
The template includes AI chat functionality, tool calling, and task scheduling pre-configured. The documentation at developers.cloudflare.com/agents provides comprehensive guidance for customization.
Workers AI hosts built-in models including Llama 3.1/3.2/3.3, Mistral, DeepSeek R1, Gemma, and Qwen. External models connect through AI Gateway—OpenAI GPT-4, Anthropic Claude, and Google Gemini integrate via standardized APIs.
Workers Paid plan starts at $5/month with 10 million requests and 30 million CPU milliseconds included. Workers AI charges per Neuron consumed during inference at $0.011 per thousand Neurons beyond the free 10,000 daily allocation.
Yes. Durable Objects support long-running operations with CPU time limits of 5 minutes per request (configurable) and scheduled tasks up to 15 minutes. Workflows enables task orchestration spanning days or weeks with automatic retry and persistent state.
Workflows guarantees execution through automatic retry logic, persistent state management, and execution guarantees that survive infrastructure failures. Integration with Cloudflare Logs and Traces provides observability for monitoring, alerting, and debugging production agents.
Fully supported. Agents can function as MCP servers exposing tools to other agents, or connect as MCP clients to consume tools from external MCP servers including Slack, GitHub, databases, and custom services.