How to Build AI Agents with Next.js, TypeScript, and APIs

Learn to build production-ready AI agents with Next.js 14, TypeScript, and the OpenAI/Anthropic APIs. Covers agent loops, tool calling, memory, streaming, and Vercel deployment.

June 26, 2026 · 12 min read · Muhammad Zohaib Ramzan
Figure: AI agent architecture diagram with Next.js and TypeScript code

Building AI agents is one of the most exciting frontiers in modern web development. In this tutorial, you'll go from zero to a fully deployed, production-ready AI agent running inside a Next.js 14 application — complete with tool calling, streaming responses, conversation memory, and Vercel deployment.

What is an AI Agent

An AI agent is a program that uses a large language model (LLM) as its reasoning engine to autonomously decide what actions to take in order to complete a goal. Unlike a simple chatbot that responds to a single prompt, an agent operates in a reasoning loop — it thinks, acts, observes the result, and thinks again.

The core architecture of an agent consists of three components:

  • The LLM (the brain): Receives a system prompt, conversation history, and available tools. Decides what to do next.
  • Tools (the hands): Functions the agent can call — search the web, query a database, call an external API, run code.
  • The loop (the engine): Repeatedly calls the LLM, executes any requested tool calls, feeds results back, and continues until the agent signals it is done.

This pattern is sometimes called ReAct (Reason + Act). The agent reasons about the current state, acts by calling a tool, and then reasons again with the new information. A minimal agent loop in pseudocode:

while not done:
    response = llm.call(messages, tools)
    if response.has_tool_call:
        result = execute_tool(response.tool_call)
        messages.append(result)
    else:
        return response.text

Understanding this loop is the foundation for everything that follows.

Project Setup

You'll need Node.js 18+ and a Next.js 14 project using the App Router. Bootstrap a new project with:

npx create-next-app@latest ai-agent-demo --typescript --app --tailwind

Install the AI SDKs

Install the OpenAI SDK and, optionally, the Anthropic SDK alongside Vercel's AI SDK for streaming utilities:

npm install openai @anthropic-ai/sdk ai

TypeScript Configuration

Ensure the compilerOptions in your tsconfig.json target ES2017 or later and have strict mode enabled: { "compilerOptions": { "target": "ES2017", "strict": true, "moduleResolution": "bundler" } }.

Environment Variables

Create a .env.local file at the project root and add your keys:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

Never commit this file; add it to .gitignore. Access keys server-side via process.env.OPENAI_API_KEY. Next.js App Router Route Handlers run only on the server, so your keys are not exposed to the browser as long as you do not prefix them with NEXT_PUBLIC_ or pass them to client components.

Building the Agent Core

Create a dedicated module for your agent logic at lib/agent.ts.

TypeScript Interfaces

Start by defining the types that model your agent's state:

import type { ChatCompletionMessageParam } from 'openai/resources/chat';

export interface AgentTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

export interface AgentOptions {
  systemPrompt: string;
  tools: AgentTool[];
  maxIterations?: number;
}

export interface AgentState {
  messages: ChatCompletionMessageParam[];
  iterations: number;
}

The System Prompt

The system prompt is the agent's personality and instruction set. Be explicit about what the agent can and cannot do, and when it should stop calling tools and return a final answer.

The Agent Loop

Here is a complete, typed agent loop using the OpenAI SDK. The function initializes the message history, then iterates: it calls the LLM, checks for tool calls, executes them in parallel with Promise.all, appends the results, and loops. When the model returns a message with no tool calls, that is the final answer.

The function has the signature export async function runAgent(userMessage: string, options: AgentOptions): Promise<string>. It initializes state.messages with the system prompt and user message, then enters a while (state.iterations < maxIterations) loop. Each iteration calls openai.chat.completions.create({ model: 'gpt-4o', messages, tools, tool_choice: 'auto' }) and appends the returned assistant message to the history. If message.tool_calls is empty or undefined, the function returns message.content. Otherwise it executes each tool via Promise.all(message.tool_calls.map(...)) and pushes one { role: 'tool', tool_call_id, content: result } message per tool call back into the history. If the loop reaches maxIterations without a final answer, the function should throw a descriptive error rather than return silently.

A helper converts your AgentTool definition into the shape OpenAI expects: { type: 'function', function: { name, description, parameters } }.
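The loop and helper described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the article's exact implementation: the message types are local stand-ins for the ones exported by the openai package, the default of 10 iterations is arbitrary, and callModel is a hypothetical injection point (not an SDK function) that keeps the loop testable without network access.

```typescript
// Local stand-ins for the OpenAI chat message shapes used by the loop.
// Assumption: in a real project these types come from the 'openai' package.
type ToolCall = {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
};
type AssistantMessage = { role: 'assistant'; content: string | null; tool_calls?: ToolCall[] };
type ChatMessage =
  | { role: 'system' | 'user'; content: string }
  | AssistantMessage
  | { role: 'tool'; tool_call_id: string; content: string };

export interface AgentTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

export interface AgentOptions {
  systemPrompt: string;
  tools: AgentTool[];
  maxIterations?: number;
}

// Convert an AgentTool into the function-tool shape OpenAI expects.
export function toOpenAITool(tool: AgentTool) {
  return {
    type: 'function' as const,
    function: { name: tool.name, description: tool.description, parameters: tool.parameters },
  };
}
type OpenAITool = ReturnType<typeof toOpenAITool>;

// Hypothetical seam: wraps the chat completion request and returns the assistant message.
export type CallModel = (messages: ChatMessage[], tools: OpenAITool[]) => Promise<AssistantMessage>;

// Run one tool call; failures become text so the model can see them and recover.
async function executeToolCall(call: ToolCall, tools: AgentTool[]): Promise<string> {
  try {
    const tool = tools.find((t) => t.name === call.function.name);
    if (!tool) throw new Error(`Unknown tool: ${call.function.name}`);
    return await tool.execute(JSON.parse(call.function.arguments));
  } catch (err) {
    return `Error: ${err instanceof Error ? err.message : String(err)}`;
  }
}

export async function runAgent(
  userMessage: string,
  options: AgentOptions,
  callModel: CallModel,
): Promise<string> {
  const maxIterations = options.maxIterations ?? 10; // assumed default
  const tools = options.tools.map(toOpenAITool);
  const messages: ChatMessage[] = [
    { role: 'system', content: options.systemPrompt },
    { role: 'user', content: userMessage },
  ];

  for (let i = 0; i < maxIterations; i++) {
    const message = await callModel(messages, tools);
    messages.push(message); // the assistant turn must precede its tool results

    const calls = message.tool_calls ?? [];
    if (calls.length === 0) return message.content ?? ''; // no tool calls: final answer

    // Execute all requested tools in parallel and feed the results back.
    const results = await Promise.all(
      calls.map(async (call): Promise<ChatMessage> => ({
        role: 'tool',
        tool_call_id: call.id,
        content: await executeToolCall(call, options.tools),
      })),
    );
    messages.push(...results);
  }
  throw new Error(`Agent stopped after ${maxIterations} iterations without a final answer`);
}
```

In an application, callModel would wrap openai.chat.completions.create({ model: 'gpt-4o', messages, tools, tool_choice: 'auto' }) and return choices[0].message. Injecting it also makes the loop easy to unit test with a scripted fake model.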

Connecting External APIs

Defining Tools

Tools are async functions wrapped in a descriptor. A weather tool that calls an external API looks like this:

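const weatherTool: AgentTool = {
  name: 'get_weather',
  description: 'Get the current weather for a city.',
  parameters: {
    type: 'object',
    properties: { city: { type: 'string' } },
    required: ['city'],
  },
  execute: async ({ city }) => {
    // URL-encode the city so names with spaces or accents form a valid query string.
    const query = encodeURIComponent(String(city));
    const res = await fetch(
      `https://api.weatherapi.com/v1/current.json?key=${process.env.WEATHER_API_KEY}&q=${query}`
    );
    // Report HTTP failures to the model instead of crashing on a missing field.
    if (!res.ok) return `Weather lookup failed for ${city} (HTTP ${res.status}).`;
    const data = await res.json();
    return `${city}: ${data.current.condition.text}, ${data.current.temp_c}°C`;
  },
};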

Streaming Responses with ReadableStream

For a better user experience, stream the agent's final response token-by-token. Create a Next.js Route Handler at app/api/agent/route.ts with export const runtime = 'edge':

Start by requesting a streamed completion:

const stream = await openai.chat.completions.create({ model: 'gpt-4o', stream: true, messages });

Then wrap it in a ReadableStream and return it as a Response:

const encoder = new TextEncoder();
const readableStream = new ReadableStream({
  async start(controller) {
    for await (const chunk of stream) {
      controller.enqueue(encoder.encode(chunk.choices[0]?.delta?.content ?? ''));
    }
    controller.close();
  },
});
return new Response(readableStream, { headers: { 'Content-Type': 'text/plain; charset=utf-8' } });

On the client, read the stream with the Fetch API: call res.body!.getReader(), loop with reader.read(), decode each chunk with new TextDecoder(), and update your React state incrementally.
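The client-side steps above might look something like this. It is a sketch: readTextStream and its onText callback are hypothetical names introduced here, and only the standard Fetch, Streams, and TextDecoder APIs are used.

```typescript
// Read a streamed text response chunk by chunk, reporting each decoded piece.
export async function readTextStream(
  res: Response,
  onText: (text: string) => void,
): Promise<string> {
  if (!res.body) throw new Error('Response has no body to stream');

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let full = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact when they span two chunks.
    const text = decoder.decode(value, { stream: true });
    if (text) {
      full += text;
      onText(text);
    }
  }

  // Flush any bytes the decoder was still holding.
  const tail = decoder.decode();
  if (tail) {
    full += tail;
    onText(tail);
  }
  return full;
}
```

In a React component, onText would typically append to state, for example setAnswer((prev) => prev + text), after calling fetch on the route handler (here assumed to be served at /api/agent).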

Anthropic Tool Use

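If you prefer Anthropic's Claude, the tool use API is structurally similar but uses different field names. Tools are declared with an input_schema field instead of parameters. Instead of tool_calls, the assistant message contains tool_use content blocks, and you return each result as a tool_result content block inside a user message (there is no separate tool role). The agent loop logic stays the same; only the SDK calls and message shapes change.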
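To make the field-name differences concrete, here is a small sketch of the message shapes involved. The helper names (toAnthropicTool, extractToolUses, buildToolResultMessage) are hypothetical and the types are simplified local stand-ins rather than the ones exported by @anthropic-ai/sdk. The field names themselves (input_schema, tool_use, tool_result, tool_use_id) follow Anthropic's Messages API.

```typescript
interface AgentTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

// Simplified content blocks as they appear in an assistant response.
type TextBlock = { type: 'text'; text: string };
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: Record<string, unknown> };
type ContentBlock = TextBlock | ToolUseBlock;

// Anthropic names the JSON Schema field input_schema rather than parameters.
export function toAnthropicTool(tool: AgentTool) {
  return { name: tool.name, description: tool.description, input_schema: tool.parameters };
}

// Pick out the tool requests from the assistant's content blocks.
export function extractToolUses(content: ContentBlock[]): ToolUseBlock[] {
  return content.filter((block): block is ToolUseBlock => block.type === 'tool_use');
}

// Results go back as tool_result blocks inside a single user message.
export async function buildToolResultMessage(toolUses: ToolUseBlock[], tools: AgentTool[]) {
  const content = await Promise.all(
    toolUses.map(async (use) => {
      const tool = tools.find((t) => t.name === use.name);
      const result = tool ? await tool.execute(use.input) : `Error: unknown tool ${use.name}`;
      return { type: 'tool_result' as const, tool_use_id: use.id, content: result };
    }),
  );
  return { role: 'user' as const, content };
}
```

The loop would keep going while the response contains tool_use blocks and stop once it contains only text.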

Adding Memory

Memory is what separates a stateless chatbot from a true agent. There are two kinds:

Short-Term Memory (Conversation History)

Short-term memory is simply the messages array you pass to the LLM on every call. The challenge is context window limits. GPT-4o supports 128k tokens, but sending the entire history on every request is expensive. A practical strategy is a sliding window: keep the system prompt, the last N messages, and a summary of older messages.

A trimMessages helper filters out the system messages, slices the rest to the last maxMessages entries (default 20), and returns [...system, ...trimmed].
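A minimal version of such a helper might look like the sketch below. One detail goes beyond the description above and is an addition here: after slicing, any leading tool messages are dropped, because a tool result whose originating assistant message was trimmed away is rejected by the OpenAI API. The generic constraint is a stand-in for the SDK's message type.

```typescript
// Keep all system messages plus the most recent non-system messages.
export function trimMessages<T extends { role: string }>(messages: T[], maxMessages = 20): T[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');

  // slice(-0) would return the whole array, so handle a zero budget explicitly.
  let trimmed = maxMessages > 0 ? rest.slice(-maxMessages) : [];

  // Addition to the described helper: drop tool results that lost their
  // preceding assistant tool_calls message when the window was cut.
  while (trimmed.length > 0 && trimmed[0].role === 'tool') {
    trimmed = trimmed.slice(1);
  }

  return [...system, ...trimmed];
}
```

This version does not summarize the dropped messages; a summary step would run on the discarded portion and be inserted just after the system prompt.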

Long-Term Memory (Vector Store / Redis)

For persistent memory across sessions, you need an external store. Two common patterns:

  • Redis for session state: Store the serialized messages array in Redis keyed by session ID. Use @upstash/redis for a serverless-compatible client. Call redis.get<ChatCompletionMessageParam[]>(sessionId) to load history and redis.set(sessionId, messages, { ex: 86400 }) to persist it with a 24-hour TTL.
  • Vector store for semantic recall: Embed past interactions and store them in Pinecone or Supabase pgvector. On each new turn, retrieve the top-K most semantically relevant past memories and inject them into the system prompt. This lets the agent remember facts from thousands of past turns without blowing the context window.
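The Redis session pattern above can be sketched against a small key-value interface. KeyValueStore, MemoryStore, loadHistory, and saveHistory are hypothetical names introduced here; the interface is intended to mirror the get and set calls quoted above, so that an @upstash/redis client could be passed in, but treat that compatibility as an assumption to verify against the client's types.

```typescript
// Minimal store contract mirroring the get/set usage described above.
interface KeyValueStore {
  get<T>(key: string): Promise<T | null>;
  set(key: string, value: unknown, opts?: { ex?: number }): Promise<unknown>;
}

// Load a session's message history, or start fresh if none is stored.
export async function loadHistory<M>(store: KeyValueStore, sessionId: string): Promise<M[]> {
  return (await store.get<M[]>(sessionId)) ?? [];
}

// Persist the history with a TTL in seconds (86400 = 24 hours, as in the text).
export async function saveHistory<M>(
  store: KeyValueStore,
  sessionId: string,
  messages: M[],
  ttlSeconds = 86400,
): Promise<void> {
  await store.set(sessionId, messages, { ex: ttlSeconds });
}

// In-memory stand-in for local development and tests; not for production use.
export class MemoryStore implements KeyValueStore {
  private data = new Map<string, { value: unknown; expiresAt: number }>();

  async get<T>(key: string): Promise<T | null> {
    const entry = this.data.get(key);
    if (!entry || entry.expiresAt <= Date.now()) return null;
    return entry.value as T;
  }

  async set(key: string, value: unknown, opts?: { ex?: number }): Promise<unknown> {
    const ttlMs = opts?.ex !== undefined ? opts.ex * 1000 : Number.POSITIVE_INFINITY;
    // Round-trip through JSON to mimic the serialization a remote store performs.
    this.data.set(key, { value: JSON.parse(JSON.stringify(value)), expiresAt: Date.now() + ttlMs });
    return 'OK';
  }
}
```

A route handler would call loadHistory at the start of a request, run the agent with the loaded messages, and call saveHistory before responding.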

Deploying on Vercel

Vercel is the natural deployment target for Next.js agents. Here's what to know:

Edge Functions vs. Node.js Runtime

For streaming responses, you can use the Edge Runtime by adding export const runtime = 'edge' to your Route Handler. Edge functions typically start faster (cold starts are much shorter than on the Node.js runtime) and support streaming natively. For heavy computation or Node.js-only packages, use the default Node.js runtime and accept the cold start trade-off.

Environment Variables

Set all secrets in the Vercel dashboard under Settings → Environment Variables. Never hardcode keys. Vercel automatically injects them at build and runtime.

Streaming Support

Vercel's infrastructure fully supports ReadableStream responses on both Edge and Node.js runtimes. Ensure your Response sets the correct Content-Type header (text/plain or text/event-stream for SSE).

Function Timeout

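Function timeouts depend on your plan and runtime, and Vercel has changed them over time, so confirm the current limits in Vercel's documentation before relying on the figures here. As a guide, the Hobby plan limits function execution to 10 seconds, while the Pro plan allows up to 300 seconds for Node.js functions and 30 seconds for Edge functions. Long-running agent loops may need to be broken into smaller steps or moved to a queue-based architecture (e.g., Inngest or Trigger.dev), especially on the Hobby plan.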

Deployment

Once your Git repository is connected, Vercel deploys to production on every push to the production branch (main by default). Preview deployments are created for every pull request, making it easy to test agent changes before merging.

Common Mistakes

Avoid these pitfalls that trip up most developers building agents for the first time:

  • Token limit overruns: Passing the full conversation history without trimming will eventually exceed the model's context window and throw an error. Always implement a trimming or summarization strategy.
  • Infinite loops: If a tool always returns an error or the model keeps calling the same tool repeatedly, your agent will loop forever. Always set a maxIterations cap and throw a descriptive error when it's hit.
  • Missing error handling around tool calls: JSON.parse(toolCall.function.arguments) will throw if the model returns malformed JSON. Wrap tool execution in try/catch and return a structured error string back to the model so it can recover gracefully.
  • Exposing API keys client-side: Never instantiate the OpenAI client in a 'use client' component. All LLM calls must go through a server-side Route Handler or Server Action.
  • Not validating tool arguments: The model can hallucinate argument values. Use zod to parse and validate tool arguments before executing them: const { city } = WeatherArgs.parse(rawArgs).
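The error-handling and validation points above can be combined in one guard around tool execution. The sketch below uses a hand-written check so that it runs without dependencies; in a project that already uses zod, a schema such as the WeatherArgs mentioned above would replace parseWeatherArgs. The names parseWeatherArgs, runWeatherTool, and getWeather are hypothetical.

```typescript
type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Parse and validate the raw JSON arguments string produced by the model.
export function parseWeatherArgs(rawJson: string): ParseResult<{ city: string }> {
  let data: unknown;
  try {
    data = JSON.parse(rawJson);
  } catch {
    return { ok: false, error: 'Arguments were not valid JSON.' };
  }
  if (typeof data !== 'object' || data === null) {
    return { ok: false, error: 'Arguments must be a JSON object.' };
  }
  const city = (data as Record<string, unknown>).city;
  if (typeof city !== 'string' || city.trim() === '') {
    return { ok: false, error: 'Missing or empty "city" argument.' };
  }
  return { ok: true, value: { city: city.trim() } };
}

// Run the tool, converting every failure into a structured string for the model.
export async function runWeatherTool(
  rawJson: string,
  getWeather: (city: string) => Promise<string>,
): Promise<string> {
  const parsed = parseWeatherArgs(rawJson);
  if (!parsed.ok) return JSON.stringify({ error: parsed.error });
  try {
    return await getWeather(parsed.value.city);
  } catch (err) {
    return JSON.stringify({ error: err instanceof Error ? err.message : String(err) });
  }
}
```

Returning the error as the tool result, rather than throwing, gives the model a chance to correct its arguments on the next iteration.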

Best Practices

Follow these patterns to build agents that are reliable, observable, and cost-efficient:

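  • Use structured outputs: When you need the agent to return data (not prose), use OpenAI's response_format: { type: 'json_object' }, which guarantees syntactically valid JSON, or response_format with type: 'json_schema' when the output must match a specific schema. With Anthropic, a common approach is to force a tool call whose input schema describes the data you want. This removes most parsing errors and makes downstream processing more predictable, but you should still validate the parsed object before using it.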
  • Implement retry logic with exponential backoff: LLM APIs are rate-limited and occasionally return 5xx errors. Wrap API calls in a withRetry helper that retries up to 3 times with 2 ** attempt * 500ms delays.
  • Add observability: Log every LLM call with its input token count, output token count, model name, latency, and tool calls made. Tools like LangSmith, Helicone, or structured logging to Vercel Log Drains give you visibility into agent behavior in production.
  • Manage costs proactively: Use gpt-4o-mini for simple reasoning steps and reserve gpt-4o for complex multi-step decisions. Cache deterministic tool results to avoid redundant API calls. Set hard spending limits in your OpenAI dashboard.
  • Write deterministic unit tests for tools: Your tool execute functions are plain async functions, so you can test them independently with vitest or jest without involving the LLM at all. Mock fetch and any other external calls so the tests stay deterministic.
  • Version your system prompts: Treat system prompts like code. Store them in version control, track changes, and A/B test prompt variations using feature flags.
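As an illustration of the retry bullet above, a withRetry helper could be sketched as follows. The options object and the injectable sleep function are assumptions added here so that the delays can be tested without waiting; the defaults follow the text (up to 3 retries, 2 ** attempt * 500 ms).

```typescript
interface RetryOptions {
  retries?: number; // retries after the first attempt (default 3)
  baseDelayMs?: number; // base for the exponential delay (default 500)
  sleep?: (ms: number) => Promise<void>; // injectable for tests
}

export async function withRetry<T>(fn: () => Promise<T>, options: RetryOptions = {}): Promise<T> {
  const {
    retries = 3,
    baseDelayMs = 500,
    sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
  } = options;

  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break; // out of retries: rethrow below
      await sleep(2 ** attempt * baseDelayMs); // 500 ms, 1000 ms, 2000 ms with the defaults
    }
  }
  throw lastError;
}
```

This sketch retries on every error. In practice you would probably retry only rate-limit (429) and 5xx responses and add some random jitter. The OpenAI Node SDK also has a built-in maxRetries client option, which may be enough for simple cases.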

FAQ

Can I use the Vercel AI SDK instead of the raw OpenAI SDK?

Yes. The Vercel AI SDK (ai package) provides higher-level abstractions like streamText, generateText, and useChat that handle streaming, message formatting, and React state management for you. It supports OpenAI, Anthropic, Google, and many other providers through a unified interface. For simple agents, it significantly reduces boilerplate.

How do I handle tool calls that take a long time?

For long-running tools (e.g., running a web scraper), consider offloading the work to a background job queue such as Inngest or Trigger.dev. The agent can poll for the result on subsequent turns, or you can use a webhook to resume the agent loop when the job completes.

What's the difference between an agent and a chain?

A chain is a fixed, predetermined sequence of LLM calls — the developer decides the steps in advance. An agent is dynamic — the LLM itself decides which tools to call and in what order based on the current context. Agents are more flexible but harder to debug and more expensive to run.

How do I prevent the agent from calling tools it shouldn't?

Be explicit in your system prompt about when tools should and should not be used. You can also use tool_choice: 'none' to prevent any tool calls on a given turn, or force a specific tool with tool_choice: { type: 'function', function: { name: 'specific_tool' } }. Additionally, only expose tools that are relevant to the current task.

Is it safe to let the agent execute arbitrary code?

No. Never give an agent a tool that executes arbitrary user-supplied code without sandboxing. Use a secure execution environment like E2B or a Docker container with strict resource limits. Treat agent-generated code with the same skepticism as user-generated content.

Conclusion

You now have a complete blueprint for building production-ready AI agents with Next.js 14, TypeScript, and the OpenAI or Anthropic APIs. Here's a summary of what you've covered:

  • The agent loop — the core reasoning cycle of think, act, observe
  • Project setup with Next.js App Router, TypeScript, and the OpenAI SDK
  • A fully typed agent core with tool definitions and message history management
  • Streaming responses using ReadableStream and Edge Runtime Route Handlers
  • Short-term and long-term memory patterns with Redis and vector stores
  • Vercel deployment with edge functions, environment variables, and timeout considerations
  • Common mistakes and how to avoid them
  • Best practices for reliability, observability, and cost management

Next Steps

Ready to go deeper? Consider exploring these topics:

  • Multi-agent systems: Orchestrate multiple specialized agents that delegate tasks to each other using a supervisor pattern.
  • Retrieval-Augmented Generation (RAG): Give your agent access to a private knowledge base using embeddings and a vector store.
  • Human-in-the-loop: Add approval steps where the agent pauses and asks a human to confirm before executing high-stakes tool calls.
  • Evals: Build an automated evaluation harness to measure agent accuracy, tool call correctness, and response quality across a test suite.

The agent pattern is a powerful primitive. Once you understand the loop, the tools, and the memory model, you can build remarkably capable systems on top of Next.js with relatively little code.