
How to Connect Sanity CMS Content with AI Search and RAG

Learn how to connect Sanity CMS with AI-powered search and RAG pipelines. This tutorial covers embeddings, GROQ queries, vector indexing, and building a semantic search API with Next.js.

June 26, 2026 · 12 min read · Muhammad Zohaib Ramzan


If you've ever wanted to give your users a search experience that actually understands what they're looking for — not just matching keywords — you're in the right place. In this tutorial, we'll walk through how to connect Sanity CMS content with an AI-powered search pipeline using Retrieval-Augmented Generation (RAG). By the end, you'll have a working semantic search API backed by vector embeddings, all fed from your Sanity content lake.

Why Sanity is Ideal for AI-Powered Search

Sanity CMS is a natural fit for AI search pipelines for several reasons:

Structured content: Every document in Sanity has a well-defined schema. This means your data is clean, typed, and predictable — exactly what embedding models need.
GROQ query language: Sanity's Graph-Relational Object Queries (GROQ) let you project precisely the fields you want to embed, avoiding noise from irrelevant metadata.
CDN-backed Content Lake: Sanity's API is fast and globally distributed, making it practical to fetch large volumes of content for batch embedding jobs.
Schema flexibility: You can evolve your content model without breaking your embedding pipeline — just update your GROQ projection.
Real-time webhooks: Sanity supports webhooks on document mutations, enabling you to keep your vector index fresh as content changes.

Together, these properties make Sanity an excellent source of truth for a RAG system.

Setting Up Content Embeddings from Sanity

The first step is to fetch your Sanity documents and convert them into vector embeddings using an embedding model. Here's a TypeScript example using the Sanity client and the OpenAI Embeddings API:

import { createClient } from '@sanity/client';
import OpenAI from 'openai';

const sanity = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  apiVersion: '2024-01-01',
  useCdn: false, // read fresh content when building the index
});

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });

export async function embedDocuments() {
  // pt::text() flattens Portable Text into a plain string
  const docs = await sanity.fetch<{ _id: string; title: string; body: string | null }[]>(
    `*[_type == "post"]{ _id, title, "body": pt::text(body) }`
  );

  // One embedding request per document, issued in parallel
  const embedded = await Promise.all(
    docs.map(async (doc) => {
      // pt::text() yields null when a post has no body
      const text = `${doc.title}\n\n${doc.body ?? ''}`;
      const response = await openai.embeddings.create({
        model: 'text-embedding-3-small',
        input: text,
      });
      return {
        id: doc._id, // the Sanity _id doubles as the vector ID
        values: response.data[0].embedding,
        metadata: { title: doc.title },
      };
    })
  );

  return embedded;
}

The key insight here is the pt::text(body) GROQ function, which flattens Portable Text into a plain string — perfect for embedding.

GROQ Queries for Structured Data Export

Choosing the right GROQ projection is critical. You want to include semantically rich fields while excluding noise like internal IDs, image assets, and system metadata.

Basic post projection

*[_type == "post" && defined(body)] {
  _id,
  title,
  excerpt,
  "body": pt::text(body),
  "category": category->title,
  "tags": tags[]
}

Filtering only published content

// Excludes draft documents and posts scheduled for a future date
*[_type == "post" && !(_id in path("drafts.**")) && publishedAt <= now()]
  | order(publishedAt desc) {
  _id,
  title,
  excerpt,
  "body": pt::text(body),
  publishedAt
}

Paginated fetch for large datasets

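// [0..99] is an inclusive range: the first 100 documents.
// Ordering by _id keeps page boundaries stable between requests.
*[_type == "post"] | order(_id) [0..99] {
  _id,
  title,
  "body": pt::text(body)
}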

Always use pt::text() to serialize Portable Text. Embedding raw Portable Text JSON will produce poor-quality vectors.
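
The paginated slice above returns a single page. To walk an entire dataset, one option is to page by _id rather than by offset. The sketch below assumes a caller-supplied fetchPage function that wraps sanity.fetch; the names PAGE_QUERY, fetchPage, and fetchAllPosts are illustrative and do not come from the Sanity client.

```typescript
// Shape returned by the projection used in this section.
type PostRow = { _id: string; title: string; body: string | null };

// Assumption: the caller supplies a function that runs a GROQ query,
// for example (query, params) => sanity.fetch(query, params).
type FetchPage = (query: string, params: { lastId: string }) => Promise<PostRow[]>;

// Pages by _id instead of by offset; [0...100] is an exclusive range (100 rows).
export const PAGE_QUERY =
  '*[_type == "post" && _id > $lastId] | order(_id) [0...100] ' +
  '{ _id, title, "body": pt::text(body) }';

export async function fetchAllPosts(fetchPage: FetchPage): Promise<PostRow[]> {
  const all: PostRow[] = [];
  let lastId = ''; // every _id sorts after the empty string
  while (true) {
    const page = await fetchPage(PAGE_QUERY, { lastId });
    if (page.length === 0) break; // no documents left
    all.push(...page);
    lastId = page[page.length - 1]._id; // resume after the last row seen
  }
  return all;
}
```

A call might look like fetchAllPosts((query, params) => sanity.fetch(query, params)). The loop stops on the first empty page, so it issues one request more than the number of non-empty pages.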

Indexing Sanity Content in a Vector Database

Once you have embeddings, you need to store them in a vector database. Here's a TypeScript example using Pinecone:

import { Pinecone } from '@pinecone-database/pinecone';
import { embedDocuments } from './embedDocuments';

const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

export async function indexSanityContent() {
  const index = pinecone.index(process.env.PINECONE_INDEX_NAME!);
  const vectors = await embedDocuments();

  // Upsert in batches to keep each request small
  const batchSize = 100;
  for (let i = 0; i < vectors.length; i += batchSize) {
    const batch = vectors.slice(i, i + batchSize);
    await index.upsert(batch);
    console.log(`Upserted batch ${i / batchSize + 1}`);
  }

  console.log(`Indexed ${vectors.length} documents.`);
}

If you prefer a self-hosted option, pgvector (a PostgreSQL extension) is an excellent alternative. The batching pattern stays the same; the upsert becomes an INSERT ... ON CONFLICT statement issued through a PostgreSQL client.
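
For the pgvector alternative mentioned above, a minimal sketch of the upsert statement might look like the following. The table name sanity_embeddings, its columns, and the buildUpsert helper are hypothetical; the 1536-dimension column assumes the default output size of text-embedding-3-small.

```typescript
// Same record shape that embedDocuments() returns in this tutorial.
type VectorRecord = { id: string; values: number[]; metadata: { title: string } };

// Assumption: a table created roughly like
//   CREATE TABLE sanity_embeddings (
//     id text PRIMARY KEY, title text, embedding vector(1536)
//   );
// The table and column names are hypothetical.
export function buildUpsert(record: VectorRecord): { text: string; values: string[] } {
  return {
    text:
      'INSERT INTO sanity_embeddings (id, title, embedding) ' +
      'VALUES ($1, $2, $3::vector) ' +
      'ON CONFLICT (id) DO UPDATE ' +
      'SET title = EXCLUDED.title, embedding = EXCLUDED.embedding',
    // pgvector accepts a vector written as a bracketed, comma-separated list.
    values: [record.id, record.metadata.title, `[${record.values.join(',')}]`],
  };
}
```

The returned text and values can be handed to a parameterized query call, such as client.query(text, values) in node-postgres. Keeping the statement builder separate from the database client makes it easy to check without a running database.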

Building a Semantic Search API

Now let's wire everything together into a Next.js API route. This endpoint accepts a user query, embeds it, queries the vector database, and returns matching Sanity documents:

// app/api/search/route.ts
import { NextRequest, NextResponse } from 'next/server';
import OpenAI from 'openai';
import { Pinecone } from '@pinecone-database/pinecone';
import { createClient } from '@sanity/client';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY! });
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const sanity = createClient({
  projectId: process.env.SANITY_PROJECT_ID!,
  dataset: process.env.SANITY_DATASET!,
  apiVersion: '2024-01-01',
  useCdn: true,
});

export async function POST(req: NextRequest) {
  const { query } = await req.json();
  if (typeof query !== 'string' || query.trim() === '') {
    return NextResponse.json({ error: 'Query is required' }, { status: 400 });
  }

  // 1. Embed the user query (same model as at index time)
  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });
  const queryVector = embeddingResponse.data[0].embedding;

  // 2. Search the vector index
  const index = pinecone.index(process.env.PINECONE_INDEX_NAME!);
  const searchResults = await index.query({
    vector: queryVector,
    topK: 5,
    includeMetadata: true,
  });

  // 3. Hydrate full documents from Sanity
  const ids = searchResults.matches.map((m) => m.id);
  const docs = await sanity.fetch<{ _id: string }[]>(
    `*[_id in $ids]{ _id, title, excerpt, slug }`,
    { ids }
  );

  // The GROQ result does not follow the order of $ids, so restore the
  // relevance ranking returned by the vector search.
  const byId = new Map(docs.map((doc) => [doc._id, doc] as const));
  const results = ids.map((id) => byId.get(id)).filter(Boolean);

  return NextResponse.json({ results });
}

This pattern — embed → search → hydrate from CMS — is the core of a RAG retrieval step.

Rendering AI Search Results in Next.js

With the API in place, here's a React component that provides a search input and renders results:

// components/SemanticSearch.tsx
'use client';

import { useState, type FormEvent } from 'react';

type SearchResult = {
  _id: string;
  title: string;
  excerpt: string;
  slug: { current: string };
};

export function SemanticSearch() {
  const [query, setQuery] = useState('');
  const [results, setResults] = useState<SearchResult[]>([]);
  const [loading, setLoading] = useState(false);

  async function handleSearch(e: FormEvent) {
    e.preventDefault();
    setLoading(true);
    try {
      const res = await fetch('/api/search', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ query }),
      });
      const data = await res.json();
      // Error responses carry no results array, so fall back to an empty list
      setResults(res.ok ? (data.results ?? []) : []);
    } finally {
      setLoading(false);
    }
  }

  return (
    <div>
      <form onSubmit={handleSearch}>
        <input
          type="text"
          value={query}
          onChange={(e) => setQuery(e.target.value)}
          placeholder="Search articles..."
        />
        <button type="submit" disabled={loading}>
          {loading ? 'Searching...' : 'Search'}
        </button>
      </form>
      <ul>
        {results.map((result) => (
          <li key={result._id}>
            <a href={`/posts/${result.slug.current}`}>
              <strong>{result.title}</strong>
            </a>
            <p>{result.excerpt}</p>
          </li>
        ))}
      </ul>
    </div>
  );
}

This component is intentionally unstyled — drop in your own CSS or Tailwind classes to match your design system.

Common Mistakes

Avoid these pitfalls when building your Sanity AI search pipeline:

Not chunking large documents. Embedding an entire long-form article as a single vector degrades retrieval quality. Split documents into sections or paragraphs before embedding.
Ignoring metadata in vector records. Store useful metadata (title, category, slug, publishedAt) alongside your vectors so you can filter results without an extra Sanity fetch.
Stale embeddings. If you update content in Sanity but don't re-embed, your search results will be out of sync. Set up a Sanity webhook to trigger re-indexing on document publish.
Embedding raw Portable Text JSON. Always use pt::text() in your GROQ query to serialize Portable Text before embedding — raw JSON produces noisy, low-quality vectors.
Mixing embedding models. Use the same embedding model at index time and at query time. Vectors produced by different models live in different spaces and cannot be compared.
Over-fetching fields. Including image assets, internal references, and system metadata adds noise to your embeddings. Be deliberate with your GROQ projection.
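
To make the chunking advice above concrete, here is a minimal sketch that splits plain text at paragraph boundaries and packs paragraphs into size-limited chunks. It assumes the text separates paragraphs with blank lines; the chunkByParagraph name, the 1200-character default, and the "<docId>#<n>" ID format are illustrative choices, not part of Sanity or Pinecone.

```typescript
type Chunk = { id: string; docId: string; text: string };

// Splits plain text on blank lines and packs whole paragraphs into chunks of
// at most maxChars characters. A single paragraph longer than maxChars is
// kept whole rather than cut mid-sentence.
export function chunkByParagraph(docId: string, text: string, maxChars = 1200): Chunk[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: Chunk[] = [];
  let current = '';

  const flush = () => {
    if (current.length === 0) return;
    // The chunk ID keeps the Sanity document ID as a prefix.
    chunks.push({ id: `${docId}#${chunks.length}`, docId, text: current });
    current = '';
  };

  for (const paragraph of paragraphs) {
    // Start a new chunk when adding this paragraph would exceed the limit.
    if (current.length > 0 && current.length + 2 + paragraph.length > maxChars) flush();
    current = current.length > 0 ? `${current}\n\n${paragraph}` : paragraph;
  }
  flush();
  return chunks;
}
```

Each chunk carries the source docId, so after a vector search the matching chunks can be grouped back to their Sanity documents before hydration.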

Best Practices

Follow these guidelines for a robust, production-ready pipeline:

Chunk by semantic unit. Split content at heading or paragraph boundaries rather than by fixed character count to preserve meaning.
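Derive the vector ID from the Sanity document ID. Use the _id directly for whole-document vectors, or as a prefix (for example, the _id followed by a chunk number) when you chunk. Either way, every search hit maps straight back to a Sanity document for hydration.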
Use incremental re-indexing. Track a lastIndexedAt timestamp and only re-embed documents modified since the last run, rather than re-indexing everything.
Cache embedding results. The same input and model yield effectively the same vector, so cache results to avoid redundant API calls and reduce costs.
Re-index when you change embedding models. When you upgrade your embedding model, re-embed and re-index every document in one pass so the index never mixes vectors from two models.
Secure your API route. Add rate limiting and authentication to your /api/search endpoint to prevent abuse and control OpenAI API costs.
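
The caching guideline above can be sketched as a thin wrapper around whatever function produces embeddings. This is an in-memory illustration only; withEmbeddingCache and the Embed type are hypothetical names, and a production cache would more likely sit in a persistent store and include the model name in the key.

```typescript
// Assumption: embed is any function that returns a vector for a string,
// for example a thin wrapper around openai.embeddings.create.
type Embed = (text: string) => Promise<number[]>;

// Wraps an embedder with an in-memory cache keyed by the input text.
// Use one wrapper per embedding model so vectors from different models never mix.
export function withEmbeddingCache(embed: Embed): Embed {
  const cache = new Map<string, Promise<number[]>>();
  return (text) => {
    let hit = cache.get(text);
    if (!hit) {
      hit = embed(text);
      // Drop failed requests so a later call can retry.
      hit.catch(() => cache.delete(text));
      cache.set(text, hit);
    }
    return hit;
  };
}
```

Caching the promise rather than the resolved vector means two concurrent requests for the same text share a single API call.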

FAQ

What is RAG and why does it matter for Sanity CMS? RAG (Retrieval-Augmented Generation) is a technique where an AI model retrieves relevant documents before generating a response. Connecting Sanity to a RAG pipeline means your AI can answer questions grounded in your actual CMS content, reducing hallucinations.

Do I need to re-embed all my content every time I update a document? No — you only need to re-embed the specific document that changed. Use Sanity webhooks to listen for publish events and trigger a targeted re-index for that document's _id.
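
As a rough sketch of that targeted re-index, the function below decides what to do with an incoming webhook payload. It assumes the webhook projection is configured as { _id, _type, "operation": delta::operation() }; the planReindex name and the ReindexAction type are illustrative.

```typescript
// Assumption: the Sanity webhook projection is configured as
//   { _id, _type, "operation": delta::operation() }
type WebhookPayload = {
  _id: string;
  _type: string;
  operation: 'create' | 'update' | 'delete';
};

type ReindexAction =
  | { kind: 'upsert'; id: string }
  | { kind: 'delete'; id: string }
  | { kind: 'ignore' };

// Maps a webhook payload to the change the vector index needs.
export function planReindex(
  payload: WebhookPayload,
  indexedTypes: string[] = ['post']
): ReindexAction {
  // Skip document types that are not part of the search index.
  if (indexedTypes.indexOf(payload._type) === -1) return { kind: 'ignore' };
  // Skip drafts; only published documents are indexed.
  if (payload._id.indexOf('drafts.') === 0) return { kind: 'ignore' };
  return payload.operation === 'delete'
    ? { kind: 'delete', id: payload._id }
    : { kind: 'upsert', id: payload._id };
}
```

An upsert action would re-embed that one document and write it to the index, while a delete action would remove its vector so deleted content stops appearing in results.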

Which vector database should I choose? For managed, serverless simplicity, Pinecone is a popular choice. For teams already on PostgreSQL, pgvector is a great self-hosted option. Both fit the patterns shown in this tutorial; only the client code for upserts and queries differs.

Can I use this approach with content types other than posts? Absolutely. The same pipeline works for any Sanity document type — products, documentation pages, FAQs, and more. Just update your GROQ query to target the relevant _type and fields.

How do I handle multilingual content in Sanity? Embed each language variant separately and store the locale as metadata on the vector record. At query time, filter by locale to ensure results match the user's language.
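
A locale filter of that kind might be expressed as a Pinecone metadata filter. The sketch assumes each vector was upserted with a locale field in its metadata; buildLocaleQuery and the LocaleQuery type are illustrative names.

```typescript
// Query options with a metadata filter restricting matches to one locale.
type LocaleQuery = {
  vector: number[];
  topK: number;
  includeMetadata: true;
  filter: { locale: { $eq: string } };
};

// Assumption: every vector record carries metadata.locale, e.g. "en" or "de".
export function buildLocaleQuery(vector: number[], locale: string, topK = 5): LocaleQuery {
  return {
    vector,
    topK,
    includeMetadata: true,
    filter: { locale: { $eq: locale } },
  };
}
```

The result can be passed to index.query(...) in place of the unfiltered options used in the search route.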

Conclusion

Connecting Sanity CMS to an AI-powered search and RAG pipeline is more straightforward than it might seem. The combination of GROQ's precise projections, Sanity's structured content model, and modern embedding APIs gives you all the building blocks you need. Start by exporting your content with pt::text(), embed it with OpenAI, index it in Pinecone or pgvector, and expose it through a Next.js API route. From there, the door is open to full RAG experiences — chatbots, semantic search, and AI-assisted content discovery. Experiment, iterate, and let your content work harder for your users.