
Retrieval-Augmented Generation

RAG

Lukas Horvath, co-founder of Roelu Studio

What is Retrieval-Augmented Generation?

Retrieval-augmented generation, or RAG, is a method that combines a large language model with a separate search step. Before generating an answer, the system retrieves relevant documents from a knowledge base, then passes them to the model as context. The model writes the response using that retrieved content. RAG is how AI tools answer questions about your private docs, your product, or anything more recent than the model's training cutoff.

Why it matters

Base LLMs are frozen in time. Ask one about your new pricing page and it is likely to make something up. RAG addresses that without retraining the model. You point it at your help center, your case studies, and your support tickets, and the AI answers from your actual content, often with citations. Many AI products that work with customer data use RAG underneath. For marketing teams, it matters because tools like Perplexity and ChatGPT Search use the same retrieve-then-generate approach to read your site at query time. If your content is not easy to retrieve and chunk, you are less likely to get cited.

How it works

First, your documents are split into chunks, usually a few hundred words each, and converted into vectors using an embedding model. Those vectors are stored in a vector database. When a user asks a question, the query is embedded with the same model, and the database returns the chunks closest in meaning. Those chunks are inserted into the prompt alongside the user's question. The LLM then writes an answer grounded in that retrieved context and, when instructed to, cites which chunks it used. Done well, the user gets a specific, current, sourced response. Done badly, retrieval misses the relevant chunks and the model fills the gap with guesses.
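
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup. The function names (`chunk_text`, `embed`, `cosine`, `retrieve`, `build_prompt`) and the sample documents are hypothetical, and `embed` is a toy word-count vector standing in for a real embedding model, so it matches on shared words rather than meaning. A plain Python list stands in for the vector database, and the final call to an LLM is left out.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 200) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption for illustration only; a real embedding model returns a
    dense vector that captures meaning, not just shared words.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Place the retrieved chunks in the prompt next to the user's question."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the numbered context below and cite the numbers you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical knowledge base; each document is short enough to be one chunk.
documents = [
    "The Pro plan costs 49 dollars per month and includes priority support.",
    "Our office is closed on public holidays.",
    "Refunds are available within 30 days of purchase.",
]

# Indexing step: chunk every document and store (chunk, vector) pairs.
index = [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]

# Query step: embed the question, fetch the closest chunk, assemble the prompt.
question = "How much does the Pro plan cost per month?"
top_chunks = retrieve(question, index, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)  # this string is what would be sent to the LLM
```

In a real system, `embed` would call an embedding model and `index` would be a vector database, but the flow is the same: chunk, embed, retrieve by similarity, then put the retrieved chunks in the prompt.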

  • A type of AI trained on huge volumes of text that can read, write, and answer questions in plain language — the engine behind ChatGPT, Claude, Gemini, and most…

  • Semantic Search

    AI & Search

    A search method that matches on meaning instead of exact words, so a query for 'fix slow website' returns pages about performance optimization even if those…

  • Context Window

    AI & Search

    The maximum amount of text an AI model can read and remember in a single prompt — measured in tokens, and the hard limit that decides whether the model can…

  • AI Agent

    AI & Search

    An AI system that can take actions on its own — booking meetings, sending emails, querying databases, running code, updating records — instead of just…

  • Hallucination

    AI & Search

    When an AI model confidently states something that is not true — a fake citation, a made-up statistic, a non-existent product feature — with no signal to the…

  • The practice of writing instructions to an AI model in a way that gets a reliable, useful result — part technical writing, part specification, part figuring…

  • Marking up your website content with schema, clean HTML, and machine-readable structure so AI models can extract and cite it accurately — the technical…