RAG Pipelines: Examples, Process, and How to Build (For Business & Data Newcomers)

What Is RAG? A Practical Guide to Retrieval-Augmented Generation

If you’re new to AI, retrieval-augmented generation (RAG) is one of the most practical ways to put AI to work on your business data. Think of it as giving an AI assistant a fast, reliable way to look things up in your company’s trusted documents before it answers, so it cites the right policy, the current price list, or the latest product manual.

Why now? Independent surveys show organizations have moved from experimentation to everyday use. In McKinsey’s 2024 global survey, 65% of organizations report they regularly use generative AI—nearly double the figure from just ten months prior.

This guide is written for people who are new to data and AI—new data analysts, business owners, and leaders who want practical, low-jargon guidance. You’ll get clear definitions, copy-and-paste patterns, and a step-by-step build recipe you can adapt to your stack.

What is RAG?

RAG is a pattern where a large language model (LLM) retrieves relevant facts from your own knowledge sources (policies, manuals, tickets, wiki pages, product data) and augments the prompt with those facts before generating an answer. That’s why RAG is great for “what’s true at our company?” questions.

RAG vs. training/fine-tuning: Fine-tuning changes what a model knows (by updating weights). RAG changes what a model uses at answer time (by looking up fresh, governed data). Because RAG keeps your data separate from the model, it’s usually faster to launch, easier to govern, and cheaper to iterate.

Along the way, you’ll be connecting sources with data integration and sometimes simple ETL pipeline steps to clean and prepare content.

Why RAG matters now

  • Adoption is real. As noted above, 65% of organizations report regular genAI use—a strong signal that practical patterns like RAG are moving from “pilot” to “how we work.” 
  • RAG reduces hallucinations and staleness. By grounding answers in your current content and requiring citations, you get more trustworthy responses than a model answering from general internet knowledge alone.
  • It fits business use cases. Customer support, internal knowledge search, field service, sales enablement, and compliance Q&A all benefit from finding the right paragraph in your content and using it to answer a question right now.
  • It scales with your architecture. Cloud reference designs often split RAG into ingestion, serving, and quality evaluation subsystems—useful for planning team responsibilities and tooling. 

Core components of a RAG pipeline

Below is a stack-agnostic view you can map to your tools.

  1. Ingestion (connect & collect)

Bring documents from shared drives, websites, ticketing systems, and apps. Normalize formats (PDF, DOCX, HTML, Markdown). This is where solid cloud data integration practices pay off.

  2. Transformation (clean & normalize)
    • Remove boilerplate (nav bars, footers), fix encodings, deduplicate.
    • PII scrubbing where needed.
    • Chunking: split long docs into retrievable units. Start with heading-based chunks or 100–400 token spans with small overlaps.
  3. Embedding

Convert chunks to vectors using an embedding model. Track model version and settings so you can re-embed if you change models later. (A sketch of the chunk, embed, and store steps follows this list.)

  4. Vector store

Store vectors + metadata (title, source, date, permissions, product line). Metadata lets you filter at query time (e.g., “only policy docs for Region EU”).

  5. Retrieval

Use hybrid search (keyword + vector) for better recall, then re-rank a short list to pick the best few chunks (top-k). This boosts relevance for synonyms and acronyms.

  6. Generation

Build a prompt template that instructs the model to answer only from retrieved context, requires citations, and specifies formatting (bullets, JSON, tone).

  7. Refresh loop

Schedule re-indexing (e.g., nightly), re-embedding on model upgrades, and automatic invalidation when documents change. Pair this with data security management so access controls flow through.

  8. Quality evaluation

Treat quality as a subsystem: maintain a test set, measure groundedness, and monitor over time (mirroring the “quality evaluation” subsystem in Google’s RAG reference architecture). That reference is a helpful mental model for dividing the work.

You’ll wire these pieces together through API integration with your content sources and operational systems.
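
To make the transformation, embedding, and vector store steps concrete, here is a minimal, stack-agnostic sketch in Python. The `embed_texts` callable and the `vector_store.upsert` call are placeholders for whatever embedding provider and vector database you choose, and the field names are illustrative; the chunking and metadata handling are the parts to adapt to your content.

```python
from datetime import date

EMBEDDING_MODEL = "example-embedding-model-v1"  # hypothetical name; record whatever you actually use

def chunk_by_tokens(text, max_tokens=300, overlap=20):
    """Naive whitespace 'token' chunking; swap in a real tokenizer for production."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        end = min(start + max_tokens, len(words))
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        start = end - overlap  # small overlap so context isn't cut mid-thought
    return chunks

def index_document(doc, embed_texts, vector_store):
    """doc: dict with id, title, text, source, region, and access_level keys."""
    chunks = chunk_by_tokens(doc["text"])
    vectors = embed_texts(chunks)  # placeholder: call your embedding model here
    records = []
    for i, (chunk, vec) in enumerate(zip(chunks, vectors)):
        records.append({
            "id": f"{doc['id']}-{i}",
            "vector": vec,
            "text": chunk,
            "metadata": {
                "title": doc["title"],
                "source": doc["source"],
                "region": doc["region"],
                "access_level": doc["access_level"],
                "embedding_model": EMBEDDING_MODEL,  # lets you re-embed safely later
                "indexed_on": date.today().isoformat(),
            },
        })
    vector_store.upsert(records)  # placeholder: your vector database client
```

Recording the embedding model name with each chunk is what makes a later re-embed safe: you can find every chunk produced by the old model and refresh only those.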

How a RAG pipeline works end-to-end

RAG pipelines have two phases:

  1. Indexing (offline)

Load → clean → chunk → embed → store (with metadata and permissions).

  2. Retrieval + generation (online)

Take the user question → make a query vector → retrieve top-k relevant chunks (optionally with hybrid + re-ranking) → stuff those chunks into your prompt → generate an answer with citations.

This separation matters: you can iterate on chunking and metadata without touching your application, and you can tune retrieval/top-k/prompt without reprocessing the whole corpus.
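
To see the online phase in one place, here is a minimal sketch that assumes hypothetical `embed_texts`, `keyword_search`, `vector_search`, and `llm_complete` helpers standing in for your embedding model, keyword (BM25) index, vector index, and LLM provider.

```python
def answer_question(question, embed_texts, keyword_search, vector_search, llm_complete, top_k=5):
    # 1. Embed the question with the SAME model used for the documents.
    query_vec = embed_texts([question])[0]

    # 2. Hybrid retrieval: union keyword and vector candidates, then keep the best few.
    candidates = {c["id"]: c for c in keyword_search(question, k=10)}
    for c in vector_search(query_vec, k=10):
        candidates.setdefault(c["id"], c)
    ranked = sorted(candidates.values(), key=lambda c: c.get("score", 0.0), reverse=True)
    context_chunks = ranked[:top_k]  # a cross-encoder re-ranker would slot in here

    # 3. Augment the prompt with the retrieved chunks and require citations.
    context = "\n\n".join(
        f"[{c['metadata']['title']} | {c['metadata']['source']}]\n{c['text']}"
        for c in context_chunks
    )
    prompt = (
        "Answer ONLY from the context below and cite sources in brackets. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 4. Generate, returning the cited chunk IDs for logging and audits.
    return llm_complete(prompt), [c["id"] for c in context_chunks]
```

Returning the cited chunk IDs alongside the answer is what makes the audit trail and quality metrics later in this guide possible.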

Where costs accrue:

  • Embedding (once per document, plus updates),
  • Vector queries (at question time),
  • LLM tokens (context + output).

Cost levers: caching frequent queries, reducing top-k, using hybrid search to narrow candidate sets, compressing or summarizing chunks, and setting max answer lengths.
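
Caching is often the first lever worth pulling. Below is a minimal in-memory sketch using only the standard library; in production you would back this with a shared cache and invalidate it whenever the index is rebuilt.

```python
import hashlib

_answer_cache = {}

def cache_key(question, chunk_ids):
    """Key on the normalized question plus the IDs of the retrieved chunks."""
    normalized = " ".join(question.lower().split())
    payload = normalized + "|" + ",".join(sorted(chunk_ids))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_answer(question, chunk_ids, generate_fn):
    """generate_fn() is called only on a cache miss."""
    key = cache_key(question, chunk_ids)
    if key not in _answer_cache:
        _answer_cache[key] = generate_fn()
    return _answer_cache[key]
```

Invalidating on re-index matters more than cache size: a stale cached answer is indistinguishable from a stale index.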

Examples 

Below are four practical RAG patterns with inputs, outputs, and KPIs.

Customer support copilot

  • Sources: Product manuals, release notes, knowledge base, support macros, and known issues.
  • Guardrails: Restrict by product and version; enforce “don’t answer unsupported SKUs.”
  • KPI: First-contact resolution, handle time, deflection rate.
  • Implementation notes: Favor hybrid search to capture exact error codes and semantic matches; log every cited source. Use data integration tools to keep content fresh across systems.

Internal knowledge search

  • Sources: HR and IT policies, SOPs, project docs, and architecture decisions.
  • Guardrails: Row-level permissions—don’t retrieve what a user can’t see.
  • KPI: Time-to-answer; percentage of questions answered with citations; response satisfaction.
  • Implementation notes: Tag each chunk with owner/department, review date, and confidentiality level; connect to corporate identity provider (IdP). Reinforce enterprise hygiene with AI data governance.
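
For the row-level permissions guardrail above, the pattern is to translate the user's identity into a metadata filter before the index is ever queried. A minimal sketch, assuming each chunk is tagged with a department and a numeric confidentiality level and that `vector_search` is your index client; the Mongo-style filter operators are illustrative, since syntax varies by vector database.

```python
def retrieve_for_user(query_vec, user, vector_search, top_k=5):
    # The filter is built from the user's identity (via your IdP), so the index
    # never returns chunks the user couldn't open in the source system.
    metadata_filter = {
        "department": {"$in": user["groups"]},           # e.g., ["HR", "IT"]
        "confidentiality": {"$lte": user["clearance"]},  # e.g., 1 = public, 3 = restricted
    }
    return vector_search(query_vec, k=top_k, filter=metadata_filter)
```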

eCommerce product advisor

  • Sources: Catalog (title, attributes), user reviews, inventory, pricing rules.
  • Guardrails: Enforce availability and pricing authority; prefer in-stock items.
  • KPI: Conversion rate, upsell/attach, return rate.
  • Implementation notes: Use metadata filters at query time (size, color, region). Summarize multiple reviews to avoid overlong prompts. When persisting catalog changes, follow data warehouse best practices to make indexing predictable.

Field service assistant

  • Sources: Service bulletins, repair histories, IoT telemetry rollups.
  • Guardrails: Offline fallback; safety notices always pinned first.
  • KPI: Mean time to repair; truck rolls avoided.
  • Implementation notes: Pre-compute embeddings for the most common failure codes; ship periodic index snapshots to edge devices.

Step-by-step: Your first RAG build

Use this starter recipe to go from zero to a working version. It’s stack-agnostic and mirrors cloud reference architectures that split work across ingestion, serving, and quality evaluation subsystems.

  1. Pick a laser-focused use case + KPI. Examples: “Answer the top 50 HR policy questions with citations; target 85% helpfulness.” Tie success to something measurable, not “be smart.”
  2. Collect and clean 50-500 documents. Export your highest-value pages first (FAQs, how-to guides, manuals). Strip navigation chrome and duplicated footers. If you’re moving data across systems, decide whether an ETL vs Data Pipeline approach fits your governance and latency needs.
  3. Chunk with intent. Start with heading-based chunks (e.g., each H2/H3) and 100-400 token spans with ~10-20 token overlap for long paragraphs. Add metadata: owner, product, SKU/version, document date, region, and access level.
  4. Choose embeddings and document schemas.
    • Keep the same embedding model for your documents and queries to avoid mismatches.
    • Record the model name and version in your metadata so you can re-embed if you change later.
  5. Stand up a vector index.
    • Use indexes that support metadata filters (e.g., region: EU, product: Alpha).
    • Start with top_k=10; you’ll tune this later.
    • Add a keyword index (BM25) alongside to support hybrid search.
  6. Wire retrieval.
    • Hybrid search: run keyword and vector searches; union candidates.
    • Re-rank: score with a cross-encoder or LLM scoring prompt; keep the best 3–5 chunks.
    • Consider a short “pre-prompt” to the user (“Which product/version?”) when the query is ambiguous.
  7. Draft a prompt template. Include: role (“You are a company assistant”), constraints (“Cite your sources; if unknown, say you don’t know”), format (“Answer in bullets; max 150 words unless asked”), and a context window where you paste retrieved chunks (title, source, date) so the model can cite cleanly. (A template sketch follows this list.)
  8. Build the quality loop.
    • Offline: Collect 50–150 real questions and gold-standard, grounded answers with the correct citations.
    • Metrics: Retrieval recall@k, precision@k, answer groundedness, citation correctness, and format adherence.
    • Online: Thumbs up/down, user comments, fallbacks (“I don’t know” counts), and drift checks. A formal quality evaluation subsystem—as called out in cloud reference designs—keeps this from being an afterthought (see Google’s RAG reference architecture).
  9. Governance and security.
    • Enforce permissions at index time (don’t store secrets in plaintext) and query time (filter by user access).
    • Maintain an audit trail of who asked what, which chunks were retrieved, and which sources were cited.
    • Use org-level frameworks and tools—see primers on AI governance tools and AI data governance.
  10. Ship a v1, then iterate weekly.
    • Tweak chunk sizes and overlaps; add missing metadata; tighten top-k; improve prompts.
    • Align data movement and monitoring with your ETL pipeline schedules and operational alerts.
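
Step 7 calls for a prompt template; here is a minimal sketch. The wording, field names, and 150-word limit are illustrative, taken from the constraints listed above rather than from any particular product.

```python
PROMPT_TEMPLATE = """You are a company assistant.

Rules:
- Answer ONLY from the context below.
- Cite every claim as [title, date].
- If the context does not answer the question, reply: "I don't know."
- Answer in bullets, maximum 150 words unless asked otherwise.

Context:
{context}

Question: {question}
"""

def build_prompt(question, chunks):
    # Each chunk carries its title and date so the model can cite cleanly.
    context = "\n\n".join(
        f"[{c['metadata']['title']}, {c['metadata'].get('indexed_on', 'n/a')}] {c['text']}"
        for c in chunks
    )
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

Keeping the template in one versioned place makes prompt variants easy to compare in the quality loop (step 8).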

Quality, evaluation, and monitoring

Great RAG isn’t just a model—it’s a system that continuously measures and improves itself. Treat quality as a first-class area of work (and, ideally, a distinct subsystem)—this matches cloud reference architectures that put quality evaluation alongside ingestion and serving (see Google’s RAG reference architecture).

Offline evaluation (before you launch):

  • Build a tiny “gold” set of 50–150 Q&A pairs representative of your use case.
  • Label the retrieval step: how often does the right chunk appear in top_k (recall@k) and how cleanly do filters narrow to the right domain (precision@k)?
  • Label the answer step: is the answer grounded (only claims supported by retrieved text), are citations correct, is the format usable?
  • Use this set to compare prompt versions, re-ranking strategies, chunk sizes, and embedding models.
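
Recall@k and precision@k are simple to compute once you have the gold set. A minimal sketch, assuming each test case records the chunk IDs a reviewer judged relevant and the ranked IDs your retriever returned.

```python
def recall_at_k(relevant_ids, retrieved_ids, k=5):
    """Share of the truly relevant chunks that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved_ids[:k])) / len(relevant)

def precision_at_k(relevant_ids, retrieved_ids, k=5):
    """Share of the top-k results that are truly relevant."""
    relevant = set(relevant_ids)
    return sum(1 for cid in retrieved_ids[:k] if cid in relevant) / k

def mean_recall(gold_set, k=5):
    """Average over the gold set to compare chunking, prompts, or embedding models."""
    return sum(recall_at_k(case["relevant"], case["retrieved"], k) for case in gold_set) / len(gold_set)
```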

Online monitoring (after you launch):

  • Capture thumbs/flags and attach them to specific retrieved chunks and prompts.
  • Detect drift: sudden drops in recall or groundedness, unusual token spikes, or increased “I don’t know” rates after a re-index.
  • Alert on low confidence cases (long answers with weak citations).
  • Log all RAG decisions (query, candidates, final chunks, prompt, output) so you can reproduce issues.

Guardrails:

  • Hard cap max tokens and answer length by policy.
  • Require a refusal (“I don’t know”) when no relevant evidence is retrieved.
  • Enforce “cite at least one source” in your prompt; when multiple chunks are similar, prefer the most recent.
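
The refusal guardrail is easiest to enforce in code rather than relying on the prompt alone. A minimal sketch; the 0.35 similarity threshold is an illustrative number to calibrate against your own gold set.

```python
MIN_SIMILARITY = 0.35  # illustrative; tune against your evaluation set

def answer_or_refuse(question, retrieved_chunks, generate_fn):
    """Refuse when retrieval returns nothing strong enough to ground an answer."""
    evidence = [c for c in retrieved_chunks if c.get("score", 0.0) >= MIN_SIMILARITY]
    if not evidence:
        return "I don't know. I couldn't find a relevant source for this question."
    return generate_fn(question, evidence)  # placeholder for your prompt + LLM call
```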

Cost, scalability, and performance

RAG can be fast and affordable with a few architectural choices:

  • ANN indexes (approximate nearest neighbor) keep vector search fast as corpora grow.
  • Hybrid search narrows candidates before expensive re-ranking.
  • Response caching (question and retrieved context hash) eliminates repeat work for FAQs.
  • Chunk tuning can trim context by 30–50%: favor small, well-titled chunks over sprawling blocks.
  • Batch re-embedding and cooldowns (e.g., re-embed a document only after it’s been stable for 24 hours) avoid unnecessary costs.
  • Observability: log latency for retrieval, re-ranking, and generation; correlate spend with business KPIs (your API logs and ops data will help—even a simple export via API integration is enough to start).

Governance, security, and compliance

Two principles keep RAG enterprise-ready:

  1. Least privilege. Users should never retrieve content they cannot otherwise access. Apply row-/document-level permissions inside your index metadata and filter at query time.
  2. Auditability. You must be able to show which sources supported an answer. Store document IDs and versions in retrieved chunks. Keep immutable logs of prompts, retrieved items, and outputs.
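
A minimal sketch of an audit record, assuming your retrieved chunks carry document IDs and versions in their metadata and that `log_sink` is an append-only destination such as a write-once table or object storage.

```python
import json
import time
import uuid

def log_rag_decision(log_sink, user_id, question, retrieved_chunks, answer):
    """Append one immutable record: who asked what, and which sources grounded the answer."""
    record = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user_id": user_id,
        "question": question,
        "retrieved": [
            {
                "chunk_id": c["id"],
                "doc_id": c["metadata"]["source"],
                "doc_version": c["metadata"].get("version"),
            }
            for c in retrieved_chunks
        ],
        "answer": answer,
    }
    log_sink.write(json.dumps(record) + "\n")
```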

Practical steps:

  • Encrypt at rest and in transit; classify sensitive fields; redact PII in transformations.
  • Respect data residency and retention policies.
  • Use deny lists (e.g., never surface “draft” or “legal-privileged” docs).
  • Align to corporate frameworks and industry guidance—use primers on AI data governance, AI governance tools, and confirm you’ve implemented data security management controls.

Tooling landscape (what to consider)

You’ll hear about orchestration frameworks (LangChain, LlamaIndex, Haystack), vector databases, and cloud building blocks. Rather than prescribing a brand, evaluate tools on:

  • Connectors & ingestion: Can you integrate sources with minimal friction (CMS, ticketing, drive, database)? Strong data integration tools make or break the first mile.
  • Metadata & permissions: Rich tagging + filter support; can you map your IAM/roles into the index?
  • Hybrid retrieval & re-ranking: Out-of-the-box, or do you have to bolt it on?
  • Observability & eval: Built-in metrics for recall@k, groundedness, and drift?
  • Governance hooks: Prompt/response logging, PII policies, and export for audits.

Cloud providers also publish reference architectures showing how to split responsibilities across ingestion, serving, and quality evaluation subsystems.

Common pitfalls (and fast fixes)

  • Over- or under-chunking. If answers cite the wrong paragraph or include fluff, adjust chunk sizes and overlaps; add clearer section titles.
  • No metadata. Without tags (owner, product, version, region, access), retrieval turns mushy. Enrich during ingestion.
  • Mismatched embeddings. Use the same embedding model for documents and queries to avoid degraded matching.
  • “It makes stuff up.” Require citations, use a strict prompt, reduce top-k to the truly relevant, and add an explicit “I don’t know” path.
  • Stale index. Schedule re-indexing, track document versions, and monitor drift.
  • Governance late in the game. Bake in access controls and audits from day one.

FAQ

RAG vs. fine-tuning—how do I choose?
Start with RAG when answers depend on your evolving content or when you need citations and tight governance. Consider fine-tuning when the desired skill is stable (e.g., style or format) and not tied to specific sources.

How many documents do I need to start?
You can build a meaningful v1 with 50-500 high-value docs. Focus on your top 50 questions and the pages people already use to answer them.

Can I do RAG on private data safely?
Yes—if you enforce least-privilege access, encrypt at rest and in transit, redact sensitive fields in ingestion, and log retrieval/citations for audit. Align controls with your data warehouse best practices.

What KPIs prove it’s working?
Retrieval recall@k, answer groundedness, citation correctness, handle time, and business outcomes (deflection, conversion, MTTR).

Put this blueprint to work with Domo

You don’t need to stitch together a dozen tools to get real value from RAG. With Domo, you can connect sources, govern data, and monitor AI quality and cost—all in one place.

What you’ll see in a Domo walkthrough:

  • Fast connections to your content (docs, wikis, ticketing, product data) with governed pipelines—no duct tape.
  • Operational hygiene out of the box: scheduled refresh, lineage, version awareness, and role-based access so only the right people see the right answers.
  • RAG health dashboards that track recall@k, groundedness, citation coverage, latency, and cost per question—so you can prove value and control spend.
  • Governance and audits that keep AI answers compliant (PII handling, access logs, retention), backed by enterprise-grade controls.
  • Workflow integration so answers show up where work happens—inside apps, alerts, and team dashboards.

Ready to see it in your environment?
Book a personalized RAG walkthrough. We’ll map your first use case and KPI, connect sample sources, outline your indexing/retrieval strategy, and show how to monitor quality and cost—end-to-end with Domo.

Let’s build your first grounded AI assistant—together.
