What are the three core steps of RAG?

RAG embeds the user query, retrieves semantically similar document chunks from a vector store, then passes those chunks to the LLM as context for generating a grounded response.

Module 2, Lesson 1

Embeddings & Vector Stores

Understand how text becomes searchable vectors and how vector databases power semantic search.

8 min read

2 quiz questions, 2 templates

An embedding is a list of numbers (a vector) that captures the meaning of a piece of text. Similar texts have similar vectors. This lets you search by meaning rather than keywords — "how to fix a flat tire" would match "tire puncture repair guide" even though they share few words.

Embedding models (such as OpenAI's text-embedding-3-small or Cohere's embed-v3) convert a piece of text, up to the model's input-length limit, into a fixed-length vector, typically 256–3072 dimensions. How close two vectors are, commonly measured with cosine similarity, indicates how similar their meanings are.
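To make vector similarity concrete, here is a minimal cosine-similarity function in plain Python. Cosine similarity is one common measure; some vector stores use dot product or Euclidean distance instead. The three 4-dimensional vectors are invented for illustration and are not output from any real embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal (unrelated)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Invented 4-dimensional vectors; real embeddings have hundreds or thousands of dimensions.
flat_tire = [0.9, 0.1, 0.0, 0.3]
puncture_repair = [0.8, 0.2, 0.1, 0.4]
cake_recipe = [0.0, 0.9, 0.8, 0.1]

print(round(cosine_similarity(flat_tire, puncture_repair), 2))  # → 0.98
print(round(cosine_similarity(flat_tire, cake_recipe), 2))      # → 0.1
```

With these made-up numbers, the two tire-related vectors point in nearly the same direction, while the unrelated one scores close to zero, which is the behavior a good embedding model aims for on the flat-tire example above.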

A vector store (or vector database) indexes these embeddings for fast similarity search. When a user asks a question, you embed the query, find the most similar stored vectors, and retrieve the associated text chunks. Popular options include Pinecone, Weaviate, Chroma, and pgvector (Postgres extension).

Pinecone: Managed cloud service, easy to start, scales well
Chroma: Open-source, embeds in your app, great for prototyping
pgvector: Postgres extension — use your existing database
Weaviate: Open-source with hybrid keyword + vector search
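The core operation every store above performs can be sketched as an exact, brute-force top-k search. This is a minimal in-memory illustration: `InMemoryVectorStore` is a hypothetical class, not part of Pinecone, Chroma, pgvector, or Weaviate. Production stores typically build approximate nearest-neighbor indexes (such as HNSW) so they don't have to compare the query against every stored vector.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class InMemoryVectorStore:
    """Hypothetical toy store: keeps (vector, text) pairs and scans all of them on every query."""
    vectors: list[list[float]] = field(default_factory=list)
    texts: list[str] = field(default_factory=list)

    def _check_dim(self, vector: list[float]) -> None:
        # All embeddings must come from the same model, so their dimensions must match.
        if self.vectors and len(vector) != len(self.vectors[0]):
            raise ValueError("all vectors must share the same dimension")

    def add(self, vector: list[float], text: str) -> None:
        self._check_dim(vector)
        self.vectors.append(vector)
        self.texts.append(text)

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        """Return up to k (text, score) pairs, most similar first."""
        self._check_dim(query)
        scored = [(text, cosine_similarity(query, vec)) for vec, text in zip(self.vectors, self.texts)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add([0.9, 0.1, 0.0], "Tire puncture repair guide")
store.add([0.0, 0.9, 0.8], "Chocolate cake recipe")
store.add([0.7, 0.0, 0.3], "Bike maintenance checklist")

for text, score in store.search([0.85, 0.15, 0.05], k=2):
    print(f"{score:.2f}  {text}")
```

Brute force is exact but costs one comparison per stored vector per query, which is why the indexes used by real vector databases matter once collections grow.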

RAG (Retrieval-Augmented Generation) works in three steps: (1) embed the user's question, (2) retrieve the most relevant document chunks from your vector store, (3) include those chunks in the prompt so the model answers using your data. This gives the model access to information it was never trained on.
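The three steps can be sketched end to end in plain Python, under loud assumptions: `toy_embed` is a hypothetical word-count embedding standing in for a real model (so it matches shared words, not meaning), and step 3 stops at building the prompt because the actual LLM call depends on your provider.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """HYPOTHETICAL stand-in for an embedding model: word counts over a fixed vocabulary.
    It only rewards shared words; a real model would also match synonyms and paraphrases."""
    counts = Counter(tokenize(text))
    return [float(counts[word]) for word in vocab]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    vocab = sorted({word for chunk in chunks for word in tokenize(chunk)})
    # Step 1: embed the question.
    query_vec = toy_embed(question, vocab)
    # Step 2: rank chunks by similarity and keep the top k
    # (a real pipeline embeds chunks once, at indexing time, and stores the vectors).
    ranked = sorted(chunks, key=lambda c: cosine_similarity(query_vec, toy_embed(c, vocab)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    # Step 3: put the retrieved chunks into the prompt; sending it to a model is omitted here.
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Standard shipping takes 3 to 5 business days.",
    "Gift cards cannot be refunded or exchanged for cash.",
]
question = "What is the return policy for refunds?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)
```

Note that "refunds" and "refunded" count as different words here; closing gaps like that is exactly what a real embedding model is for.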

Embedding quality directly affects RAG quality. If your embeddings don't capture the right meaning, retrieval returns irrelevant chunks, and the model may then produce inaccurate or hallucinated answers from that bad context.

Prompt Templates

RAG Answer with Sources

Standard RAG generation prompt that enforces grounding and source citation.

Answer the user's question using ONLY the provided context. If the context doesn't contain enough information, say "I don't have enough information to answer this."

Context:
[RETRIEVED CHUNKS]

Question: [USER QUESTION]

Provide your answer and cite which context passages you used.
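One way to fill this template programmatically, assuming your retrieval step returns a list of plain-text chunks: number each chunk so the model has labels like [1] and [2] to cite. The function name and the numbering scheme are illustrative choices, not part of the template; the bracketed placeholders become Python format fields.

```python
RAG_ANSWER_TEMPLATE = """Answer the user's question using ONLY the provided context. If the context doesn't contain enough information, say "I don't have enough information to answer this."

Context:
{context}

Question: {question}

Provide your answer and cite which context passages you used."""


def fill_rag_template(question: str, chunks: list[str]) -> str:
    """Number each retrieved chunk ([1], [2], ...) so the model has labels to cite."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return RAG_ANSWER_TEMPLATE.format(context=context, question=question)


print(fill_rag_template(
    "How long do refunds take?",
    ["Refunds are issued within 5 business days.", "Refunds go to the original payment method."],
))
```

Because only the template string is passed through `format`, braces inside retrieved chunks or the question are inserted literally and cannot break the formatting.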

RAG Query Expansion

Improves retrieval recall by searching with multiple query variations.

The user asked: "[USER QUESTION]"

Generate 3 alternative phrasings of this question that might retrieve different relevant documents. Vary vocabulary, specificity, and angle.

1.
2.
3.
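After the model returns its three phrasings, you still need to search with each one and merge the results. The sketch below assumes the reply follows the numbered format above and that `search` is your own function returning `(chunk, score)` pairs for a query (a hypothetical interface). It keeps each chunk's best score across all queries; reciprocal rank fusion is another common way to merge result lists.

```python
import re
from typing import Callable

SearchFn = Callable[[str], list[tuple[str, float]]]


def parse_variations(llm_output: str) -> list[str]:
    """Pull the text after '1.', '2.', '3.' ... from the model's numbered reply."""
    variations = []
    for match in re.finditer(r"^[ \t]*\d+\.[ \t]*(.+)$", llm_output, re.MULTILINE):
        text = match.group(1).strip()
        if text:  # skip empty items such as a bare "3."
            variations.append(text)
    return variations


def expanded_search(question: str, llm_output: str, search: SearchFn, k: int = 5) -> list[tuple[str, float]]:
    """Search with the original question plus each variation; keep each chunk's best score."""
    queries = [question] + parse_variations(llm_output)
    best: dict[str, float] = {}
    for query in queries:
        for chunk, score in search(query):
            if score > best.get(chunk, float("-inf")):
                best[chunk] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical, hard-coded search results for the demo.
fake_index = {
    "how to fix a flat tire": [("Tire puncture repair guide", 0.62), ("Bike tube sizes", 0.41)],
    "How do I repair a punctured tire?": [("Tire puncture repair guide", 0.91)],
    "Flat tire fix steps": [("Roadside tire change checklist", 0.74)],
    "Patching a bicycle inner tube": [("Bike tube sizes", 0.55)],
}
reply = "1. How do I repair a punctured tire?\n2. Flat tire fix steps\n3. Patching a bicycle inner tube"
print(expanded_search("how to fix a flat tire", reply, lambda q: fake_index.get(q, []), k=3))
```

Here the "Roadside tire change checklist" chunk is found only through a variation, which is the recall gain query expansion is meant to provide. Taking the max score assumes all queries are scored with the same similarity metric.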

Test Your Knowledge

Knowledge Check


What is a text embedding?

Key Takeaways

✓Embeddings convert text into numerical vectors that capture semantic meaning, enabling search by meaning
✓Vector stores index embeddings for fast similarity retrieval — Pinecone, Chroma, and pgvector are popular choices
✓RAG connects retrieval to generation: embed query → retrieve chunks → generate grounded answers

