Embeddings & Vector Stores
Understand how text becomes searchable vectors and how vector databases power semantic search.
An embedding is a list of numbers (a vector) that captures the meaning of a piece of text. Similar texts have similar vectors. This lets you search by meaning rather than keywords — "how to fix a flat tire" would match "tire puncture repair guide" even though they share few words.
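Embedding models (like OpenAI's text-embedding-3-small or Cohere's Embed v3 models) convert text into a fixed-length vector, typically 256–3072 dimensions. The distance between two vectors, commonly measured with cosine similarity, indicates how semantically similar the texts are: the closer the vectors, the closer the meaning.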
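To make "distance between vectors" concrete, here is a minimal sketch of cosine similarity in plain Python. The three 4-dimensional vectors are made-up stand-ins for real embeddings (which would come from an embedding model and have hundreds or thousands of dimensions); only the comparison logic is the point.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, NOT output from a real embedding model.
flat_tire = [0.9, 0.1, 0.0, 0.3]        # "how to fix a flat tire"
puncture_guide = [0.8, 0.2, 0.1, 0.4]   # "tire puncture repair guide"
cake_recipe = [0.0, 0.9, 0.8, 0.1]      # "chocolate cake recipe"

print(cosine_similarity(flat_tire, puncture_guide))  # high: related meaning
print(cosine_similarity(flat_tire, cake_recipe))     # low: unrelated meaning
```

With these toy values the two tire-related vectors score much closer to 1.0 than the tire and cake pair, which is the behavior a real embedding model is trained to produce for related texts.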
A vector store (or vector database) indexes these embeddings for fast similarity search. When a user asks a question, you embed the query, find the most similar stored vectors, and retrieve the associated text chunks. Popular options include Pinecone, Weaviate, Chroma, and pgvector (Postgres extension).
- Pinecone: Managed cloud service, easy to start, scales well
- Chroma: Open-source, embeds in your app, great for prototyping
- pgvector: Postgres extension — use your existing database
- Weaviate: Open-source with hybrid keyword + vector search
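The core job of any of these stores can be sketched in a few lines. The class below is a hypothetical in-memory toy, not the API of Pinecone, Chroma, pgvector, or Weaviate: it stores (vector, text) pairs and answers queries by scanning every vector. The name `InMemoryVectorStore` and the sample vectors are assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class InMemoryVectorStore:
    """Toy store: keeps (vector, text) pairs and does brute-force search."""
    vectors: list = field(default_factory=list)
    texts: list = field(default_factory=list)

    def add(self, vector, text):
        # Store unit-length vectors so a dot product equals cosine similarity.
        self.vectors.append(self._normalize(vector))
        self.texts.append(text)

    def search(self, query_vector, k=3):
        """Return the k most similar entries as (score, text), best first."""
        q = self._normalize(query_vector)
        scored = [
            (sum(a * b for a, b in zip(q, v)), text)
            for v, text in zip(self.vectors, self.texts)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]

    @staticmethod
    def _normalize(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot index a zero vector")
        return [x / norm for x in vector]


# Made-up 3-dimensional vectors standing in for real embeddings.
store = InMemoryVectorStore()
store.add([0.9, 0.1, 0.0], "Tire puncture repair guide")
store.add([0.1, 0.9, 0.2], "Chocolate cake recipe")
store.add([0.7, 0.0, 0.3], "Checking tire pressure")

for score, text in store.search([1.0, 0.1, 0.1], k=2):
    print(f"{score:.3f}  {text}")
```

Scanning every vector is fine for a few thousand entries. Production vector stores generally avoid the full scan by building approximate nearest-neighbor indexes, which trade a small amount of accuracy for much faster lookups.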
RAG (Retrieval-Augmented Generation) works in three steps: (1) embed the user's question, (2) retrieve the most relevant document chunks from your vector store, (3) include those chunks in the prompt so the model answers using your data. This gives the model access to information it was never trained on.
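The three steps can be sketched end to end without any external service. In this sketch `toy_embed` is a deliberately crude stand-in for a real embedding model: it counts words, so it matches on shared vocabulary rather than meaning. The function names and sample chunks are assumptions for illustration, and the final model call is left out; the sketch stops at the prompt you would send.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for a real embedding model: word counts, NOT semantic."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    query_vector = toy_embed(question)  # step 1: embed the question
    ranked = sorted(
        chunks, key=lambda c: cosine(query_vector, toy_embed(c)), reverse=True
    )
    return ranked[:k]  # step 2: keep the most similar chunks


def build_prompt(question, retrieved):
    # Step 3: put the retrieved chunks in front of the question.
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(retrieved, start=1))
    return f"Context:\n{context}\n\nQuestion: {question}"


chunks = [
    "To repair a tire puncture, remove the wheel and patch the inner tube.",
    "Our store opens at 9am on weekdays.",
    "Tire pressure should be checked monthly.",
]
question = "How do I repair a tire puncture?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)
```

In a real pipeline, `toy_embed` would be replaced by a call to an embedding model, and the sorted scan by a vector store query; the shape of the flow stays the same.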
Prompt Templates
RAG Answer with Sources
Standard RAG generation prompt that enforces grounding and source citation.
Answer the user's question using ONLY the provided context. If the context doesn't contain enough information, say "I don't have enough information to answer this."
Context: [RETRIEVED CHUNKS]
Question: [USER QUESTION]
Provide your answer and cite which context passages you used.
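One way to fill in this template programmatically is sketched below. Numbering each chunk as [1], [2], ... is an assumption added here so the model has something concrete to cite; the names `RAG_TEMPLATE` and `render_rag_prompt` are hypothetical.

```python
RAG_TEMPLATE = (
    "Answer the user's question using ONLY the provided context. "
    "If the context doesn't contain enough information, say "
    "\"I don't have enough information to answer this.\"\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n\n"
    "Provide your answer and cite which context passages you used."
)


def render_rag_prompt(question, chunks):
    """Fill the template, labelling each chunk so it can be cited by number."""
    context = "\n".join(
        f"[{n}] {chunk.strip()}" for n, chunk in enumerate(chunks, start=1)
    )
    return RAG_TEMPLATE.format(context=context, question=question.strip())


example = render_rag_prompt(
    "How do I fix a flat tire?",
    [
        "Patch small punctures in the inner tube with a repair kit.",
        "Replace the tube if the hole is larger than the patch.",
    ],
)
print(example)
```

Keeping the fallback sentence verbatim in the template makes it easy to detect in the model's reply, for example to trigger a broader search when the context was insufficient.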
RAG Query Expansion
Improves retrieval recall by searching with multiple query variations.
The user asked: "[USER QUESTION]"
Generate 3 alternative phrasings of this question that might retrieve different relevant documents. Vary vocabulary, specificity, and angle.
1.
2.
3.
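A sketch of how this template might be wired into retrieval follows: parse the numbered list the model returns, search once per phrasing, then merge the results without duplicates. The sample response and document IDs are invented for illustration, and the model and vector store calls are omitted; `parse_numbered_list` and `merge_results` are hypothetical helper names.

```python
import re

EXPANSION_TEMPLATE = (
    'The user asked: "{question}"\n'
    "Generate 3 alternative phrasings of this question that might retrieve "
    "different relevant documents. Vary vocabulary, specificity, and angle.\n"
    "1.\n2.\n3."
)


def parse_numbered_list(response):
    """Extract the text after each leading '1.', '2.', ... marker."""
    items = []
    for line in response.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            items.append(match.group(1).strip())
    return items


def merge_results(result_lists):
    """Union several ranked result lists, keeping first-seen order."""
    seen, merged = set(), []
    for results in result_lists:
        for doc_id in results:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


expansion_prompt = EXPANSION_TEMPLATE.format(question="How do I fix a flat tire?")

# Invented example of what a model might return for the prompt above.
sample_response = """1. What are the steps to repair a punctured tire?
2. Tire puncture repair guide
3. How can I patch a bicycle inner tube?"""

variations = parse_numbered_list(sample_response)
print(variations)

# Invented document IDs, as if each variation had been searched separately.
print(merge_results([["doc1", "doc2"], ["doc2", "doc3"], ["doc1", "doc4"]]))
```

First-seen order is the simplest merge policy. If the ranking across variations matters, a score-based fusion step could replace `merge_results`.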
Test Your Knowledge
What is a text embedding?
Key Takeaways
- ✓ Embeddings convert text into numerical vectors that capture semantic meaning, enabling search by meaning
- ✓ Vector stores index embeddings for fast similarity retrieval; Pinecone, Chroma, and pgvector are popular choices
- ✓ RAG connects retrieval to generation: embed query → retrieve chunks → generate grounded answers