Building Fast Vector Search with FAISS and HNSW
A deep dive into approximate nearest neighbor search, comparing FAISS and HNSW for building scalable vector search systems.
Vector search is at the heart of modern AI applications — from semantic search and recommendation systems to RAG (Retrieval-Augmented Generation) pipelines. My VectorWise project was born from the need to understand and implement these systems from scratch.
The Problem: Nearest Neighbor at Scale
Given a query vector, find the K most similar vectors in a database. With brute force, this is O(n*d) per query (n vectors, d dimensions). At 1 million vectors with 768 dimensions, that's ~3 billion floating-point operations per query. Not practical for real-time applications.
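To make the baseline concrete, here is a brute-force k-NN sketch in NumPy. It scans every database vector per query, which is exactly the O(n*d) cost described above (a smaller toy size is used here purely for illustration):

```python
import numpy as np

def brute_force_knn(query, vectors, k=10):
    """Exact k-NN by scanning every database vector: O(n*d) per query."""
    # Squared L2 distance from the query to all n vectors at once
    dists = np.sum((vectors - query) ** 2, axis=1)
    idx = np.argpartition(dists, k)[:k]  # k smallest, unordered
    return idx[np.argsort(dists[idx])]   # sort those k by distance

# Toy example: 10k vectors of dimension 64 (1M x 768 works the same, just slowly)
rng = np.random.default_rng(0)
vecs = rng.random((10_000, 64)).astype("float32")
q = vecs[42]  # query with a known nearest neighbor: itself
print(brute_force_knn(q, vecs, k=5)[0])  # → 42
```

At 1M vectors this loop over the full database is what the ANN indexes below are designed to avoid.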
Approximate Nearest Neighbors (ANN)
ANN algorithms trade a small amount of accuracy for massive speed improvements. The two most important approaches:
1. FAISS (Facebook AI Similarity Search)
FAISS provides several index types. The most useful:
Flat Index (Exact)
```python
import faiss
import numpy as np

d = 768        # vector dimension
n = 1_000_000  # database size
vectors = np.random.random((n, d)).astype("float32")       # FAISS expects float32
query_vector = np.random.random((1, d)).astype("float32")  # queries have shape (nq, d)

# Build index (exact L2 search, no training needed)
index = faiss.IndexFlatL2(d)
index.add(vectors)

# Search: returns the 10 nearest neighbors per query
distances, indices = index.search(query_vector, k=10)
```
IVF (Inverted File Index)
```python
nlist = 100  # number of clusters (Voronoi cells)
quantizer = faiss.IndexFlatL2(d)  # coarse quantizer that assigns vectors to clusters
index = faiss.IndexIVFFlat(quantizer, d, nlist)

index.train(vectors)  # runs k-means on the data; required before add()
index.add(vectors)

index.nprobe = 10  # search the 10 nearest clusters (default is 1)
distances, indices = index.search(query_vector, k=10)
```
IVF + PQ (Product Quantization)
```python
m = 8  # number of subquantizers; d must be divisible by m
quantizer = faiss.IndexFlatL2(d)  # fresh coarse quantizer for this index
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)  # last arg: 8 bits per sub-code

index.train(vectors)
index.add(vectors)
```
2. HNSW (Hierarchical Navigable Small World)
HNSW builds a multi-layer graph where each node is connected to its approximate nearest neighbors. Search starts from the top layer (sparse, long-range connections) and progressively refines through lower layers.
```python
import hnswlib

# Initialize: L2 space, 768-dim vectors
index = hnswlib.Index(space="l2", dim=768)
index.init_index(max_elements=1_000_000, ef_construction=200, M=16)

# Add data
index.add_items(vectors)

# Query
index.set_ef(50)  # higher ef = more accurate, slower
labels, distances = index.knn_query(query_vector, k=10)
```
Benchmarks: FAISS vs HNSW
From my testing on 1M vectors of dimension 768:
| Method | QPS (queries/sec) | Recall@10 | Memory |
|---|---|---|---|
| Flat (exact) | 5 | 100% | 3.0 GB |
| IVF-Flat (nprobe=10) | 850 | 95% | 3.1 GB |
| IVF-PQ | 3,200 | 88% | 0.5 GB |
| HNSW (ef=50) | 1,500 | 97% | 3.8 GB |
| HNSW (ef=200) | 600 | 99.5% | 3.8 GB |
Key observations:
- HNSW offers the best accuracy/speed trade-off when memory isn't constrained
- IVF-PQ is unbeatable on memory — essential when you have billions of vectors
- FAISS's GPU support can push IVF to 10,000+ QPS
Choosing the Right Index
Decision framework:
- < 100K vectors: Just use a flat index. It's fast enough.
- 100K - 10M vectors: HNSW for best recall, IVF-Flat for good balance
- > 10M vectors: IVF-PQ or FAISS on GPU
- Memory constrained: Product Quantization (PQ) variants
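The framework above can be encoded as a small helper. This is a hypothetical function (the name `pick_index` and the exact thresholds are illustrative rules of thumb from the list, not hard limits):

```python
def pick_index(n_vectors: int, memory_constrained: bool = False) -> str:
    """Hypothetical helper encoding the decision framework above."""
    if memory_constrained:
        return "IVF-PQ"             # product quantization shrinks memory ~6x
    if n_vectors < 100_000:
        return "Flat"               # exact search is fast enough
    if n_vectors <= 10_000_000:
        return "HNSW"               # best recall/speed trade-off in this range
    return "IVF-PQ or GPU IVF"      # beyond 10M vectors

print(pick_index(50_000))     # → Flat
print(pick_index(1_000_000))  # → HNSW
```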
Practical Considerations
- Dimensionality reduction: PCA from 768 to 256 dimensions often loses minimal accuracy but speeds up everything
- Batch queries: Always batch when possible — FAISS is heavily optimized for batch operations
- Index serialization: FAISS supports `write_index`/`read_index` for persistence; hnswlib has `save_index`/`load_index`
- Filtering: Neither FAISS nor hnswlib natively supports metadata filtering. Consider hybrid approaches with pre-filtering or post-filtering
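As a sketch of the post-filtering approach: over-fetch candidates from the ANN index, then keep only those whose metadata passes the predicate. Everything here is illustrative — `index_search` stands in for a thin wrapper around a FAISS or hnswlib query call, and the over-fetch factor of 4 is an assumption you would tune:

```python
def search_with_post_filter(index_search, query, metadata, predicate, k=10, overfetch=4):
    """Post-filtering: ask the ANN index for overfetch*k candidates,
    then keep the first k whose metadata satisfies the predicate.

    index_search(query, k) -> list of candidate ids, best first.
    """
    candidates = index_search(query, k * overfetch)
    hits = [i for i in candidates if predicate(metadata[i])]
    return hits[:k]

# Toy stand-in for a real ANN index: "nearest" ids are just 0, 1, 2, ...
fake_index = lambda query, k: list(range(k))
meta = [{"lang": "en" if i % 2 == 0 else "de"} for i in range(100)]

print(search_with_post_filter(fake_index, None, meta, lambda m: m["lang"] == "en", k=3))
# → [0, 2, 4]
```

The weakness of post-filtering is that a highly selective predicate can leave fewer than k survivors, which is why pre-filtering (restricting the candidate set before the ANN query) is sometimes preferred despite its extra bookkeeping.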
Conclusion
Vector search is a solved problem in theory but a nuanced one in practice. The right index depends on your dataset size, dimensionality, memory budget, and accuracy requirements. Start with exact search, benchmark your requirements, and add complexity only when needed.