Building Fast Vector Search with FAISS and HNSW
A deep dive into approximate nearest neighbor search, comparing FAISS and HNSW for building scalable vector search systems.
Vector search is at the heart of modern AI applications — from semantic search and recommendation systems to RAG (Retrieval-Augmented Generation) pipelines. My VectorWise project was born from the need to understand and implement these systems from scratch.
The Problem: Nearest Neighbor at Scale
Given a query vector, find the K most similar vectors in a database. With brute force, this is O(n*d) per query (n vectors, d dimensions). At 1 million vectors with 768 dimensions, that's ~3 billion floating-point operations per query. Not practical for real-time applications.
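To make the baseline concrete, here is a brute-force k-NN sketch in NumPy. It scans every database vector per query, which is exactly the O(n*d) cost described above (a smaller toy size is used here purely for illustration):

```python
import numpy as np

def brute_force_knn(query, vectors, k=10):
    """Exact k-NN by scanning every database vector: O(n*d) per query."""
    # Squared L2 distance from the query to all n vectors at once
    dists = np.sum((vectors - query) ** 2, axis=1)
    idx = np.argpartition(dists, k)[:k]  # k smallest, unordered
    return idx[np.argsort(dists[idx])]   # sort those k by distance

# Toy example: 10k vectors of dimension 64 (1M x 768 works the same, just slowly)
rng = np.random.default_rng(0)
vecs = rng.random((10_000, 64)).astype("float32")
q = vecs[42]  # query with a known nearest neighbor: itself
print(brute_force_knn(q, vecs, k=5)[0])  # → 42
```

At 1M vectors this loop over the full database is what the ANN indexes below are designed to avoid.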
Approximate Nearest Neighbors (ANN)
ANN algorithms trade a small amount of accuracy for massive speed improvements. The two most important approaches:
1. FAISS (Facebook AI Similarity Search)
FAISS provides several index types. The most useful:
Flat Index (Exact)
```python
import faiss
import numpy as np

d = 768        # vector dimension
n = 1_000_000  # database size
vectors = np.random.random((n, d)).astype("float32")       # FAISS expects float32
query_vector = np.random.random((1, d)).astype("float32")  # queries have shape (nq, d)

# Build index (exact L2 search, no training needed)
index = faiss.IndexFlatL2(d)
index.add(vectors)

# Search: returns the 10 nearest neighbors per query
distances, indices = index.search(query_vector, k=10)
```
IVF (Inverted File Index)
```python
nlist = 100  # number of clusters (Voronoi cells)
quantizer = faiss.IndexFlatL2(d)  # coarse quantizer that assigns vectors to clusters
index = faiss.IndexIVFFlat(quantizer, d, nlist)

index.train(vectors)  # runs k-means on the data; required before add()
index.add(vectors)

index.nprobe = 10  # search the 10 nearest clusters (default is 1)
distances, indices = index.search(query_vector, k=10)
```
IVF + PQ (Product Quantization)
```python
m = 8  # number of subquantizers; d must be divisible by m
quantizer = faiss.IndexFlatL2(d)  # fresh coarse quantizer for this index
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)  # last arg: 8 bits per sub-code

index.train(vectors)
index.add(vectors)
```
2. HNSW (Hierarchical Navigable Small World)
HNSW builds a multi-layer graph where each node is connected to its approximate nearest neighbors. Search starts from the top layer (sparse, long-range connections) and progressively refines through lower layers.
```python
import hnswlib

# Initialize: L2 space, 768-dim vectors
index = hnswlib.Index(space="l2", dim=768)
index.init_index(max_elements=1_000_000, ef_construction=200, M=16)

# Add data
index.add_items(vectors)

# Query
index.set_ef(50)  # higher ef = more accurate, slower
labels, distances = index.knn_query(query_vector, k=10)
```
Benchmarks: FAISS vs HNSW
From my testing on 1M vectors of dimension 768:
| Method | QPS (queries/sec) | Recall@10 | Memory |
|---|---|---|---|
| Flat (exact) | 5 | 100% | 3.0 GB |
| IVF-Flat (nprobe=10) | 850 | 95% | 3.1 GB |
| IVF-PQ | 3,200 | 88% | 0.5 GB |
| HNSW (ef=50) | 1,500 | 97% | 3.8 GB |
| HNSW (ef=200) | 600 | 99.5% | 3.8 GB |
Key observations:
- HNSW offers the best accuracy/speed trade-off when memory isn't constrained
- IVF-PQ is unbeatable on memory — essential when you have billions of vectors
- FAISS's GPU support can push IVF to 10,000+ QPS
Choosing the Right Index
Decision framework:
- < 100K vectors: Just use a flat index. It's fast enough.
- 100K - 10M vectors: HNSW for best recall, IVF-Flat for good balance
- > 10M vectors: IVF-PQ or FAISS on GPU
- Memory constrained: Product Quantization (PQ) variants
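The framework above can be encoded as a small helper. This is a hypothetical function (the name `pick_index` and the exact thresholds are illustrative rules of thumb from the list, not hard limits):

```python
def pick_index(n_vectors: int, memory_constrained: bool = False) -> str:
    """Hypothetical helper encoding the decision framework above."""
    if memory_constrained:
        return "IVF-PQ"             # product quantization shrinks memory ~6x
    if n_vectors < 100_000:
        return "Flat"               # exact search is fast enough
    if n_vectors <= 10_000_000:
        return "HNSW"               # best recall/speed trade-off in this range
    return "IVF-PQ or GPU IVF"      # beyond 10M vectors

print(pick_index(50_000))     # → Flat
print(pick_index(1_000_000))  # → HNSW
```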
Practical Considerations
- Dimensionality reduction: PCA from 768 to 256 dimensions often loses minimal accuracy but speeds up everything
- Batch queries: Always batch when possible — FAISS is heavily optimized for batch operations
- Index serialization: FAISS supports `write_index`/`read_index` for persistence; hnswlib has `save_index`/`load_index`
- Filtering: Neither FAISS nor hnswlib natively supports metadata filtering. Consider hybrid approaches with pre-filtering or post-filtering
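As a sketch of the post-filtering approach: over-fetch candidates from the ANN index, then keep only those whose metadata passes the predicate. Everything here is illustrative — `index_search` stands in for a thin wrapper around a FAISS or hnswlib query call, and the over-fetch factor of 4 is an assumption you would tune:

```python
def search_with_post_filter(index_search, query, metadata, predicate, k=10, overfetch=4):
    """Post-filtering: ask the ANN index for overfetch*k candidates,
    then keep the first k whose metadata satisfies the predicate.

    index_search(query, k) -> list of candidate ids, best first.
    """
    candidates = index_search(query, k * overfetch)
    hits = [i for i in candidates if predicate(metadata[i])]
    return hits[:k]

# Toy stand-in for a real ANN index: "nearest" ids are just 0, 1, 2, ...
fake_index = lambda query, k: list(range(k))
meta = [{"lang": "en" if i % 2 == 0 else "de"} for i in range(100)]

print(search_with_post_filter(fake_index, None, meta, lambda m: m["lang"] == "en", k=3))
# → [0, 2, 4]
```

The weakness of post-filtering is that a highly selective predicate can leave fewer than k survivors, which is why pre-filtering (restricting the candidate set before the ANN query) is sometimes preferred despite its extra bookkeeping.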
Conclusion
Vector search is a solved problem in theory but a nuanced one in practice. The right index depends on your dataset size, dimensionality, memory budget, and accuracy requirements. Start with exact search, benchmark your requirements, and add complexity only when needed.