This project demonstrates vector search capabilities using Azure DocumentDB with TypeScript/Node.js. It includes implementations of three different vector index types: DiskANN, HNSW, and IVF, along with utilities for embedding generation and data management.
Vector search enables semantic similarity search by converting text into high-dimensional vector representations (embeddings) and finding the vectors in the database that are most similar to a query vector. This project shows how to:
- Generate embeddings using Azure OpenAI
- Store vectors in DocumentDB
- Create and use different types of vector indexes
- Perform similarity searches with various algorithms
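To make "most similar vectors" concrete, here is a minimal sketch of exact (brute-force) nearest-neighbor search using cosine similarity. It is not code from this project. The tiny three-dimensional vectors are made up for illustration, whereas real embeddings from Azure OpenAI have far more dimensions. The DiskANN, HNSW, and IVF indexes covered below exist to approximate this result without scoring every stored vector.

```typescript
// A stored document: an identifier plus its embedding vector.
interface VectorDoc {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact k-nearest-neighbor search: score every document, keep the k best.
function topK(query: number[], docs: VectorDoc[], k: number): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy data for illustration only.
const docs: VectorDoc[] = [
  { id: "a", embedding: [1, 0, 0] },
  { id: "b", embedding: [0.9, 0.1, 0] },
  { id: "c", embedding: [0, 1, 0] },
];
console.log(topK([1, 0, 0], docs, 2).map((r) => r.id)); // [ 'a', 'b' ]
```

Brute force costs one similarity computation per stored vector for every query, which is why approximate indexes become attractive as the collection grows.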
Before running this project, you need:
- Azure subscription with appropriate permissions
- Azure Developer CLI (azd) installed
- Node.js 22 or higher (tested with Node.js v22.14.0)
- npm (comes with Node.js)
- Git (for cloning the repository)
- Visual Studio Code (recommended) or another code editor
```bash
# Clone this repository
git clone https://github.com/Azure-Samples/documentdb-samples
```

This project uses the Azure Developer CLI (azd) to deploy all required Azure resources from the existing infrastructure-as-code files.
If you haven't already, install the Azure Developer CLI:

Windows:

```powershell
winget install microsoft.azd
```

macOS:

```bash
brew tap azure/azd && brew install azd
```

Linux:

```bash
curl -fsSL https://aka.ms/install-azd.sh | bash
```

Navigate to the root of the cloned repository (two directories above this project's `ai/vector-search-typescript` folder) and run:
```bash
# Log in to Azure
azd auth login
# Provision Azure resources
azd up
```

During provisioning, you'll be prompted for:
- Environment name: A unique name for your deployment (e.g., "my-vector-search")
- Azure subscription: Select your Azure subscription
- Location: Choose `eastus2` or `swedencentral` (required for the OpenAI models)
The provisioning step of `azd up` will:
- Create a resource group
- Deploy Azure OpenAI with the `text-embedding-3-small` model
- Deploy an Azure DocumentDB (MongoDB vCore) cluster
- Create a managed identity for secure access
- Configure all necessary permissions and networking
- Generate a `.env` file with all connection information at the repository root
```bash
# Move to the TypeScript vector search project
cd ai/vector-search-typescript
# Install dependencies
npm install
```

After deployment completes, verify that the `.env` file was created in the repository root:
```bash
# View the generated environment variables
cat ../../.env
```

The file should contain all necessary configuration, including:
- Azure OpenAI endpoint and model information
- DocumentDB cluster name and database settings
- Embedding and data processing configuration
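The `.env` file is a plain list of `KEY=VALUE` lines. As a sketch of how such a file can be read and checked before any Azure call is made, the code below parses that format and fails fast on missing settings. The variable names `AZURE_OPENAI_EMBEDDING_ENDPOINT` and `MONGO_CLUSTER_NAME` are hypothetical placeholders, not necessarily the names azd writes; check `../../.env` for the real ones.

```typescript
// Parse simple KEY=VALUE lines; blank lines and lines starting with '#' are ignored.
// Matching single or double quotes around a value are stripped.
function parseEnv(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip malformed lines that have no key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const first = value[0];
    if (value.length >= 2 && (first === '"' || first === "'") && value[value.length - 1] === first) {
      value = value.slice(1, -1);
    }
    result[key] = value;
  }
  return result;
}

// Throw if any required setting is missing or empty.
function requireKeys(env: Record<string, string>, keys: string[]): void {
  const missing = keys.filter((k) => !env[k]);
  if (missing.length > 0) {
    throw new Error(`Missing required settings: ${missing.join(", ")}`);
  }
}

// Hypothetical sample content; the real file is generated by azd.
const sample = `# generated by azd
AZURE_OPENAI_EMBEDDING_ENDPOINT="https://example.openai.azure.com/"
MONGO_CLUSTER_NAME=my-cluster
`;
const env = parseEnv(sample);
requireKeys(env, ["AZURE_OPENAI_EMBEDDING_ENDPOINT", "MONGO_CLUSTER_NAME"]);
console.log(env.MONGO_CLUSTER_NAME); // my-cluster
```

In practice, Node.js 22 can also load such a file natively with `node --env-file=../../.env` or `process.loadEnvFile()`; the sketch only makes the format and the fail-fast check explicit.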
Compile the TypeScript code before running:
```bash
npm run build
```

This compiles the TypeScript source files in `src/` to JavaScript in `dist/`.
The project includes several scripts that demonstrate different aspects of vector search. Before running them, sign in with the Azure CLI:

```bash
az login
```
Run DiskANN (Disk-based Approximate Nearest Neighbor) search:
```bash
npm run start:diskann
```

DiskANN is optimized for:
- Large datasets that don't fit in memory
- Efficient disk-based storage
- Good balance of speed and accuracy
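The sketch below shows what a DiskANN index definition and query could look like as MongoDB command documents. It assumes the `cosmosSearchOptions` shape (`kind: "vector-diskann"`, `maxDegree`, `lBuild`) and the `$search`/`cosmosSearch` stage with `lSearch` described in the DocumentDB vector search documentation; the collection name `hotels`, field `contentVector`, index name, and parameter values are hypothetical. It only builds plain objects, so it runs without a database.

```typescript
// DiskANN build-time parameters (names assumed from the documented cosmosSearchOptions).
interface DiskAnnOptions {
  dimensions: number; // length of each embedding vector
  maxDegree: number;  // maximum number of edges per node in the graph
  lBuild: number;     // candidate list size used while building the graph
}

// createIndexes command document for a DiskANN vector index on `path`.
function diskAnnIndexCommand(collection: string, path: string, opts: DiskAnnOptions) {
  return {
    createIndexes: collection,
    indexes: [
      {
        name: "vectorIndex_diskann", // hypothetical index name
        key: { [path]: "cosmosSearch" },
        cosmosSearchOptions: {
          kind: "vector-diskann",
          similarity: "COS", // cosine similarity
          dimensions: opts.dimensions,
          maxDegree: opts.maxDegree,
          lBuild: opts.lBuild,
        },
      },
    ],
  };
}

// Aggregation pipeline returning the k most similar documents with their scores.
// lSearch is the query-time candidate list size: larger values trade speed for recall.
function diskAnnSearchPipeline(path: string, vector: number[], k: number, lSearch: number) {
  return [
    { $search: { cosmosSearch: { vector, path, k, lSearch } } },
    { $project: { score: { $meta: "searchScore" }, document: "$$ROOT" } },
  ];
}

// Example values only; 1536 is the default output size of text-embedding-3-small.
const cmd = diskAnnIndexCommand("hotels", "contentVector", { dimensions: 1536, maxDegree: 32, lBuild: 50 });
const queryVector = Array.from({ length: 1536 }, () => 0); // placeholder for a real embedding
const pipeline = diskAnnSearchPipeline("contentVector", queryVector, 5, 40);
console.log(JSON.stringify(cmd.indexes[0].cosmosSearchOptions));
```

With the MongoDB Node.js driver, such a command could be sent with `db.command(cmd)` and the pipeline run with `collection.aggregate(pipeline)`; the project's own code in `src/` is the authority on the settings it actually uses.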
Run HNSW (Hierarchical Navigable Small World) search:
```bash
npm run start:hnsw
```

HNSW provides:
- Excellent search performance
- High recall rates
- Hierarchical graph structure
- Good for real-time applications
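For comparison, an HNSW index is tuned with different knobs: `m` (graph connectivity) and `efConstruction` at build time, and `efSearch` at query time. This sketch assumes those option names and the `kind: "vector-hnsw"` value from the DocumentDB vector search documentation; the field name `contentVector`, the index name, and the numbers are illustrative only. The `efSearch >= k` check is this sketch's own guard, reflecting that a smaller candidate list cannot hold k results.

```typescript
// HNSW build-time parameters (names assumed from the documented cosmosSearchOptions).
interface HnswOptions {
  dimensions: number;     // length of each embedding vector
  m: number;              // maximum neighbors per node in each graph layer
  efConstruction: number; // candidate list size used while building the graph
}

// Index specification for a createIndexes command; `path` holds the embedding.
function hnswIndexSpec(path: string, opts: HnswOptions) {
  return {
    name: "vectorIndex_hnsw", // hypothetical index name
    key: { [path]: "cosmosSearch" },
    cosmosSearchOptions: {
      kind: "vector-hnsw",
      similarity: "COS",
      dimensions: opts.dimensions,
      m: opts.m,
      efConstruction: opts.efConstruction,
    },
  };
}

// $search stage for an HNSW query; efSearch is the query-time candidate list size.
// Larger efSearch explores more of the graph: higher recall, more latency.
function hnswSearchStage(path: string, vector: number[], k: number, efSearch: number) {
  if (efSearch < k) {
    throw new Error(`efSearch (${efSearch}) should be at least k (${k})`);
  }
  return { $search: { cosmosSearch: { vector, path, k, efSearch } } };
}

// Example values only.
const spec = hnswIndexSpec("contentVector", { dimensions: 1536, m: 16, efConstruction: 64 });
const stage = hnswSearchStage("contentVector", Array.from({ length: 1536 }, () => 0), 5, 40);
console.log(spec.cosmosSearchOptions.kind); // vector-hnsw
```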
Run IVF (Inverted File) search:
```bash
npm run start:ivf
```

IVF features:
- Clusters vectors by similarity
- Fast search through cluster centroids
- Configurable accuracy vs speed trade-offs
- Efficient for large vector datasets
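The IVF points above can be seen in a toy in-memory version of the technique: vectors are grouped into inverted lists under their nearest centroid, and a query scans only the `nProbes` lists whose centroids are closest. This is an illustration, not the service's implementation. Real IVF learns its centroids with k-means, while here they are fixed by hand, and the sketch uses squared Euclidean distance for simplicity. Assuming DocumentDB follows the usual IVF design, its `numLists` index option and `nProbes` search option play the roles of the centroid count and probe count; confirm against the DocumentDB vector search documentation.

```typescript
type Vec = number[];

function squaredDistance(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// Index of the centroid closest to v (first one wins on ties).
function nearestCentroid(v: Vec, centroids: Vec[]): number {
  let best = 0;
  let bestDist = Infinity;
  centroids.forEach((c, ci) => {
    const d = squaredDistance(v, c);
    if (d < bestDist) {
      bestDist = d;
      best = ci;
    }
  });
  return best;
}

// Build inverted lists: each vector's index is stored under its closest centroid.
function buildIvf(vectors: Vec[], centroids: Vec[]): number[][] {
  const lists: number[][] = centroids.map(() => []);
  vectors.forEach((v, i) => lists[nearestCentroid(v, centroids)].push(i));
  return lists;
}

// Rank centroids by distance to the query, then scan only the nProbes closest lists.
function ivfSearch(query: Vec, vectors: Vec[], centroids: Vec[], lists: number[][], k: number, nProbes: number): number[] {
  const probed = centroids
    .map((c, ci) => ({ ci, d: squaredDistance(query, c) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, nProbes);
  const candidates = probed.flatMap(({ ci }) => lists[ci]);
  return candidates
    .map((i) => ({ i, d: squaredDistance(query, vectors[i]) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, k)
    .map((r) => r.i);
}

// Toy data: two hand-picked centroids.
const vectors: Vec[] = [[0, 0], [0, 1], [10, 10], [10, 11], [4, 4]];
const centroids: Vec[] = [[0, 0], [10, 10]];
const lists = buildIvf(vectors, centroids); // [[0, 1, 4], [2, 3]]

console.log(ivfSearch([6, 6], vectors, centroids, lists, 1, 1)); // [ 2 ]  (only the [10, 10] list is scanned)
console.log(ivfSearch([6, 6], vectors, centroids, lists, 1, 2)); // [ 4 ]  (every list scanned: exact answer)
```

With one probe the query misses its true nearest neighbor `[4, 4]`, which sits in the other cluster; probing both lists finds it at the cost of scanning every vector. That is the accuracy versus speed trade-off IVF exposes.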
For more information, see:

- Azure Developer CLI Documentation
- Azure DocumentDB Documentation
- Azure OpenAI Service Documentation
- Vector Search in DocumentDB
- MongoDB Node.js Driver Documentation
- TypeScript Documentation
If you encounter issues:
- Check the troubleshooting section above
- Review Azure resource configurations
- Verify environment variable settings
- Check Azure service status and quotas
- Ensure you are running Node.js 22 or higher