Updated Feb 18, 2025

Mosaic AI Vector Search

This article gives an overview of Databricks’ vector database solution, Mosaic AI Vector Search, including what it is and how it works.

What is Mosaic AI Vector Search?

Mosaic AI Vector Search is a vector database that is built into the Databricks Data Intelligence Platform and integrated with its governance and productivity tools. A vector database is a database that is optimized to store and retrieve embeddings. Embeddings are mathematical representations of the semantic content of data, typically text or image data. Embeddings are generated by a large language model and are a key component of many GenAI applications that depend on finding documents or images that are similar to each other. Examples are RAG systems, recommender systems, and image and video recognition.

With Mosaic AI Vector Search, you create a vector search index from a Delta table. The index includes embedded data with metadata. You can then query the index using a REST API to identify the most similar vectors and return the associated documents. You can structure the index to automatically sync when the underlying Delta table is updated.

Mosaic AI Vector Search supports the following:

  • Hybrid keyword-similarity search.

  • Filtering.

  • Access control lists (ACLs) to manage vector search endpoints.

  • Sync only selected columns.

  • Save and sync generated embeddings.

How does Mosaic AI Vector Search work?

Mosaic AI Vector Search uses the Hierarchical Navigable Small World (HNSW) algorithm for its approximate nearest neighbor searches and the L2 distance metric to measure embedding vector similarity. If you want to use cosine similarity, you need to normalize your data point embeddings before feeding them into vector search. When the data points are normalized, the ranking produced by L2 distance is the same as the ranking produced by cosine similarity.
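The equivalence follows from the identity ||q - x||² = 2 - 2·cos(q, x) for unit vectors: as cosine similarity rises, L2 distance falls. The following is a minimal sketch in plain Python that checks this on made-up data. The helper names and sample vectors are illustrative assumptions and are not part of Mosaic AI Vector Search.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical sample data.
query = [1.0, 2.0, 0.5]
docs = {"a": [2.0, 4.1, 1.0], "b": [0.1, 0.2, 3.0], "c": [5.0, 1.0, 0.0]}

unit_query = normalize(query)
unit_docs = {key: normalize(vec) for key, vec in docs.items()}

# Rank by ascending L2 distance on the normalized vectors ...
by_l2 = sorted(unit_docs, key=lambda k: l2_distance(unit_query, unit_docs[k]))
# ... and by descending cosine similarity on the original vectors.
by_cosine = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]), reverse=True)

print(by_l2 == by_cosine)
```

In practice you would apply the same normalization to the embeddings stored in the source table and to each query vector.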

Mosaic AI Vector Search also supports hybrid keyword-similarity search, which combines vector-based embedding search with traditional keyword-based search techniques. This approach matches exact words in the query while also using a vector-based similarity search to capture the semantic relationships and context of the query.

By integrating these two techniques, hybrid keyword-similarity search retrieves documents that contain not only the exact keywords but also those that are conceptually similar, providing more comprehensive and relevant search results. This method is particularly useful in RAG applications where source data has unique keywords such as SKUs or identifiers that are not well suited to pure similarity search.

For details about the API, see the Python SDK reference and Query a vector search endpoint.

Similarity search calculation

The similarity search calculation uses the following formula:

$$\text{score}(q, x) = \frac{1}{1 + \operatorname{dist}(q, x)^2}$$

where dist is the Euclidean distance between the query q and the index entry x:

$$\operatorname{dist}(q, x) = \sqrt{\sum_{i} (q_i - x_i)^2}$$
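As a quick illustration, the score described above can be computed in a few lines of Python. This is a sketch of the arithmetic only; the function names are illustrative and are not part of the SDK.

```python
import math

def euclidean_distance(q, x):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((qi - xi) ** 2 for qi, xi in zip(q, x)))

def similarity_score(q, x):
    """Reciprocal of 1 plus the squared Euclidean distance."""
    return 1.0 / (1.0 + euclidean_distance(q, x) ** 2)

identical = similarity_score([0.0, 0.0], [0.0, 0.0])  # distance 0
far_apart = similarity_score([0.0, 0.0], [3.0, 4.0])  # distance 5
```

An exact match scores 1, and the score decreases toward 0 as the distance grows. For normalized embeddings the squared distance is at most 4, so scores fall between 0.2 and 1.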

Keyword search algorithm

Relevance scores are calculated using Okapi BM25. All text or string columns are searched, including the source text embedding and metadata columns in text or string format. The tokenization function splits at word boundaries, removes punctuation, and converts all text to lowercase.
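The following sketch shows what tokenization and BM25 scoring of this kind can look like. It is a textbook Okapi BM25 written under assumptions: the regular expression, the `k1` and `b` values, and the IDF variant are common defaults chosen for illustration, not the settings documented for Mosaic AI Vector Search.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of word characters, dropping punctuation."""
    return re.findall(r"\w+", text.lower())

def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with a textbook Okapi BM25.

    k1 and b are common defaults (an assumption), not documented values.
    """
    if not documents:
        return []
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical documents containing an identifier-like keyword.
documents = ["Red widget, SKU-123.", "Blue gadget", "Green widget"]
scores = bm25_scores("sku-123", documents)
```

Only the first document contains the tokens `sku` and `123`, so it is the only one with a nonzero score. This is the kind of exact match that pure similarity search can miss.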

How similarity search and keyword search are combined

The similarity search and keyword search results are combined using the Reciprocal Rank Fusion (RRF) function.

RRF rescores each document from each method using the score:

$$\text{score}(d) = \frac{1}{\text{rrf\_param} + \text{rank}(d)}$$

In the above equation, rank starts at 0. The scores from both methods are summed for each document, and the highest-scoring documents are returned.

rrf_param controls the relative importance of higher-ranked and lower-ranked documents. Based on the literature, rrf_param is set to 60.
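A minimal sketch of this fusion step follows. It assumes the standard RRF form, 1 / (rrf_param + rank), with rank starting at 0 and rrf_param set to 60 as stated above. The function name and the document IDs are hypothetical.

```python
from collections import defaultdict

RRF_PARAM = 60  # value stated in this article

def reciprocal_rank_fusion(rankings, rrf_param=RRF_PARAM):
    """Fuse ranked lists of document IDs; rank starts at 0 in each list."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (rrf_param + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

similarity_ranking = ["doc_a", "doc_b", "doc_c"]  # hypothetical similarity results
keyword_ranking = ["doc_b", "doc_d", "doc_a"]     # hypothetical keyword results
fused = reciprocal_rank_fusion([similarity_ranking, keyword_ranking])
fused_order = [doc_id for doc_id, _ in fused]
```

Here `doc_b` ranks first because it is near the top of both lists, while documents that appear in only one list sink to the bottom. A larger rrf_param flattens the difference between adjacent ranks.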

Scores are normalized so that the highest score is 1 and the lowest score is 0 using the following equation:

[Equation image: score normalization]

Options for providing vector embeddings

To create a vector database in Databricks, you must first decide how to provide vector embeddings. Databricks supports three options:

  • Option 1: Delta Sync Index with embeddings computed by Databricks. You provide a source Delta table that contains data in text format. Databricks calculates the embeddings, using a model that you specify, and optionally saves the embeddings to a table in Unity Catalog. As the Delta table is updated, the index stays synced with the Delta table.

    The following diagram illustrates the process:

    1. Calculate query embeddings. Query can include metadata filters.

    2. Perform similarity search to identify most relevant documents.

    3. Return the most relevant documents and append them to the query.

    [Diagram: vector database, Databricks calculates embeddings]

  • Option 2: Delta Sync Index with self-managed embeddings. You provide a source Delta table that contains pre-calculated embeddings. As the Delta table is updated, the index stays synced with the Delta table.

    The following diagram illustrates the process:

    1. Query consists of embeddings and can include metadata filters.

    2. Perform similarity search to identify most relevant documents.

    3. Return the most relevant documents and append them to the query.

    [Diagram: vector database, precalculated embeddings]

  • Option 3: Direct Vector Access Index. You must manually update the index using the REST API when the embeddings table changes.

    The following diagram illustrates the process:

    [Diagram: vector database, precalculated embeddings with no automatic sync]
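To make the difference between the three options concrete, the sketch below assembles the keyword arguments that each one would pass to the Python SDK (`create_delta_sync_index` for options 1 and 2, `create_direct_access_index` for option 3). All catalog, table, column, and endpoint names are hypothetical, and the embedding dimension is an arbitrary example. Check the parameter names against the Python SDK reference before relying on them.

```python
def delta_sync_args(endpoint, source_table, index, primary_key, pipeline_type="TRIGGERED"):
    """Arguments shared by both Delta Sync Index options."""
    return {
        "endpoint_name": endpoint,
        "source_table_name": source_table,
        "index_name": index,
        "primary_key": primary_key,
        "pipeline_type": pipeline_type,
    }

# All names below are hypothetical placeholders.
common = delta_sync_args(
    endpoint="docs_endpoint",
    source_table="main.docs.articles",
    index="main.docs.articles_index",
    primary_key="id",
)

# Option 1: Databricks computes embeddings from a text column.
option_1 = {
    **common,
    "embedding_source_column": "text",
    "embedding_model_endpoint_name": "my-embedding-endpoint",
}

# Option 2: the source table already holds the embedding vectors.
option_2 = {
    **common,
    "embedding_vector_column": "text_vector",
    "embedding_dimension": 1024,
}

# Option 3: Direct Vector Access Index. There is no source table, so you
# declare a schema and write the vectors yourself.
option_3 = {
    "endpoint_name": "docs_endpoint",
    "index_name": "main.docs.articles_direct_index",
    "primary_key": "id",
    "embedding_vector_column": "text_vector",
    "embedding_dimension": 1024,
    "schema": {"id": "int", "text": "string", "text_vector": "array<float>"},
}

# With the SDK installed and a workspace available, the calls would look like:
#   from databricks.vector_search.client import VectorSearchClient
#   client = VectorSearchClient()
#   client.create_delta_sync_index(**option_1)    # or **option_2
#   client.create_direct_access_index(**option_3)
```

The only difference between options 1 and 2 is whether you name a text column plus an embedding model endpoint, or a vector column plus its dimension.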

How to set up Mosaic AI Vector Search

To use Mosaic AI Vector Search, you must create the following:

  • A vector search endpoint. This endpoint serves the vector search index. You can query and update the endpoint using the REST API or the SDK. See Create a vector search endpoint for instructions.

    Endpoints scale up automatically to support the size of the index or the number of concurrent requests. Endpoints do not scale down automatically.

  • A vector search index. The vector search index is created from a Delta table and is optimized to provide real-time approximate nearest neighbor searches. The goal of the search is to identify documents that are similar to the query. Vector search indexes appear in and are governed by Unity Catalog. See Create a vector search index for instructions.

In addition, if you choose to have Databricks compute the embeddings, you can use a pre-configured Foundation Model APIs endpoint or create a model serving endpoint to serve the embedding model of your choice. See Pay-per-token Foundation Model APIs or Create foundation model serving endpoints for instructions.

To query the vector search endpoint, you use either the REST API or the Python SDK. Your query can define filters based on any column in the Delta table. For details, see Use filters on queries, the API reference, or the Python SDK reference.
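A query through the Python SDK might be assembled as in the following sketch. The helper functions are hypothetical. `get_index` and `similarity_search` are SDK methods, but the exact accepted values (for example `query_type="hybrid"`) and the filter syntax should be confirmed against the Python SDK reference and Use filters on queries.

```python
def build_query_args(query_text, columns, num_results=5, filters=None, hybrid=False):
    """Assemble keyword arguments for a similarity search call."""
    args = {
        "query_text": query_text,
        "columns": list(columns),
        "num_results": num_results,
    }
    if filters:
        args["filters"] = dict(filters)
    if hybrid:
        # Assumption: the SDK accepts "hybrid" for keyword-similarity search.
        args["query_type"] = "hybrid"
    return args

def run_query(endpoint_name, index_name, **query_args):
    """Run the query against an existing index (requires the SDK and a workspace)."""
    # Imported lazily so build_query_args can be used without the SDK installed.
    from databricks.vector_search.client import VectorSearchClient

    client = VectorSearchClient()
    index = client.get_index(endpoint_name=endpoint_name, index_name=index_name)
    return index.similarity_search(**query_args)

# Hypothetical column names and filter.
query_args = build_query_args(
    "how do I rotate an API key?",
    columns=["id", "text"],
    num_results=3,
    filters={"language": "en"},
    hybrid=True,
)
# results = run_query("docs_endpoint", "main.docs.articles_index", **query_args)
```

For an index with self-managed embeddings, you would pass a `query_vector` instead of `query_text`.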

Requirements

  • Unity Catalog enabled workspace.

  • Serverless compute enabled. For instructions, see Connect to serverless compute.

  • Source table must have Change Data Feed enabled. For instructions, see Use Delta Lake change data feed on Databricks.

  • To create a vector search index, you must have CREATE TABLE privileges on the catalog schema where the index will be created.

Permission to create and manage vector search endpoints is configured using access control lists. See Vector search endpoint ACLs.
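The Change Data Feed requirement can typically be met by setting the `delta.enableChangeDataFeed` table property on the source table. The sketch below only builds the SQL statement. The helper and the table name are hypothetical, and in a notebook you would pass the result to `spark.sql`.

```python
def enable_change_data_feed_sql(table_name):
    """Build the ALTER TABLE statement that turns on Change Data Feed.

    table_name is interpolated directly, so pass only trusted identifiers.
    """
    return (
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )

statement = enable_change_data_feed_sql("main.docs.articles")  # hypothetical table
# In a Databricks notebook: spark.sql(statement)
```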

Data protection and authentication

Databricks implements the following security controls to protect your data:

  • Every customer request to Mosaic AI Vector Search is logically isolated, authenticated, and authorized.

  • Mosaic AI Vector Search encrypts all data at rest (AES-256) and in transit (TLS 1.2+).

Mosaic AI Vector Search supports two modes of authentication:

  • Service principal token. An admin can generate a service principal token and pass it to the SDK or API. See use service principals. For production use cases, Databricks recommends using a service principal token.

    # Pass in a service principal
    from databricks.vector_search.client import VectorSearchClient

    vsc = VectorSearchClient(
        workspace_url="...",
        service_principal_client_id="...",
        service_principal_client_secret="...",
    )
    
  • Personal access token. You can use a personal access token to authenticate with Mosaic AI Vector Search. See personal access authentication token. If you use the SDK in a notebook environment, the SDK automatically generates a PAT token for authentication.

    # Pass in the PAT token
    from databricks.vector_search.client import VectorSearchClient

    client = VectorSearchClient(workspace_url="...", personal_access_token="...")
    

Customer Managed Keys (CMK) are supported on endpoints created on or after May 8, 2024.

Monitor usage and costs

The billable usage system table lets you monitor usage and costs associated with vector search indexes and endpoints. Here is an example query:

WITH all_vector_search_usage AS (
  SELECT *,
         CASE WHEN usage_metadata.endpoint_name IS NULL THEN 'ingest'
              WHEN usage_type = 'STORAGE_SPACE' THEN 'storage'
              ELSE 'serving'
         END AS workload_type
    FROM system.billing.usage
   WHERE billing_origin_product = 'VECTOR_SEARCH'
),
daily_dbus AS (
  SELECT workspace_id,
         cloud,
         usage_date,
         workload_type,
         usage_metadata.endpoint_name AS vector_search_endpoint,
         -- usage_quantity for serving and ingest workloads
         CASE WHEN workload_type IN ('serving', 'ingest') THEN SUM(usage_quantity)
              ELSE NULL
         END AS dbus,
         -- usage_quantity for storage
         CASE WHEN workload_type = 'storage' THEN SUM(usage_quantity)
              ELSE NULL
         END AS dsus
    FROM all_vector_search_usage
   GROUP BY ALL
)
-- Sort in the outer query; ordering inside a CTE is not guaranteed to be preserved.
SELECT *
  FROM daily_dbus
 ORDER BY 1, 2, 3, 4, 5 DESC

For details about the contents of the billing usage table, see Billable usage system table reference. Additional queries are in the following example notebook.

Vector search system tables queries notebook


Resource and data size limits

The following table summarizes resource and data size limits for vector search endpoints and indexes:

| Resource | Granularity | Limit |
| --- | --- | --- |
| Vector search endpoints | Per workspace | 100 |
| Embeddings | Per endpoint | 320,000,000 |
| Embedding dimension | Per index | 4096 |
| Indexes | Per endpoint | 50 |
| Columns | Per index | 50 |
| Metadata fields | Per index | 50 |
| Index name | Per index | 128 characters |

Supported column types: Bytes, short, integer, long, float, double, boolean, string, timestamp, date.

The following limits apply to the creation and update of vector search indexes:

| Resource | Granularity | Limit |
| --- | --- | --- |
| Row size for Delta Sync Index | Per index | 100 KB |
| Embedding source column size for Delta Sync Index | Per index | 32764 bytes |
| Bulk upsert request size limit for Direct Vector Access Index | Per index | 10 MB |
| Bulk delete request size limit for Direct Vector Access Index | Per index | 10 MB |

The following limits apply to the query API.

| Resource | Granularity | Limit |
| --- | --- | --- |
| Query text length | Per query | 32764 bytes |
| Maximum number of results returned | Per query | 10,000 |
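Because the query text limit is expressed in bytes rather than characters, non-ASCII text reaches it sooner than a character count suggests. The following client-side check is a sketch. The function name is hypothetical, and it assumes the limit is measured on the UTF-8 encoding of the text.

```python
MAX_QUERY_TEXT_BYTES = 32764   # "Query text length" limit listed above
MAX_RESULTS_PER_QUERY = 10_000  # "Maximum number of results returned" limit

def check_query_limits(query_text, num_results):
    """Raise ValueError if a query would exceed the documented limits."""
    # Assumption: bytes are counted on the UTF-8 encoding of the query text.
    size = len(query_text.encode("utf-8"))
    if size > MAX_QUERY_TEXT_BYTES:
        raise ValueError(f"query text is {size} bytes; limit is {MAX_QUERY_TEXT_BYTES}")
    if not 1 <= num_results <= MAX_RESULTS_PER_QUERY:
        raise ValueError(f"num_results must be between 1 and {MAX_RESULTS_PER_QUERY}")

check_query_limits("What is a vector database?", num_results=10)  # passes silently
```

For example, a string of 20,000 accented characters such as "é" is within 32764 characters but encodes to 40,000 bytes, so the check rejects it.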

Limitations

  • HIPAA compliance is not available in workspaces that have a control plane in us-west-2 and a data plane in us-east-1.

  • Row and column level permissions are not supported. However, you can implement your own application level ACLs using the filter API.
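One way to approximate application-level ACLs with filters is to store an access label in a metadata column and restrict every query to the labels the caller holds. The sketch below builds such a filter. The `allowed_group` column, the helper name, and the group names are hypothetical. It also assumes that a list value in the filter matches any of the listed values, so verify this against Use filters on queries for your endpoint type.

```python
def acl_filters(user_groups, acl_column="allowed_group", extra_filters=None):
    """Build a query filter that limits results to rows the caller may see.

    acl_column is a hypothetical metadata column holding one group per row.
    """
    if not user_groups:
        # Refuse to query at all rather than run an unfiltered search.
        raise PermissionError("caller belongs to no groups")
    filters = dict(extra_filters or {})
    # Assumption: a list value matches rows whose column equals any listed value.
    filters[acl_column] = sorted(user_groups)
    return filters

filters = acl_filters({"finance", "hr"}, extra_filters={"language": "en"})
# Pass `filters` as the filters argument of the similarity search call.
```

The filter must be applied by trusted server-side code. A client that can call the index directly can omit it.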

Additional resources

  • Deploy Your LLM Chatbot With Retrieval Augmented Generation (RAG), Foundation Models and Vector Search.

  • How to create and query a vector search index.

  • Example notebooks


© Databricks 2025. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
