Databricks on AWS

Updated Feb 18, 2025


Best practices for Mosaic AI Vector Search

This article provides tips for using Mosaic AI Vector Search effectively.

Recommendations for optimizing latency

  • Use the service principal authorization flow to take advantage of network-optimized routes.

  • Use the latest version of the Python SDK.

  • When testing, start with a concurrency of around 16 to 32; higher concurrency does not yield higher throughput.

  • Use a model served with provisioned throughput (for example, bge-large-en or a fine-tuned version), instead of a pay-per-token foundation model.
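To find the concurrency knee suggested above, you can sweep concurrency levels against your index and compare observed throughput. A minimal harness is sketched below; `query_fn` is a hypothetical wrapper you would write around a real query call (for example, `similarity_search` on an index obtained from the `databricks-vectorsearch` Python SDK), and the measurement logic itself is plain stdlib.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_throughput(query_fn, num_requests=64, concurrency=16):
    """Fire num_requests calls to query_fn at the given concurrency
    and return the observed rate in queries per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() forces all submitted calls to complete before timing stops
        list(pool.map(lambda _: query_fn(), range(num_requests)))
    return num_requests / (time.perf_counter() - start)
```

Sweeping `concurrency` over, say, 1, 16, and 32 shows where throughput stops improving, which is the point past which extra client concurrency only adds queueing latency.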

When to use GPUs

  • Use CPUs only for basic testing and for small datasets (up to hundreds of rows).

  • For GPU compute type, Databricks recommends using GPU-small or GPU-medium.

  • For GPU compute scale-out, increasing concurrency may improve ingestion times, but the benefit depends on factors such as total dataset size and index metadata.

Working with images, video, or non-text data

  • Pre-compute the embeddings and use a Delta Sync Index with self-managed embeddings.

  • Don’t store binary formats such as images as metadata, as this adversely affects latency. Instead, store the path of the file as metadata.
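The two bullets above combine into one pattern: embed each file up front and index only the path. A sketch under assumptions follows; `embed_image` is a hypothetical embedding function standing in for whatever vision model you use, and the resulting rows would feed a Delta Sync Index configured with self-managed embeddings.

```python
def build_index_rows(image_paths, embed_image):
    """Pre-compute embeddings and keep only the file path as metadata.

    Storing raw image bytes in the index inflates payloads and hurts
    query latency; storing the path lets callers fetch the file from
    its original location after retrieval.
    """
    rows = []
    for i, path in enumerate(image_paths):
        rows.append({
            "id": i,
            "image_path": path,              # metadata: the path, never the binary
            "embedding": embed_image(path),  # pre-computed, self-managed embedding
        })
    return rows
```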

Embedding sequence length

  • Check the embedding model sequence length to make sure documents are not being truncated. For example, BGE supports a context of 512 tokens. For longer context requirements, use gte-large-en-v1.5.
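A simple pre-ingestion check like the sketch below can surface documents at risk of truncation. The `tokenize` parameter is a stand-in for the embedding model's real tokenizer (whitespace splitting, used as the default here, is only a rough approximation of true token counts).

```python
def docs_at_truncation_risk(docs, max_tokens=512, tokenize=None):
    """Return the indices of documents whose approximate token count
    exceeds the embedding model's sequence length (e.g. 512 for BGE)."""
    tokenize = tokenize or str.split  # crude stand-in for the model tokenizer
    return [i for i, doc in enumerate(docs) if len(tokenize(doc)) > max_tokens]
```

Documents flagged this way should be chunked, or embedded with a longer-context model, before indexing.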

Use Triggered sync mode to reduce costs

  • The most cost-effective option for updating a vector search index is Triggered sync mode. Select Continuous only if you need the index to track changes in the source table with a latency of seconds. Both sync modes perform incremental updates: only data that has changed since the last sync is processed.
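The incremental behavior shared by both sync modes can be illustrated with a small model. This is not the service's implementation, just a sketch: `changes_by_version` maps a hypothetical Delta table version to the set of primary keys changed in that version, and a sync reprocesses only keys touched after the last synced version.

```python
def rows_to_resync(changes_by_version, last_synced_version, current_version):
    """Collect the primary keys changed since the last sync.

    Whether sync is Triggered or Continuous, only rows modified after
    last_synced_version are reprocessed; untouched rows incur no work.
    """
    changed = set()
    for version in range(last_synced_version + 1, current_version + 1):
        changed |= changes_by_version.get(version, set())
    return changed
```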


© Databricks 2025. All rights reserved. Apache, Apache Spark, Spark, and the Spark logo are trademarks of the Apache Software Foundation.
