Vector Search Performance Benchmark Comparing SingleStore, Pinecone and Zilliz
Introduction
The rise of AI applications such as ChatGPT and of technical concepts such as Large Language Models (LLMs), semantic search, and Retrieval-Augmented Generation (RAG) has driven the emergence of a new database category: vector databases.
Vector databases provide dedicated vector indexing and vector search capabilities for building many kinds of AI applications. They can be classified into two types: (i) native vector databases such as Zilliz or Pinecone that are built from scratch for vector processing, and (ii) general purpose databases with vector extensions such as SingleStore, PostgreSQL, and MongoDB.
When it comes to selecting the optimal database for your target AI application, there are many aspects to consider. Yet we at benchANT are all about performance, so this blog post focuses on the performance capabilities of native vector databases in comparison to general purpose databases with vector search extensions.
With the rising attention on vector databases, several performance benchmarks of vector databases have been published over the last year, above all the vector search performance ranking by Zilliz.
Over the last months, we have extended the benchANT platform with support for VectorDBBench, a benchmarking suite tailored towards AI workloads and maintained by Zilliz. In this benchmarking study, we look into the performance of the two dedicated vector databases Zilliz and Pinecone in comparison with the general purpose database SingleStore and its vector search extension. The study uses the 10M Cohere data set and analyses the performance of the target databases for two cluster sizes.
VectorDBBench
In order to measure vector search performance, a benchmark suite that simulates realistic vector search use cases is required. With the rise of vector databases, several open source vector search benchmark suites have been established. The following performance measurements are carried out with a fork of the widely used VectorDBBench benchmark suite which was originally created by Zilliz.
VectorDBBench supports two modes of operation: capacity and performance. In capacity mode, VectorDBBench iteratively loads an ever-increasing number of rows from the data set into the database until a pre-defined insert timeout threshold is violated. In performance mode, it runs a sequence of four steps, each of which produces its own set of metrics and results.
This benchmarking study applies the performance mode, so we detail its four steps:
- Load: During this step all data from the data set is ingested into the database. In all bindings provided by Zilliz Inc., this is done with a single client (1 thread / process) in batches of a configurable size.
- Optimize: During the optimize step, the binding may perform an arbitrary set of steps to optimize the database for the following query phase. This may include flushing segments, re-building the index, and other steps. Different bindings (databases) apply different means of optimization during this step.
- Concurrent search: In this step, an increasing number of clients (threads / processes) query the database for vectors, leading to kNN or ANN queries. For some data sets the vector search is extended with a filtering step. VectorDBBench version 0.0.9, which was used for this study, uses hard-coded concurrency levels of 1, 5, 10, 15, 20, 25, 30, and 35 parallel clients and a hard-coded k=100 for nearest neighbor search. For each level of parallelism, each client queries the database as fast as possible for a hard-coded 30 seconds. For each set of clients, the benchmark reports the absolute number of completed queries as well as the queries per second (QPS) over all clients. At the end of this step, VectorDBBench reports QPS_max, the maximum QPS achieved across all concurrency levels. The latest version (0.0.18) of VectorDBBench makes these hard-coded parameters configurable.
- Serial search: In this final step, VectorDBBench uses a single client to iteratively send 1,000 requests to the database using the same parameters as in “Concurrent search”. This time, the benchmark does not determine QPS, but rather query latency and the recall of the results. This step reports the average latency and the 99th-percentile (P99) latency over all 1,000 queries, both in milliseconds. A simplified sketch of the two search steps follows below.
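To make the two search steps more concrete, the following simplified Python sketch mimics how their metrics are derived. This is an illustration under assumptions, not VectorDBBench's actual code: `search` stands in for a database binding's query call, while `queries` and `ground_truth` stand in for the data set's test vectors and their true nearest neighbours.

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

K = 100            # hard-coded k in VectorDBBench 0.0.9
DURATION = 30      # seconds per concurrency level
CONCURRENCY = [1, 5, 10, 15, 20, 25, 30, 35]

def concurrent_search_qps(search, queries):
    """Concurrent search step: for each concurrency level, all clients query
    as fast as possible for DURATION seconds; QPS_max is the best level."""
    qps_max = 0.0
    for n_clients in CONCURRENCY:
        def worker():
            done, deadline = 0, time.monotonic() + DURATION
            while time.monotonic() < deadline:
                search(queries[done % len(queries)], k=K)
                done += 1
            return done
        with ThreadPoolExecutor(max_workers=n_clients) as pool:
            futures = [pool.submit(worker) for _ in range(n_clients)]
            total = sum(f.result() for f in futures)
        qps_max = max(qps_max, total / DURATION)
    return qps_max

def serial_search_metrics(search, queries, ground_truth):
    """Serial search step: a single client issues 1,000 queries; reports
    average latency and P99 latency (both in ms) as well as recall@K."""
    latencies, recalls = [], []
    for query, truth in list(zip(queries, ground_truth))[:1000]:
        start = time.monotonic()
        result_ids = search(query, k=K)
        latencies.append((time.monotonic() - start) * 1000.0)
        # recall@K: fraction of the K true nearest neighbours retrieved
        recalls.append(len(set(result_ids) & set(truth[:K])) / K)
    p99 = statistics.quantiles(latencies, n=100)[98]
    return statistics.mean(latencies), p99, statistics.mean(recalls)
```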
Based on these phases, VectorDBBench reports a result set of metrics that is a subset of the metrics of the individual steps.
It is important to note that, because of the way VectorDBBench works and because it separates concurrent and serial search, the reported QPS and latencies cannot be directly related to each other. QPS_max is computed across a set of different concurrency levels, while the latencies are computed over 1,000 queries issued by a single client. That is, we know nothing about latencies under different load levels.
Benchmark Methodology
In order to ensure a fair comparison based on comparable, production-ready database deployments, all benchmarks were carried out on the database vendors’ DBaaS offers, namely SingleStore Helios, Pinecone, and Zilliz Cloud.
Further, the benchmarking tool used was a version of VectorDBBench extended to support SingleStore. The benchmark suite and all applied configurations are published together with the benchmark results on GitHub. As a consequence, interested readers can reproduce the results on their own, even without the benchANT platform.
The benchmark process was automated with the benchANT platform, which integrates benchANT’s VectorDBBench fork. VectorDBBench takes care of preparing and executing the benchmark as well as reporting the benchmark results. The benchANT platform orchestrates the DBaaS allocation and the cloud resource allocation for the VectorDBBench instances, and triggers the benchmark execution. After each benchmark has completed, the platform collects the results and releases cloud resources that are no longer needed.
In order to integrate VectorDBBench into the benchANT platform and to support benchmarking SingleStore, VectorDBBench has been extended as follows:
SingleStore Support: VectorDBBench comes with support for various native vector databases and general purpose databases with vector search support. However, SingleStore was not yet supported by VectorDBBench. In consequence, we extended it in our fork to support SingleStore, based on a binding that was implemented by SingleStore in the scope of a research paper published at the VLDB 2024 conference.
Our fork is based on VectorDBBench version 0.0.9. In addition, VectorDBBench was extended to allow specifying when to create the index: (i) at the start of the vector ingestion phase on an empty database, or (ii) during the optimize phase on an already ingested vector data set. A minimal, hypothetical sketch of such a binding is shown below.
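The following sketch illustrates what such a binding boils down to. SingleStore is MySQL wire compatible, so a standard client such as pymysql works; however, the endpoint, the VECTOR column type, the vector index DDL and its option names, and the `<*>` dot-product operator are assumptions based on recent SingleStore releases — the actual binding lives in our fork.

```python
# Hypothetical, simplified SingleStore binding sketch (not the fork's actual code).
import json
import pymysql

# SingleStore speaks the MySQL wire protocol; endpoint and credentials are placeholders.
conn = pymysql.connect(host="svc-example.svc.singlestore.com",
                       user="admin", password="...", database="benchmark")

def init_schema(dim: int = 768) -> None:
    with conn.cursor() as cur:
        cur.execute(f"""CREATE TABLE IF NOT EXISTS vectors (
                            id BIGINT NOT NULL,
                            v VECTOR({dim}, F32) NOT NULL,
                            SHARD KEY (id))""")

def insert_batch(batch) -> None:
    # batch: list of (id, list[float]) pairs, e.g. 250 rows per call
    with conn.cursor() as cur:
        cur.executemany("INSERT INTO vectors VALUES (%s, %s)",
                        [(row_id, json.dumps(vec)) for row_id, vec in batch])
    conn.commit()

def build_index() -> None:
    # Index creation during the optimize phase, after the full data set is
    # loaded; parameter values follow this study (M=12, efConstruction=120),
    # but the exact option names are assumptions.
    with conn.cursor() as cur:
        cur.execute("""ALTER TABLE vectors ADD VECTOR INDEX hnsw_idx (v)
                       INDEX_OPTIONS '{"index_type": "HNSW_FLAT",
                                       "metric_type": "DOT_PRODUCT",
                                       "M": 12, "ef_construction": 120}'""")

def search(query, k: int = 100):
    # k-NN via dot product; `<*>` is SingleStore's dot-product operator and
    # `:>` casts the JSON-array string to a VECTOR value.
    with conn.cursor() as cur:
        cur.execute("""SELECT id FROM vectors
                       ORDER BY v <*> (%s :> VECTOR(768)) DESC LIMIT %s""",
                    (json.dumps(query), k))
        return [row[0] for row in cur.fetchall()]
```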
benchANT Platform Integration: In order to integrate VectorDBBench into the benchANT platform and enable automated benchmark execution, VectorDBBench was extended with a CLI in addition to its existing web interface.
The benchANT VectorDBBench fork is publicly available on GitHub. It is noteworthy that we had planned to contribute these extensions back to the original VectorDBBench, but its active community had already implemented the CLI interface and the configurable index creation time in the meantime. We still plan to contribute the SingleStore extension back to the main VectorDBBench.
Benchmark Configurations
The goal of this performance study is to ensure a fair and transparent setup for all considered vector databases. The baseline system sizes for the comparison are the SingleStore Helios S-2 and S-4 clusters. For Pinecone, comparably priced cluster sizes are selected. For Zilliz, the Zilliz (recommended) configuration is selected based on the Zilliz Cloud Cost Calculator for the target workload of 10M vectors with a dimension of 768, while the Zilliz (S4 price equal) configuration is selected to be price equal with SingleStore Helios S-4.
All benchmarks are based on the Cohere data set with 10M vectors of 768 dimensions. It is noteworthy that this benchmarking study only focuses on the HNSW index type, while SingleStore, Pinecone, and Zilliz also support additional index types.
All database clusters are deployed on the vendors' DBaaS offers to ensure production-grade and comparable deployments. For the native vector databases Pinecone and Zilliz, no additional database configurations are applied. For the general purpose database SingleStore, custom configurations are applied for optimal vector search performance, as described in the following table.
For each benchmark run, VectorDBBench is deployed on a dedicated VM in the same region as the target DBaaS. The VM type c5.4xlarge with 16 vCores and 32 GB RAM is selected to be large enough to ensure that VectorDBBench does not become a bottleneck, which is validated by the system monitoring data collected by the benchANT platform.
The following tables show the most relevant DBaaS and VectorDBBench parameters. All parameters are available in the associated GitHub repository.
| Setup | DBaaS | Cloud Details | DBaaS Cluster Details | Monthly Costs |
|---|---|---|---|---|
| SingleStore S2 | SingleStore Helios | AWS us-east-1 | 2 nodes, 16 total vCores, 128 GB total RAM | $5,256 |
| SingleStore S4 | SingleStore Helios | AWS us-east-1 | 4 nodes, 32 total vCores, 256 GB total RAM | $10,512 |
| Pinecone (S2 price equal) | Pinecone (Capacity Mode: Pods) | AWS us-east-1 | Pods: 3, Replicas: 4, Pod Size: p2.x4 | $5,834 |
| Pinecone (S4 price equal) | Pinecone (Capacity Mode: Pods) | AWS us-east-1 | Pods: 3, Replicas: 4, Pod Size: p2.x8 | $11,668 |
| Zilliz (recommended) | Zilliz Cloud (Dedicated Enterprise) | AWS us-east-1 | CU Size: 8, CU Type: Performance-Optimized | $1,430 |
| Zilliz (S4 price equal) | Zilliz Cloud (Dedicated Enterprise) | AWS us-east-1 | CU Size: 44, CU Type: Performance-Optimized | $10,912 |
| Setup | Database Configuration | Index Configuration | Index Creation |
|---|---|---|---|
| SingleStore S2 | partition_count = 2, query_parallelism_per_leaf_core = 0.01, columnstore_segment_rows = 2501000, columnstore_max_blobsize = 10737418240 | Type = HNSW, EFConstruction = 120, EF = 120, M = 12 | optimize phase (post load) |
| SingleStore S4 | partition_count = 4, query_parallelism_per_leaf_core = 0.01, columnstore_segment_rows = 2501000, columnstore_max_blobsize = 10737418240 | Type = HNSW, EFConstruction = 120, EF = 120, M = 12 | optimize phase (post load) |
| Pinecone (S2 price equal) | n/a | Type = Dense | load phase (optimize phase not supported) |
| Pinecone (S4 price equal) | n/a | Type = Dense | load phase (optimize phase not supported) |
| Zilliz (recommended) | n/a | Type = HNSW, EFConstruction = 120, EF = 120, M = 12 | optimize phase (post load) |
| Zilliz (S4 price equal) | n/a | Type = HNSW, EFConstruction = 120, EF = 120, M = 12 | optimize phase (post load) |
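For illustration, the following sketch shows how the HNSW parameters from the table map onto an index definition for Zilliz via pymilvus, the Python client for Zilliz Cloud / Milvus. The endpoint, token, collection name, and the COSINE metric are illustrative assumptions, not the study's exact configuration; a collection with a 768-dimensional `embedding` field is assumed to already exist.

```python
# Sketch: applying the table's HNSW parameters (M=12, EFConstruction=120,
# EF=120) to a Zilliz Cloud collection via pymilvus. Values flagged as
# assumptions in the lead-in are placeholders.
from pymilvus import connections, Collection

connections.connect(uri="https://<cluster-id>.aws-us-east-1.vectordb.zillizcloud.com",
                    token="<api-key>")  # hypothetical endpoint and key

collection = Collection("cohere_10m")   # assumed existing collection
collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "HNSW",
        "metric_type": "COSINE",        # assumption; depends on the data set config
        "params": {"M": 12, "efConstruction": 120},
    },
)
collection.load()                       # collection must be loaded before searching

# At query time, the EF=120 from the table becomes a search parameter:
query_vector = [0.0] * 768              # placeholder 768-dimensional query
results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 120}},
    limit=100,                          # k=100 as hard-coded in VectorDBBench
)
```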
Results
In the following sections, we present the results for the four measured performance aspects: load and index creation time, queries per second, latency, and recall.
Load and Index Creation Time
The following chart shows the time required to load 10M vectors and to create the index.
For SingleStore and Zilliz, the HNSW index is created in the optimize phase after the 10M vectors are ingested. In consequence, the lower part of the bar shows the load time and the upper part of the bar shows the time to create the index during the optimize phase.
For Pinecone, it is not possible to create the index at a later stage and in consequence, the index is created during the load phase.
The results show that SingleStore provides the lowest load times for both cluster sizes, while the combined load and index creation time is on the same level as Zilliz. Pinecone shows the longest load and index creation time, and the bigger cluster does not provide any performance advantage for load and index creation.
It is noteworthy that for all benchmarks the batch size was set to 250. Depending on the target database, higher batch sizes might further improve the load time. Maximizing the load performance was out of scope for this benchmarking project.
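For illustration, here is a minimal sketch of such a batched load loop, assuming an `insert_batch` bulk-insert function like the one in the SingleStore sketch above; it mirrors VectorDBBench's single-client load step rather than reproducing its actual code.

```python
BATCH_SIZE = 250  # batch size used throughout this study

def load(dataset, insert_batch, batch_size: int = BATCH_SIZE) -> None:
    """Ingest (id, vector) pairs in fixed-size batches with a single client."""
    batch = []
    for row in dataset:          # dataset yields (id, list[float]) tuples
        batch.append(row)
        if len(batch) == batch_size:
            insert_batch(batch)  # one bulk insert per batch_size rows
            batch = []
    if batch:                    # flush the final partial batch
        insert_batch(batch)
```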
Throughput: Queries per Second
The maximum QPS results show that Zilliz provides the highest QPS for both cluster sizes. SingleStore provides the second best QPS for both cluster sizes, followed by Pinecone. As expected, the larger clusters show a higher QPS than the smaller clusters.
Latency
The following chart shows the P99 latency for both cluster sizes per database. While the P99 latency for SingleStore decreases for the larger cluster, it is unexpected that for Pinecone the P99 latency nearly triples from the Pinecone (S2 price equal) setup to the Pinecone (S4 price equal) setup, especially as both setups apply the same number of pods and replicas and Pinecone (S4 price equal) only uses a bigger pod size. Analysing this behaviour further would require additional benchmarks, which were out of scope for this study.
Zilliz shows the lowest P99 latency, although here, too, the P99 latency increases for the larger cluster size. However, this is to be expected, as the Zilliz (S4 price equal) cluster is comparatively oversized for the 10M data set, resulting in additional network communication.
Recall
The recall results show that all three databases provide comparable recall in a range of 88.8% (Zilliz) to 91.5% (Pinecone).
Conclusion
In this benchmark study, we compare the vector search performance of the general purpose database SingleStore with the native vector databases Pinecone and Zilliz by applying the VectorDBBench benchmark suite.
The benchmark setup targets the DBaaS offers of each database technology to ensure comparable and production-grade database deployments. For each database technology, two cluster sizes in a comparable price range are selected, where the SingleStore Helios clusters S2 and S4 set the baseline. The applied Cohere data set comprises 10M vectors of 768 dimensions.
These results show that the general purpose database SingleStore can compete with native vector databases for the applied benchmark setup.
As for any benchmarking study, the benchmarks carried out only address a fraction of the overall vector search benchmarking space. Benchmarking larger vector data sets of >100M vectors with varying dimensions, analysing the performance impact of different index types, and analysing performance at higher concurrency levels are subject to future work.
In order to ensure full transparency and reproducibility of the presented results, all benchmark results are publicly available on GitHub. This data contains the raw performance measurements as well as additional metadata such as DBaaS instance details and VM details for the VectorDBBench instances.
Disclaimer
This benchmarking project carried out by benchANT was sponsored by SingleStore, with the goal of providing a fair, transparent, and reproducible comparison of the selected database technologies.
About SingleStore
SingleStore is a modern, distributed SQL database designed for real-time analytics and high-performance operational workloads. With its unified architecture, SingleStore supports hybrid transactional and analytical processing (HTAP), enabling organizations to run complex queries on live data at scale. Its compatibility with MySQL, combined with features like native JSON support and in-memory processing, makes it a powerful choice for demanding applications in industries ranging from finance to e-commerce. By delivering low-latency query performance and seamless scalability, SingleStore empowers developers and businesses to build fast, data-driven applications with ease.
About benchANT
benchANT is a consulting and analytics firm specializing in comparative performance analysis of database management systems, with a focus on cloud-hosted databases and Database-as-a-Service technologies. benchANT provides services to database vendors, cloud providers, and end users, taking the role of an unbiased analyst, researcher, and evaluator. The experiments described in this paper have been designed, executed, and written up by two of benchANT’s key employees.
Dr. Jörg Domaschka is one of benchANT’s co-founders. He has been trying to understand distributed systems for more than two decades. Performance engineering and benchmarking help him with that task and allow educating others on his findings.
Dr. Daniel Seybold is a co-founder and the CTO of benchANT. Daniel started his career as a researcher with a focus on distributed systems and databases. He has extensive experience in the field of database performance testing and has been working with NoSQL databases such as MongoDB, Cassandra and ScyllaDB for more than a decade.
Annex
The benchmarking process carried out by benchANT's automated benchmarking platform emphasizes full transparency and reproducibility based on established scientific concepts. The following figure depicts the main technical tasks carried out by the benchmarking platform for a single benchmark run, i.e. benchmarking one defined setup. Each benchmark run is carried out on a fresh instance to avoid any caching effects at the memory, OS, or disk level from previous runs. Benchmark runs for the same provider are scheduled sequentially to avoid creating noisy neighbors, and all benchmarks are executed during regular business hours.