benchANT Homepage

To Benchmark Vector Databases or to Get Sued for Breaching a DeWitt clause?

Vector databases are currently the hot topic in the database market. And every new type of database brings a race to be the fastest. But what about performance benchmarking of vector databases under the shadow of the DeWitt clause?

The database market is constantly evolving to address the needs of modern data-intensive applications. Vector databases are its latest trend, addressing the needs of AI applications through dedicated vector search features.

The vector database landscape has grown rapidly over the last two years. It now includes native vector databases such as Pinecone, Milvus, and Weaviate, as well as vector extensions to established general-purpose databases such as pgvector for PostgreSQL, MongoDB Atlas Vector Search, and Couchbase Vector Search.

To find the optimal vector database for your data-intensive AI application, the solution is easy: just pick one, because they all promise to be the perfect match 😉

For a more sustainable approach, you need a closer look at the actual features, maturity, and operational models. Here, the Vector DB Comparison provides a great high-level overview. A second, equally important aspect for every database system is performance, along with the price-performance ratio the database provides for AI applications.

Comparing the performance of database systems requires executing one or more database benchmarks. Because the benchmark landscape evolves together with the database landscape, many benchmarking suites for different application domains are available. Recently, vector benchmark suites such as VectorDBBench and ANN-Benchmarks have joined them.

In theory, we now have all the concepts and tooling at hand to perform benchmarking studies that answer questions such as "Is the performance of my general-purpose database with vector search support sufficient to handle the demands of my AI application?" or "Which bleeding-edge vector database provides the best price-performance ratio?"

So, let's start benchmarking. Yet, behold: DeWitt clauses everywhere 😑

In a nutshell, the DeWitt clause is named after David DeWitt, a researcher at the Department of Computer Sciences at the University of Wisconsin-Madison and one of the creators of the Wisconsin Benchmark. His benchmark results prompted Oracle to add the original clause to its license terms. Today, database providers with a DeWitt clause use their service terms to legally forbid benchmarking. The clause forbids either publishing benchmark results or conducting any benchmarks of their products at all. As a consequence, users have to buy a pig in a poke: no performance data is publicly available, and no application-specific benchmarks can be conducted. For more details on the history of the DeWitt clause and an overview of general-purpose databases and their DeWitt clauses, see the great blog post "DeWitt clause, or Can you benchmark a database and get away with it".

Benchmarking is our daily business at benchANT, and we have seen increasing interest in vector database benchmarks over the last months. So we want to share an overview of the vector database market with respect to the DeWitt clause. The following table shows a selection of popular native vector databases and general-purpose databases with vector support. For each, it shows whether a DeWitt clause forbids conducting performance benchmarks or publishing their results. The table covers both the open-source (OS) or community edition (CE) and the database-as-a-service (DBaaS) offer. Especially for PostgreSQL, many more DBaaS offers support pgvector than this table reflects. The same goes for MongoDB.

Table 1: Providers and technology selected for this evaluation

| Database | Vector support | Benchmarking OS/CE | Benchmarking DBaaS |
|---|---|---|---|
| ChromaDB | native | ✅ OS | not yet available |
| Couchbase | extension | (✅) CE without vector search support | ⛔ terms |
| DataStax Astra DB | extension | n/a | ✅ |
| Elasticsearch | extension | (✅) CE without vector search support | ⛔ terms |
| Milvus | native | ✅ | ✅ (Zilliz Cloud) |
| MongoDB | extension | (✅) CE without vector search support | ✅ |
| Pinecone | native | n/a | ⛔ terms |
| PostgreSQL | extension | ✅ | ✅ AWS RDS PostgreSQL; ✅ Azure Database for PostgreSQL |
| Redis | extension | (✅) CE without vector search support | ✅ |
| Rockset | native | n/a | ⛔ terms |
| SingleStore | extension | ✅ | ⛔ terms |

Legend: ✅ benchmarking permitted; (✅) permitted, but this edition lacks vector search support; ⛔ forbidden by the terms of service; n/a no such edition.

Disclaimer: The analysis has been carried out in April 2024 and the terms can change in the future.

In summary, 4 out of the 13 analyzed databases with vector search support forbid publishing benchmark results or even conducting benchmarks. All of them offer vector search only in their DBaaS. They either provide no open-source or community edition at all, or that edition does not include vector search. This might indicate that their performance is not (yet) as good as their marketing claims. Likewise, it is not surprising that the open-source databases do not apply a DeWitt clause to their DBaaS offers.

💡 We will follow the development of the DeWitt clause in the vector database landscape and keep you posted on any updates. And stay tuned for upcoming vector database benchmark results by benchANT.