feat: add VectorChord benchmark support #745
R3gardless wants to merge 14 commits into zilliztech:main
Conversation
- Introduced VectorChord as a new database type in the DB enum. - Added VCHORDRQ as a new index type in the IndexType enum.
- Introduced VectorChord client with support for embedding operations. - Added configuration classes for VectorChord settings and parameters.
Integrate VectorChordRQ into the CLI for enhanced functionality. This addition allows users to utilize VectorChord in their benchmarks seamlessly. 🚀
Updated installation instructions and added command line usage for VectorChord (vchordrq) to enhance user experience. 🚀
- Introduced VectorChordGraph command to CLI for enhanced functionality. - Added quantization and reranking options to VectorChord configurations.
/assign @XuanYang-cn
@R3gardless Thanks so much for this contribution — really appreciate the work you put in! One thing to flag: we upgraded to Pydantic v2 recently, and since your PR still uses v1 syntax, merging it would break the client. Could you sync with the latest main and update the Pydantic code to v2? Feel free to ping me if anything's unclear.
Enhanced VectorChordGraph with max_scan_tuples option for better control over tuple scanning. Updated configuration and CLI to support this new feature! 🚀
Updated the user_name attribute in VectorChordConfig to directly assign the string "postgres" instead of wrapping it in SecretStr. This simplifies the configuration and ensures proper handling of the default username. 🚀
…aphConfig ✨🔧 Updated the comment for max_scan_tuples to ensure clarity on its default value and range. This enhances code readability and maintainability. 🚀
Hi @XuanYang-cn, I've addressed the PR review comments. Thanks!
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: R3gardless, XuanYang-cn. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details: Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
XuanYang-cn left a comment:
Thanks for adding VectorChord support! The overall structure follows the existing client pattern well. A few issues to address before merge:
```python
class VectorChordConfig(DBConfig):
    user_name: SecretStr = "postgres"
```
Critical: With pydantic v2, assigning a plain string default to a SecretStr-typed field stores it as str, not SecretStr. Then to_dict() calls self.user_name.get_secret_value() unconditionally, which will raise AttributeError: 'str' object has no attribute 'get_secret_value' whenever the default is used.
Fix: either change the default to SecretStr("postgres"), or add an isinstance guard in to_dict() like the pgvector client does:
```python
user_str = self.user_name.get_secret_value() if isinstance(self.user_name, SecretStr) else self.user_name
```
Thanks! I'll apply SecretStr("postgres") as the default
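A minimal sketch of the failure mode discussed above (requires pydantic; the model names `BrokenConfig` and `FixedConfig` are hypothetical, not the PR's classes). Pydantic does not validate field defaults unless `validate_default` is enabled, so a plain-string default stays a `str`:

```python
from pydantic import BaseModel, SecretStr


class BrokenConfig(BaseModel):
    # Default is NOT validated/coerced, so this is stored as a plain str,
    # and str has no .get_secret_value() method.
    user_name: SecretStr = "postgres"


class FixedConfig(BaseModel):
    # Wrapping the default makes the attribute a real SecretStr.
    user_name: SecretStr = SecretStr("postgres")


print(type(BrokenConfig().user_name).__name__)        # str
print(FixedConfig().user_name.get_secret_value())     # postgres
```

With `BrokenConfig`, any code that calls `get_secret_value()` unconditionally raises `AttributeError` whenever the default is used, which is exactly the `to_dict()` bug flagged above.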
```python
class VectorChord(VectorDB):
    """Use psycopg instructions"""

    conn: psycopg.Connection[Any] | None = None
```
Important — missing thread_safe = False.
psycopg connections are not thread-safe, but VectorChord inherits the default thread_safe = True from VectorDB. The pgvector client explicitly sets thread_safe = False. Without this, MPRunner may share connections across threads, risking data corruption under concurrent benchmarks.
Suggested fix — add before the conn declaration:
```python
thread_safe = False
```

```python
        log.warning(f"Failed to insert data into vectorchord table ({self.table_name}), error: {e}")
        return 0, e

    def search_embedding(
```
Note: The filters parameter here is never passed by the benchmark framework. The framework calls prepare_filter() before the search loop to let the client pre-configure its query, but VectorChord doesn't override prepare_filter(), so _filtered_search is effectively dead code. Also supported_filter_types is not declared, defaulting to [NonFilter] only.
If filtered benchmarks are intended, prepare_filter() needs to be implemented (see pgvector's pattern). Otherwise, a code comment noting the intentional deferral would be helpful.
Thanks! I implemented prepare_filter and a _generate_search_query function to support the filtering benchmark as well.
```python
for setting_name, setting_val in session_options.items():
    command = sql.SQL("SET {setting_name} " + "= {setting_val};").format(
        setting_name=sql.Identifier(setting_name),
        setting_val=sql.Identifier(str(setting_val)),
```
Nit: sql.Identifier() wraps the value in double quotes, producing SET "vchordrq.probes" = "10". For PostgreSQL GUC values, sql.Literal() (single-quoted string literal) would be more semantically correct, though double-quoted identifiers happen to work for numeric values in practice.
(This pattern is inherited from the pgvector client, so not a blocker.)
Using sql.Literal instead, thanks!
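To make the distinction concrete without needing psycopg installed, these two helpers mimic what `sql.Identifier` and `sql.Literal` render for a string value:

```python
def quote_identifier(s: str) -> str:
    # PostgreSQL identifier quoting (double quotes), as sql.Identifier emits.
    return '"' + s.replace('"', '""') + '"'


def quote_literal(s: str) -> str:
    # PostgreSQL string-literal quoting (single quotes), as sql.Literal emits
    # for a str value.
    return "'" + s.replace("'", "''") + "'"


# Identifier-quoting the value produces: SET "vchordrq.probes" = "10";
print(f"SET {quote_identifier('vchordrq.probes')} = {quote_identifier('10')};")
# Literal-quoting the value produces the semantically correct form:
print(f"SET {quote_identifier('vchordrq.probes')} = {quote_literal('10')};")
```

As the review notes, the double-quoted form happens to be accepted for numeric GUC values, but a single-quoted literal matches what `SET` actually expects on the right-hand side.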
```python
metric_type: MetricType | None = None
create_index_before_load: bool = False
create_index_after_load: bool = True
quantization_type: str = "vector"  # vector, halfvec, rabitq8, rabitq4
```
Suggestion: This field accepts any string but only "vector", "halfvec", "rabitq8", "rabitq4" are valid. The value flows into sql.SQL(col_type) in _create_table which bypasses escaping. Consider constraining with:
```python
quantization_type: Literal["vector", "halfvec", "rabitq8", "rabitq4"] = "vector"
```
I constrained quantization type using Literal
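A small sketch of what the `Literal` constraint buys (requires pydantic; the model name `IndexParamSketch` is hypothetical). An out-of-range string is rejected at validation time, before it could ever reach `sql.SQL(col_type)`:

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class IndexParamSketch(BaseModel):
    quantization_type: Literal["vector", "halfvec", "rabitq8", "rabitq4"] = "vector"


print(IndexParamSketch(quantization_type="halfvec").quantization_type)  # halfvec

try:
    # An arbitrary string is rejected up front instead of flowing into SQL.
    IndexParamSketch(quantization_type="vector; DROP TABLE items")
except ValidationError:
    print("rejected")
```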
```python
        else:
            with_clause = sql.SQL(";")

        full_sql = (index_create_sql + with_clause).join(" ")
```
Bug: sql.Composed.join(" ") interposes a space between every internal part of the composed SQL (not just between index_create_sql and with_clause). This produces extra whitespace like ON public. "table_name". PostgreSQL tolerates extra whitespace so it likely works, but the intent seems to be joining the two halves. Consider:
```python
full_sql = index_create_sql + sql.SQL(" ") + with_clause
```
Thanks for suggestion! I applied it.
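A plain-Python analogy of the bug, with no psycopg needed: a `Composed` behaves roughly like a list of SQL fragments, and `.join(sep)` interposes `sep` between *every* fragment, not just between the two composed halves. The fragment contents below are illustrative:

```python
# Fragments as a .format()-built Composed might hold them internally.
index_create_parts = ['CREATE INDEX ON ', 'public', '.', '"tbl"', ' USING vchordrq (embedding)']
with_clause_parts = ['WITH (options = $$residual_quantization = true$$);']

# Analog of (index_create_sql + with_clause).join(" "):
# a space lands between every fragment, including around the schema dot.
joined = " ".join(index_create_parts + with_clause_parts)

# Analog of index_create_sql + sql.SQL(" ") + with_clause:
# only the intended boundary between the two halves gets a space.
concatenated = "".join(index_create_parts) + " " + "".join(with_clause_parts)

print(joined)        # ...ON  public . "tbl"  USING... (stray whitespace)
print(concatenated)  # ...ON public."tbl" USING...
```

As the review says, PostgreSQL tolerates the stray whitespace, so the original code happens to work; the concatenated form just matches the author's intent.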
```python
    @staticmethod
    def _create_connection(**kwargs) -> tuple[Connection, Cursor]:
        conn = psycopg.connect(**kwargs)
        conn.cursor().execute("CREATE EXTENSION IF NOT EXISTS vchord CASCADE")
```
Minor: This creates a temporary cursor that is never explicitly closed. Consider reusing the main cursor or closing it:
```python
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vchord CASCADE")
cur.close()
```
I referred to the pgvector.py source code, which returns the main cursor; it is closed in the main logic. Thanks.
…onality 🎉✨ - Updated quantization_type to use Literal for better type validation. - Refactored search methods to streamline query generation and filtering. - Added support for dynamic where clauses in search queries. 🔍
Updated the user_name initialization to use SecretStr for improved security. This change ensures that sensitive information is handled properly. 🚀
This update ensures that the vectorchord extension is created if it doesn't already exist when establishing a connection. This enhances the setup process for the VectorChord class. 🚀
Cleaned up the code by removing extra whitespace for better readability. Ensured that the connection and cursor assertions remain clear and concise.
Summary
- Adds the VectorChord (vchord) PostgreSQL extension as a new benchmark target
- Index types: vchordrq (IVF + RaBitQ) and vchordg (DiskANN-style graph)

Supported Features
Index Types
- vchordrq: lists, probes, epsilon, residual_quantization, spherical_centroids, build_threads, degree_of_parallelism, rerank_in_table, max_scan_tuples
- vchordg: m, ef_construction, bits, ef_search, beam_search, max_scan_tuples

Quantization: vector, halfvec, rabitq8, rabitq4
Metrics: L2, IP, COSINE (operator class auto-mapped per quantization type)
PostgreSQL Tuning: max_parallel_workers, max_parallel_maintenance_workers
CLI
References