Tracking: Vector search (VectorType) performance improvement PRs #746

@mykaul

Overview

There are multiple open PRs that collectively aim to significantly improve VectorType (custom type) deserialization performance for vector search workloads. This issue tracks them, describes their contributions, and proposes a merge priority/order.

Vector search workloads (e.g., 768/1536-dimension float vectors for embeddings) are a key use case where deserialization overhead is substantial. These PRs attack the problem from multiple angles: pure Python fast-paths, Cython C-level operations, numpy integration, type-resolution caching, and general read-path optimization.


Directly Vector-Related PRs

1. #689 — (Improvement) Improve performance of Vector type parsing

2. #730 — Optimize VectorType deserialization with struct.unpack and numpy

  • What: Pure Python path optimization. Replaces element-by-element deserialization with bulk struct.unpack for known numeric types, and adds a numpy.frombuffer().tolist() fast-path for vectors >= 32 elements.
  • Impact: 3.5x speedup for small vectors (Vector<float, 3>), 4.0x for large vectors (Vector<float, 1536>).
  • Scope: cassandra/cqltypes.py only. No Cython dependency.
  • Priority: HIGH — This is the foundational optimization that benefits all users (pure Python path is always available).
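The two fast-paths described for #730 can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function names are hypothetical, and it assumes big-endian float32 elements on the wire. Only the 32-element threshold comes from the description above.

```python
import struct

import numpy as np

# Element count at which the numpy path takes over (per the PR description).
NUMPY_THRESHOLD = 32


def deserialize_elementwise(buf, n):
    """Baseline: one struct call per element."""
    return [struct.unpack_from(">f", buf, 4 * i)[0] for i in range(n)]


def deserialize_bulk(buf, n):
    """One struct.unpack call for the whole vector."""
    return list(struct.unpack(">%df" % n, buf))


def deserialize_numpy(buf, n):
    """Interpret the buffer as big-endian float32, then convert to a list."""
    return np.frombuffer(buf, dtype=">f4", count=n).tolist()


def deserialize_float_vector(buf, n):
    """Hypothetical dispatcher: numpy for large vectors, struct otherwise."""
    if n >= NUMPY_THRESHOLD:
        return deserialize_numpy(buf, n)
    return deserialize_bulk(buf, n)
```

All three functions return the same Python list of floats; they differ only in how many Python-level calls are needed per vector.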

3. #731 — Add VectorType support to numpy_parser for 2D array parsing

  • What: Extends the Cython NumPy row parser to produce 2D masked arrays (num_rows, vector_dimension) for vector columns, using memcpy of raw wire bytes directly into pre-allocated numpy buffers.
  • Impact: Zero-copy path for ML/AI workloads consuming results as numpy arrays. Fastest possible path when the consumer is numpy.
  • Scope: cassandra/numpy_parser.pyx + new tests.
  • Priority: MEDIUM — Benefits users who opt into the numpy result path. Independent of other PRs.
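As a pure Python approximation of the idea in #731 (the PR itself works in Cython with memcpy), the sketch below copies each row's raw bytes into a pre-allocated 2D buffer and masks NULL rows. The function name and the use of `None` for NULL values are assumptions made for illustration.

```python
import struct

import numpy as np


def rows_to_2d_array(raw_rows, dim):
    """Build a (num_rows, dim) masked array from raw big-endian float32 rows.

    raw_rows is a list of bytes objects, or None for a NULL vector
    (an assumption of this sketch).
    """
    # Pre-allocate in wire byte order so each row assignment is a plain copy.
    out = np.zeros((len(raw_rows), dim), dtype=">f4")
    mask = np.zeros((len(raw_rows), dim), dtype=bool)
    for i, raw in enumerate(raw_rows):
        if raw is None:
            mask[i, :] = True  # mask the whole row for NULL values
            continue
        out[i, :] = np.frombuffer(raw, dtype=">f4", count=dim)
    return np.ma.masked_array(out, mask=mask)
```

No per-element Python float objects are created here; the consumer receives one array for the whole column.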

4. #732 — Optimize Cython deserialization primitives and add VectorType Cython deserializer

  • What: 6 commits: (1) ntohs/ntohl intrinsics for Cython byte unpacking, (2) float-specific ntohl byte-swap, (3) from_ptr_and_size() refactor, (4) new DesVectorType Cython class with type-specialized C-level deserialization, (5) Windows portability, (6) buffer bounds validation.
  • Impact: 4.4–4.7x faster than pure Python for small vectors. For large vectors, both paths use numpy so the per-vector gain is marginal, but per-row dispatch overhead is eliminated. Also ~4-5% general row throughput improvement from byte-swap intrinsics.
  • Scope: cassandra/deserializers.pyx, cassandra/cython_marshal.pyx, cassandra/ioutils.pyx, cassandra/buffer.pxd.
  • Priority: HIGH — This is the Cython counterpart to #730 and also improves general Cython deserialization performance.
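The float-specific byte-swap in commit (2) can be illustrated in Python, under the assumption that it reads the four wire bytes as a big-endian 32-bit integer (what ntohl does in C) and reinterprets the resulting bits as a float. The function name is hypothetical, and the real change lives in Cython.

```python
import struct


def float_via_ntohl(buf, offset=0):
    """Decode a big-endian float32 by byte-swapping it as a 32-bit integer."""
    # Step 1: read a network-order uint32 into a host integer (ntohl equivalent).
    (host_int,) = struct.unpack_from(">I", buf, offset)
    # Step 2: reinterpret the same 32 bits as a native-order float.
    return struct.unpack("=f", struct.pack("=I", host_int))[0]
```

The result matches `struct.unpack(">f", ...)`; the point is that the swap is a single integer operation with no float-specific decoding logic.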

5. #733 — Add VectorType deserialization benchmarks and expand test coverage

  • What: Benchmark harness testing 4 strategies (VectorType.deserialize, struct.unpack, numpy.frombuffer, Cython DesVectorType) across multiple vector sizes/types. Enables vector integration tests on Scylla 2025.4+. Adds unit tests for Cython fallback and numpy large-vector paths.
  • Impact: No production code changes. Provides the measurement infrastructure to validate all the other PRs.
  • Priority: HIGH — Should be merged early or alongside the optimization PRs to provide regression tracking.
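A harness of the kind described for #733 might look roughly like this. It is a stdlib-only sketch with two stand-in strategies, not the PR's benchmark: the strategy names, sizes, and iteration count are illustrative assumptions.

```python
import struct
import timeit


def elementwise(buf, n):
    return [struct.unpack_from(">f", buf, 4 * i)[0] for i in range(n)]


def bulk(buf, n):
    return list(struct.unpack(">%df" % n, buf))


# Hypothetical registry; the real harness also covers numpy and Cython paths.
STRATEGIES = {"elementwise": elementwise, "struct.unpack": bulk}


def run_benchmark(sizes=(3, 128, 1536), number=200):
    """Return {(strategy, size): seconds per call} for every combination."""
    results = {}
    for n in sizes:
        buf = struct.pack(">%df" % n, *[float(i) for i in range(n)])
        expected = elementwise(buf, n)
        for name, fn in STRATEGIES.items():
            # Check correctness before timing so a fast but wrong path fails loudly.
            assert fn(buf, n) == expected
            total = timeit.timeit(lambda: fn(buf, n), number=number)
            results[(name, n)] = total / number
    return results
```

Calling `run_benchmark()` before and after each optimization PR gives per-strategy timings that can be compared across vector sizes.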

Supporting/Infrastructure PRs (Benefit Vector Workloads Indirectly)

These PRs optimize the general deserialization pipeline that vector data flows through. They benefit all types but have particular impact on vector workloads due to the high volume of data.

6. #690 — Optimize custom type parsing with LRU caching

  • What: Caches lookup_casstype() results to avoid repeated string manipulation and regex scanning. VectorType is a custom/parameterized type, so this directly reduces per-query type resolution overhead.
  • Priority: MEDIUM — Prerequisite-like optimization. Benefits vector type resolution specifically.
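The caching idea can be sketched with `functools.lru_cache`. The `resolve_type` function, the simplified type-string format, and the cache size below are hypothetical stand-ins, not the driver's `lookup_casstype()` implementation.

```python
import functools
import re

# Simplified, hypothetical type-string format used only for this sketch.
_VECTOR_RE = re.compile(r"^\s*VectorType\(\s*(\w+)\s*,\s*(\d+)\s*\)\s*$")


@functools.lru_cache(maxsize=256)
def resolve_type(type_string):
    """Parse a type string once; repeated lookups are served from the cache."""
    match = _VECTOR_RE.match(type_string)
    if match:
        return ("VectorType", match.group(1), int(match.group(2)))
    return (type_string.strip(),)
```

Because the same type strings recur on every query, the regex runs once per distinct string and subsequent calls are a cache hit.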

7. #729 — Fast-path lookup_casstype() for simple type names

  • What: Skips the regex scanner and stack-based parser for non-parameterized types (direct dict lookup). While VectorType itself is parameterized, its subtypes (FloatType, etc.) are simple and benefit from this.
  • Priority: LOW-MEDIUM — Incremental improvement to type resolution.
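A fast-path of this shape could look like the following sketch. The registry contents, the `slow_parse` stand-in, and the call counter are assumptions made for illustration. The actual parser in the driver is more involved.

```python
import re

# Hypothetical registry of simple (non-parameterized) type names.
SIMPLE_TYPES = {"FloatType": "float", "Int32Type": "int", "UTF8Type": "text"}

# Counter showing how often the expensive path runs.
STATS = {"slow": 0}

_PARAM_RE = re.compile(r"^(\w+)\((.*)\)$")


def slow_parse(type_string):
    """Stand-in for the regex/stack-based parser used for parameterized types."""
    STATS["slow"] += 1
    match = _PARAM_RE.match(type_string.strip())
    if match is None:
        raise ValueError("unrecognized type: %r" % type_string)
    outer, inner = match.groups()
    return (outer, tuple(p.strip() for p in inner.split(",")))


def lookup_type(type_string):
    """Try a direct dict lookup first; fall back to parsing only if needed."""
    simple = SIMPLE_TYPES.get(type_string)
    if simple is not None:
        return simple
    return slow_parse(type_string)
```

Simple names such as `FloatType` never reach `slow_parse`, while a parameterized name still takes the full parsing path.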

8. #734 — Remove copies on the read path

  • What: Reduces memory copies in the read path. Up to 5.3x speedup for large payloads. Vector results with 768/1536-dim float columns are large payloads.
  • Impact: 1.2x–5.3x depending on payload size.
  • Priority: HIGH — Significant general improvement that directly benefits vector query result processing.
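The description does not say how #734 removes copies. One common Python technique for this, shown here purely as an assumption, is to slice a `memoryview` instead of the underlying `bytes`, since slicing a view does not duplicate the data. The function names are hypothetical.

```python
import struct


def split_cells_copying(payload, cell_size):
    """Each bytes slice allocates and copies cell_size bytes."""
    return [payload[i:i + cell_size] for i in range(0, len(payload), cell_size)]


def split_cells_zero_copy(payload, cell_size):
    """Each memoryview slice references the original buffer without copying."""
    view = memoryview(payload)
    return [view[i:i + cell_size] for i in range(0, len(payload), cell_size)]
```

`struct.unpack` accepts either form, so a cell produced by `split_cells_zero_copy` can be decoded directly. For a 1536-dimension float vector each avoided copy is about 6 KB.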

9. #741 — Cache deserializer instances in find_deserializer (Cython only)

10. #742 — Cache ParseDesc for prepared statements (Cython only)

  • What: Caches the ParseDesc object so repeated prepared statement executions skip list comprehensions, ColDesc construction, and make_deserializers().
  • Impact: 4.7x speedup for single-row prepared results (ParseDesc construction: 5,730ns → 175ns for 10 cols).
  • Priority: MEDIUM — Most vector search uses prepared statements, so this directly helps.

11. #743 — Direct PyUnicode_DecodeUTF8/ASCII from C buffer (Cython only)

  • What: Eliminates intermediate bytes object allocation for UTF8/ASCII deserialization using direct CPython C API calls.
  • Impact: 1.2x–2.1x for text columns. While not vector-specific, this improves the same Cython deserialization layer.
  • Priority: LOW — Tangential to vector performance; improves the same code layer.

Proposed Merge Order

The ordering considers dependencies, risk, and impact:

| Order | PR | Rationale |
|-------|----|-----------|
| 1 | #733 (benchmarks/tests) | Merge first to establish a measurement baseline. No production code changes, low risk. |
| 2 | #730 (struct.unpack + numpy, pure Python) | Foundational vector optimization. Benefits all users. No Cython dependency. |
| 3 | #734 (remove copies on read path) | General read-path improvement. Benefits vector workloads significantly. Independent. |
| 4 | #732 (Cython VectorType deserializer) | Cython vector optimization. May depend on #730 for shared numpy threshold logic. |
| 5 | #731 (numpy_parser 2D arrays) | Independent, targets numpy consumers specifically. Can go anywhere after #730. |
| 6 | #690 (custom type LRU cache) | Type resolution caching. Independent. |
| 7 | #729 (fast-path lookup_casstype) | Incremental type resolution improvement. Independent. |
| 8 | #741 (cache deserializer instances) | Benefits from #732 being merged first (caches DesVectorType). |
| 9 | #742 (cache ParseDesc) | Builds on the same Cython layer as #741. |
| 10 | #743 (direct UTF8 decode) | Same Cython layer, least vector-specific. |
| 11 | #689 (original umbrella PR) | Likely to be closed if #730-#733 cover all its content. Needs author confirmation. |
