Tracking: Vector search (VectorType) performance improvement PRs #746

## Overview

There are multiple open PRs that collectively aim to significantly improve VectorType (custom type) deserialization performance for vector search workloads. This issue tracks them, describes their contributions, and proposes a merge priority/order.

Vector search workloads (e.g., 768/1536-dimension float vectors for embeddings) are a key use case where deserialization overhead is substantial. These PRs attack the problem from multiple angles: pure Python fast-paths, Cython C-level operations, numpy integration, type-resolution caching, and general read-path optimization.

---

## Directly Vector-Related PRs

### 1. #689 — (Improvement) Improve performance of Vector type parsing
- **What:** Original umbrella PR containing multiple commits across Python, Cython, and numpy layers for vector deserialization optimization.
- **Status:** Open since Feb 5, 2026. This appears to be the original monolithic PR that was later broken out into #730–#733.
- **Note:** May be superseded by the individual PRs below. Needs clarification on whether this should be closed in favor of #730–#733.

### 2. #730 — Optimize VectorType deserialization with struct.unpack and numpy
- **What:** Pure Python path optimization. Replaces element-by-element deserialization with bulk `struct.unpack` for known numeric types, and adds a `numpy.frombuffer().tolist()` fast-path for vectors >= 32 elements.
- **Impact:** 3.5x speedup for small vectors (`Vector<float, 3>`), 4.0x for large vectors (`Vector<float, 1536>`).
- **Scope:** `cassandra/cqltypes.py` only. No Cython dependency.
- **Priority: HIGH** — This is the foundational optimization that benefits all users (pure Python path is always available).
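
For reviewers unfamiliar with the technique, here is a minimal sketch of the two fast paths described above. It is not the code from #730: the function names are made up for illustration, and the only details taken from the PR description are the bulk `struct.unpack` call and the 32-element numpy threshold. It assumes a big-endian `float` vector and falls back to `struct` when numpy is not installed.

```python
import struct

try:
    import numpy as np  # optional; only used for the large-vector path
except ImportError:
    np = None

NUMPY_THRESHOLD = 32  # threshold quoted in the PR description


def deserialize_per_element(buf, dim):
    """Baseline: one unpack call per element (hypothetical helper)."""
    return [struct.unpack_from(">f", buf, i * 4)[0] for i in range(dim)]


def deserialize_bulk(buf, dim):
    """Fast path: a single call decodes the whole vector (hypothetical helper)."""
    if np is not None and dim >= NUMPY_THRESHOLD:
        return np.frombuffer(buf, dtype=">f4", count=dim).tolist()
    return list(struct.unpack(f">{dim}f", buf))


# Values chosen to be exactly representable as float32.
small = struct.pack(">3f", 1.0, 2.5, -3.0)
large_values = [i / 2 for i in range(64)]
large = struct.pack(">64f", *large_values)
```

Both helpers return plain Python lists, so callers should see the same values either way; only the number of Python-level calls per vector changes.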

### 3. #731 — Add VectorType support to numpy_parser for 2D array parsing
- **What:** Extends the Cython NumPy row parser to produce 2D masked arrays `(num_rows, vector_dimension)` for vector columns, using `memcpy` of raw wire bytes directly into pre-allocated numpy buffers.
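- **Impact:** Bulk-copy path for ML/AI workloads consuming results as numpy arrays: each vector is copied once (`memcpy`) from the wire buffer into the result array, with no per-element Python objects created. Likely the fastest path when the consumer is numpy.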
- **Scope:** `cassandra/numpy_parser.pyx` + new tests.
- **Priority: MEDIUM** — Benefits users who opt into the numpy result path. Independent of other PRs.

### 4. #732 — Optimize Cython deserialization primitives and add VectorType Cython deserializer
- **What:** 6 commits: (1) `ntohs/ntohl` intrinsics for Cython byte unpacking, (2) float-specific `ntohl` byte-swap, (3) `from_ptr_and_size()` refactor, (4) new `DesVectorType` Cython class with type-specialized C-level deserialization, (5) Windows portability, (6) buffer bounds validation.
- **Impact:** 4.4–4.7x faster than pure Python for small vectors. For large vectors, both paths use numpy, so the per-vector gain is marginal, but per-row dispatch overhead is eliminated. Also ~4–5% general row throughput improvement from byte-swap intrinsics.
- **Scope:** `cassandra/deserializers.pyx`, `cassandra/cython_marshal.pyx`, `cassandra/ioutils.pyx`, `cassandra/buffer.pxd`.
- **Priority: HIGH** — This is the Cython counterpart to #730 and also improves general Cython deserialization performance.
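
To illustrate items (2) and (6) above, the following is a rough Python analogue of decoding a float through a 32-bit integer byte swap, with a bounds check before the read. The PR does this in Cython/C; this sketch only mirrors the idea, `unpack_float_ntohl` is a hypothetical name, and in pure Python it would be slower than a plain `struct.unpack(">f", ...)`.

```python
import socket
import struct


def unpack_float_ntohl(buf, offset=0):
    """Decode one big-endian float32 via an integer byte swap (illustrative only)."""
    if offset < 0 or offset + 4 > len(buf):  # bounds check before reading
        raise ValueError("buffer too short for a 4-byte float")
    (raw,) = struct.unpack_from("=I", buf, offset)  # read bits in native order
    host = socket.ntohl(raw)                        # network order -> host order
    (value,) = struct.unpack("=f", struct.pack("=I", host))  # reinterpret as float
    return value


wire = struct.pack(">f", 1.5)  # big-endian, as on the wire
decoded = unpack_float_ntohl(wire)
```

On a big-endian host `ntohl` is a no-op, so the same code path works on either byte order.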

### 5. #733 — Add VectorType deserialization benchmarks and expand test coverage
- **What:** Benchmark harness testing 4 strategies (VectorType.deserialize, struct.unpack, numpy.frombuffer, Cython DesVectorType) across multiple vector sizes/types. Enables vector integration tests on Scylla 2025.4+. Adds unit tests for Cython fallback and numpy large-vector paths.
- **Impact:** No production code changes. Provides the measurement infrastructure to validate all the other PRs.
- **Priority: HIGH** — Should be merged early or alongside the optimization PRs to provide regression tracking.
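
As a rough idea of what a harness of this kind looks like, here is a small stdlib-only sketch that times two of the strategies on one vector size. It is not the harness from #733: the function names, the iteration counts, and the choice of 1536 dimensions are assumptions for illustration, and it covers only the per-element and `struct.unpack` strategies.

```python
import struct
import timeit

DIM = 1536  # one of the sizes mentioned in this issue
WIRE = struct.pack(f">{DIM}f", *[float(i) for i in range(DIM)])


def per_element(buf=WIRE, dim=DIM):
    """Baseline strategy: one unpack call per element."""
    return [struct.unpack_from(">f", buf, i * 4)[0] for i in range(dim)]


def bulk_unpack(buf=WIRE, dim=DIM):
    """Bulk strategy: one unpack call for the whole vector."""
    return list(struct.unpack(f">{dim}f", buf))


def run_benchmark(strategies, number=50, repeat=3):
    """Return the best per-call time in microseconds for each strategy."""
    results = {}
    for name, func in strategies.items():
        best = min(timeit.repeat(func, number=number, repeat=repeat))
        results[name] = best / number * 1e6
    return results


results = run_benchmark({"per_element": per_element, "struct.unpack": bulk_unpack})
for name, usec in results.items():
    print(f"{name:>14}: {usec:10.2f} us/call")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; absolute numbers will vary by machine.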

---

## Supporting/Infrastructure PRs (Benefit Vector Workloads Indirectly)

These PRs optimize the general deserialization pipeline that vector data flows through. They benefit all types but have particular impact on vector workloads due to the high volume of data.

### 6. #690 — Optimize custom type parsing with LRU caching
- **What:** Caches `lookup_casstype()` results to avoid repeated string manipulation and regex scanning. VectorType is a custom/parameterized type, so this directly reduces per-query type resolution overhead.
- **Priority: MEDIUM** — Prerequisite-like optimization. Benefits vector type resolution specifically.
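
The caching idea can be sketched with `functools.lru_cache`. This is not the driver's `lookup_casstype()`: `parse_type_string` is a hypothetical stand-in for the uncached regex-based parse, and the cache size is an arbitrary choice.

```python
import re
from functools import lru_cache

_TOKEN = re.compile(r"[A-Za-z_][\w.]*|[(),]|\d+")


def parse_type_string(type_string):
    """Hypothetical stand-in for the uncached, regex-based parse."""
    return tuple(_TOKEN.findall(type_string))


@lru_cache(maxsize=256)
def cached_parse(type_string):
    """Same result as parse_type_string, but repeated inputs skip the scan."""
    return parse_type_string(type_string)


name = (
    "org.apache.cassandra.db.marshal.VectorType("
    "org.apache.cassandra.db.marshal.FloatType, 1536)"
)
first = cached_parse(name)
second = cached_parse(name)  # served from the cache, no regex scan
info = cached_parse.cache_info()
```

A cache like this is only safe if callers treat the returned object as immutable, since every caller receives the same instance.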

### 7. #729 — Fast-path lookup_casstype() for simple type names
- **What:** Skips the regex scanner and stack-based parser for non-parameterized types (direct dict lookup). While VectorType itself is parameterized, its subtypes (FloatType, etc.) are simple and benefit from this.
- **Priority: LOW-MEDIUM** — Incremental improvement to type resolution.
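
A minimal sketch of the fast-path check, assuming a simple name can be recognized by the absence of `(`. The registry contents, `slow_parse`, and `lookup_type` are hypothetical names for illustration and do not reflect the driver's actual tables or parser.

```python
# Hypothetical registry of non-parameterized type names.
_SIMPLE_TYPES = {"FloatType": float, "Int32Type": int, "UTF8Type": str}


def slow_parse(type_name):
    """Placeholder for the full scanner/parser path (no nested parameters)."""
    base, _, rest = type_name.partition("(")
    params = [p.strip() for p in rest.rstrip(")").split(",")]
    return (base, params)


def lookup_type(type_name):
    if "(" not in type_name:          # simple name: direct dict lookup
        return _SIMPLE_TYPES[type_name]
    return slow_parse(type_name)      # parameterized: fall through to the parser


simple = lookup_type("FloatType")
parameterized = lookup_type("VectorType(FloatType, 3)")
```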

### 8. #734 — Remove copies on the read path
- **What:** Reduces memory copies in the read path. Vector results with 768/1536-dim float columns are large payloads, which is where the reported gain is highest.
- **Impact:** 1.2x–5.3x depending on payload size, with the 5.3x figure measured on large payloads.
- **Priority: HIGH** — Significant general improvement that directly benefits vector query result processing.
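
One general way to avoid copies when slicing a result buffer is to hand out `memoryview` slices instead of `bytes` slices. The sketch below shows that technique on a length-prefixed row; it is not taken from #734, and `split_cells` is a hypothetical helper. It assumes each cell is a 4-byte big-endian length followed by that many bytes, with a negative length meaning null.

```python
import struct


def split_cells(payload, ncols):
    """Split a row into cells without copying the cell bytes (hypothetical helper)."""
    view = memoryview(payload)
    offset = 0
    cells = []
    for _ in range(ncols):
        (length,) = struct.unpack_from(">i", view, offset)
        offset += 4
        if length < 0:
            cells.append(None)  # null cell
        else:
            cells.append(view[offset:offset + length])  # slice shares memory
            offset += length
    return cells


payload = (
    struct.pack(">i", 3) + b"abc"
    + struct.pack(">i", -1)
    + struct.pack(">i", 2) + b"hi"
)
cells = split_cells(payload, 3)
```

Each non-null cell is a view onto `payload`, so a 6 KB vector cell is not duplicated until a consumer explicitly converts it.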

### 9. #741 — Cache deserializer instances in find_deserializer (Cython only)
- **What:** Caches `find_deserializer()` and `make_deserializers()` results, avoiding repeated class lookups and Deserializer object creation. The `DesVectorType` from #732 would be cached here.
- **Impact:** 6x speedup for `find_deserializer`, 29x for `make_deserializers(10 types)`.
- **Priority: MEDIUM** — Depends on #732 (or at least benefits from it). Important for prepared statement hot paths.

### 10. #742 — Cache ParseDesc for prepared statements (Cython only)
- **What:** Caches the ParseDesc object so repeated prepared statement executions skip list comprehensions, ColDesc construction, and `make_deserializers()`. 
- **Impact:** 4.7x speedup for single-row prepared results (ParseDesc construction: 5,730ns → 175ns for 10 cols).
- **Priority: MEDIUM** — Most vector search uses prepared statements, so this directly helps.

### 11. #743 — Direct PyUnicode_DecodeUTF8/ASCII from C buffer (Cython only)
- **What:** Eliminates intermediate bytes object allocation for UTF8/ASCII deserialization using direct CPython C API calls.
- **Impact:** 1.2x–2.1x for text columns. While not vector-specific, this improves the same Cython deserialization layer.
- **Priority: LOW** — Tangential to vector performance; improves the same code layer.

---

## Proposed Merge Order

The ordering considers dependencies, risk, and impact:

| Order | PR | Rationale |
|-------|-----|-----------|
| 1 | **#733** (benchmarks/tests) | Merge first to establish measurement baseline. No production code changes, low risk. |
| 2 | **#730** (struct.unpack + numpy, pure Python) | Foundational vector optimization. Benefits all users. No Cython dependency. |
| 3 | **#734** (remove copies on read path) | General read-path improvement. Benefits vector workloads significantly. Independent. |
| 4 | **#732** (Cython VectorType deserializer) | Cython vector optimization. May depend on #730 for shared numpy threshold logic. |
| 5 | **#731** (numpy_parser 2D arrays) | Independent, targets numpy consumers specifically. Can go anywhere after #730. |
| 6 | **#690** (custom type LRU cache) | Type resolution caching. Independent. |
| 7 | **#729** (fast-path lookup_casstype) | Incremental type resolution improvement. Independent. |
| 8 | **#741** (cache deserializer instances) | Benefits from #732 being merged first (caches DesVectorType). |
| 9 | **#742** (cache ParseDesc) | Builds on same Cython layer as #741. |
| 10 | **#743** (direct UTF8 decode) | Same Cython layer, least vector-specific. |
| 11 | **#689** (original umbrella PR) | **Likely close** if #730–#733 cover all its content. Needs author confirmation. |

---

## Notes

- All PRs are authored by @mykaul.
- PRs #730–#733 appear to be a decomposition of #689 into focused, reviewable units.
- PRs #740 (namedtuple cache), #745 (metadata schema parsing), and #630 (column_encryption_policy opt) are also open performance PRs but are not vector-specific and are excluded from this tracking issue.
- The Cython-only PRs (#731, #732, #741, #742, #743) benefit only users with Cython-compiled extensions. The pure Python PRs (#730, #690, #729) benefit all users.
