refactor!: add NamespaceClientTableContext for cached namespace info by jackye1995 · Pull Request #6523 · lance-format/lance

jackye1995 · 2026-04-15T07:20:54Z

This is a second part of the namespace SDK refactoring.

When user uses namespace client to get table information via namespace.describe_table or namesapce.declare_table, it is very important that the information can be reused, so that you don't have to call the API repeatedly to gain the same information. Instead, you can get info in driver, distribute it, and use it directly in worker.

There are 3 information today falls in this category:

table location
catalog-provided storage options
managed-versioning boolean

Today, this is implemented by passing the known location, managed_versioning boolean, and merge storage_options with user provided ones. And this creates 2 problems:

code repetition: we have to pass these information repeatedly for all information, today it is done for all SDKs and for all engine integrations. This is prune to error for engineers and code agents that don't have much context on this feature.
missing intent: we don't know if the caller has really passed in cached info, or if it is just passed in by mistake. e.g. if I have passed in location, but not managed verisoning, do we call the API again to reconfirm?

This PR does the following:

Introduce NamespaceClientTableContext (Rust struct, Python class, Java class) that holds cached describe_table/declare_table response data (location, storage_options, managed_versioning)
Context is passed all the way to Rust layer where all decisions about storage options merging and managed versioning are made — Python and Java make no decisions

We also remove deprecated Dataset.create/open overloads and createWithFfiSchema JNI path in Java along the way.

…s Rust, Python, Java Introduce NamespaceClientTableContext struct/class that holds cached describe_table/declare_table response data (location, storage_options, managed_versioning). This is passed all the way to Rust where all decisions about storage options merging and managed versioning are made. - Rust: NamespaceClientTableContext in lance-namespace crate, with from_describe_table_response/from_declare_table_response constructors. DatasetBuilder::from_namespace_context and Dataset::write_into_namespace_context accept the context. - Python: NamespaceClientTableContext class in namespace module. All APIs (dataset, write_dataset, fragments, file reader/writer/session, TF) accept namespace_client_table_context parameter. PyO3 binding extracts context fields in Rust. - Java: NamespaceClientTableContext class with static factory methods. All builders (OpenDatasetBuilder, WriteDatasetBuilder, CommitBuilder, WriteFragmentBuilder) accept the context. JNI binding extracts context fields in Rust. Also removes deprecated Dataset.create/open overloads and createWithFfiSchema JNI path in Java. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…_namespace - Rename all `ctx` variables to `namespace_client_table_context` across Rust, Python bindings, Java JNI, and Java API - Merge from_namespace_context into from_namespace with optional context parameter - Merge write_into_namespace_context into write_into_namespace with optional context parameter Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Python and Java no longer call describe_table/declare_table. They pass namespace_client + table_id + optional context to Rust, which resolves via from_namespace/write_into_namespace internally. - Python binding: _Dataset::new uses DatasetBuilder::from_namespace when namespace_client is provided. _write_dataset uses Dataset::write_into_namespace. URI is now optional. - Python API: dataset() and write_dataset() no longer resolve location - Java API: OpenDatasetBuilder and WriteDatasetBuilder no longer call describeTable/declareTable, pass null path to Rust - Java JNI: still needs update to use from_namespace/write_into_namespace (next commit) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

BlockingDataset::open uses DatasetBuilder::from_namespace when namespace_client is provided. create_dataset uses Dataset::write_into_namespace. Storage options provider and commit handler setup is now fully handled by the Rust core functions. Removes manual LanceNamespaceStorageOptionsProvider and ExternalManifestCommitHandler setup from JNI layer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cargo fix removed LanceNamespaceStorageOptionsProvider and StorageOptionsAccessor imports that are no longer needed since from_namespace/write_into_namespace handle them internally. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… ruff format - Add None context parameter to from_namespace/write_into_namespace callers in lance-namespace-impls and lance-namespace-datafusion - Re-add deprecated Dataset.create/open Java convenience methods that delegate to new paths (tests depend on them) - Re-add createWithFfiSchema JNI for schema-only deprecated path - Fix ruff formatting in Python - Fix Java checkstyle line length Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove Preconditions.checkNotNull(path) in Dataset.create since path can be null when namespace is used - Remove unused DescribeTableRequest import from Python __init__.py - Move LanceNamespace/NamespaceClientTableContext to TYPE_CHECKING block - Fix ruff import sorting Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Path can be null when namespace_client resolves location. Remove checkNotNull(path) from Dataset.open and handle null JString in inner_open_native. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-04-15T10:02:30Z

Codecov Report

❌ Patch coverage is 84.55882% with 21 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance/src/dataset.rs	76.92%	12 Missing and 3 partials ⚠️
rust/lance/src/dataset/builder.rs	84.00%	2 Missing and 2 partials ⚠️
rust/lance-namespace/src/context.rs	90.00%	0 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

…r, fix schema-only params - Always create StorageOptionsProvider when namespace is provided, not only when storage_options are present (fixes credential refresh test) - Set commit_handler on DatasetBuilder for APPEND/OVERWRITE path so the opened dataset has the correct handler (fixes managed versioning for non-CREATE modes) - Pass user's WriteParams to schema-only create path (fixes data storage version not being forwarded) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Schema-only empty dataset creation is only available through the deprecated Dataset.create(allocator, path, schema, params) method. The WriteDatasetBuilder now requires reader() or stream() as data source. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@deprecated

These are valid API methods, not deprecated. Remove @deprecated annotations and fix testGetLanceFileFormatVersion to use Dataset.create directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

with_read_params overwrites builder.options including the storage_options_accessor set by from_namespace. For namespace path, apply read params individually to avoid overwriting the accessor. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…open - PyLanceNamespace was missing declare_table delegation, causing "not implemented" error when Rust calls it on custom Python namespaces - JNI BlockingDataset::open applies read params individually to avoid overwriting storage_options_accessor set by from_namespace Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

westonpace

PR Review (from Claude Code)

This PR introduces NamespaceClientTableContext — a lightweight struct that caches namespace metadata (location, storage_options, managed_versioning) to avoid redundant describe_table()/declare_table() calls. It threads this context through Rust, Python, and Java layers.

Overall: The core design is sound — extracting a cacheable struct from namespace responses and threading it through is a clean approach. The refactoring significantly reduces duplication in write_into_namespace() (the old code duplicated the entire credential setup between Create and Append/Overwrite branches). There are a few issues worth addressing before merge.

P0: Bugs

1. namespace_client_table_context not forwarded to Rust in write_fragments()
python/python/lance/fragment.py:1196-1211 — The namespace_client_table_context parameter is accepted (line 1060) but never passed to _write_fragments() / _write_fragments_transaction(). This means the caching optimization is silently broken for the write_fragments API.

2. Wrong sentinel type in LanceDataset.__init__
python/python/lance/dataset.py:670 — self._namespace_client_table_context = False should be None. The type hint is Optional[NamespaceClientTableContext] but the sentinel is a boolean. This False will propagate through __copy__ and could cause subtle type-confusion bugs downstream (e.g., if context: vs if context is not None:).

P1: Logic Issues

3. Java builder validation has a gap
java/src/main/java/org/lance/OpenDatasetBuilder.java:171:

boolean hasNamespaceClient = namespaceClient != null || tableId != null;

This is OR not AND, so hasNamespaceClient is true if either is set. Then at line 177:

if (hasContext && !hasNamespaceClient)

This means providing a context with neither namespaceClient nor tableId is rejected, but providing context with only one of them passes validation. The subsequent null checks at lines 185-194 catch this at runtime, but the error path is confusing — you get "tableId is set but namespaceClient is missing" instead of a message about context requiring both. Same issue in WriteDatasetBuilder.

4. with_read_params workaround is fragile
python/src/dataset.rs (commit c3d409b) — The fix manually applies individual ReadParams fields to avoid with_read_params clobbering the namespace accessor. But this means any future ReadParams fields added to with_read_params won't be applied on the namespace path unless someone remembers to update this block too. Consider instead fixing with_read_params in the Rust builder to preserve the existing accessor when the incoming ReadParams doesn't set one (i.e., merge rather than overwrite), which would eliminate this divergence.

P2: Minor

5. NamespaceClientTableContext constructor allows empty location
java/src/main/java/org/lance/NamespaceClientTableContext.java:60 — The constructor checks location != null but not !location.isEmpty(), while the factory methods (fromDescribeTableResponse, fromDeclareTableResponse) do check for empty. An empty location string would silently pass the constructor and fail later with a confusing error.

6. Always-create provider when no storage options are vended
rust/lance/src/dataset/builder.rs:135-142 — The new code always creates a LanceNamespaceStorageOptionsProvider and StorageOptionsAccessor even when the namespace doesn't vend credentials (storage_options is None). The old code only did this when credentials were present. This is a minor behavioral change — it means every open will call describe_table() on credential refresh even for namespaces that don't vend credentials. Probably fine functionally but worth confirming intent.

7. Missing validation in fragment.py
python/python/lance/fragment.py:1166-1173 — No validation that namespace_client_table_context requires namespace_client and table_id to also be set. Compare to dataset.py:197-201 which does validate this.

Nits

The owned_context / shadowed namespace_client_table_context pattern in from_namespace() and write_into_namespace() is a clean way to handle the owned-vs-borrowed duality, nice.
Commit c3d409b is a fix-on-fix — consider squashing into the parent commit before merge for a cleaner history.

🤖 Generated with Claude Code

github-actions bot added enhancement New feature or request python java labels Apr 15, 2026

jackye1995 changed the title ~~feat: add NamespaceClientTableContext for cached namespace info~~ refactor!: add NamespaceClientTableContext for cached namespace info Apr 15, 2026

github-actions bot added the breaking-change label Apr 15, 2026

jackye1995 and others added 7 commits April 15, 2026 00:41

fix: handle null path in Dataset.open and JNI openNative

c183788

Path can be null when namespace_client resolves location. Remove checkNotNull(path) from Dataset.open and handle null JString in inner_open_native. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jackye1995 and others added 2 commits April 15, 2026 08:16

jackye1995 force-pushed the ns-cached-info branch from 3c374c5 to f3c6676 Compare April 15, 2026 15:29

jackye1995 and others added 3 commits April 15, 2026 08:32

fix: un-deprecate Dataset.create/open methods, fix legacy version test

0208c90

These are valid API methods, not deprecated. Remove @deprecated annotations and fix testGetLanceFileFormatVersion to use Dataset.create directly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

westonpace reviewed Apr 15, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor!: add NamespaceClientTableContext for cached namespace info#6523

refactor!: add NamespaceClientTableContext for cached namespace info#6523
jackye1995 wants to merge 13 commits intolance-format:mainfrom
jackye1995:ns-cached-info

jackye1995 commented Apr 15, 2026 •

edited

Loading

Uh oh!

codecov bot commented Apr 15, 2026 •

edited

Loading

Uh oh!

westonpace left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jackye1995 commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

westonpace left a comment

Choose a reason for hiding this comment

PR Review (from Claude Code)

P0: Bugs

P1: Logic Issues

P2: Minor

Nits

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jackye1995 commented Apr 15, 2026 •

edited

Loading

codecov bot commented Apr 15, 2026 •

edited

Loading