Fix Weaviate tenant-aware ingestion #67298
Open
iwannagotobed wants to merge 1 commit into
LGTM
Summary
Fixes tenant-aware ingestion for Weaviate operators.
WeaviateIngestOperator and WeaviateDocumentIngestOperator already accepted a tenant argument, but the value was not consistently applied to the underlying hook operations.
As a result, users could configure tenant="..." on the operator while some create, query, replace, or delete operations still ran against the base collection instead of the tenant-scoped collection.
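The forwarding gap described above can be sketched in isolation. This is a minimal illustration, not the provider's code: `batch_data` and `IngestOperatorSketch` below are hypothetical stand-ins, and the collection is a `MagicMock`, so nothing talks to a Weaviate server. The `with_tenant(...)` and `data.insert(properties=...)` call shapes are modelled on the Weaviate Python client.

```python
from unittest.mock import MagicMock


def batch_data(collection, objects, tenant=None):
    """Hypothetical stand-in for a hook method; not the provider's code."""
    # Scope the handle to the tenant before any object operation.
    target = collection.with_tenant(tenant) if tenant else collection
    for obj in objects:
        target.data.insert(properties=obj)


class IngestOperatorSketch:
    """Hypothetical operator-like class that forwards its tenant."""

    def __init__(self, collection, input_data, tenant=None):
        self.collection = collection
        self.input_data = input_data
        self.tenant = tenant

    def execute(self):
        # The bug pattern would be calling batch_data() without tenant=...,
        # which silently falls back to the unscoped collection handle.
        batch_data(self.collection, self.input_data, tenant=self.tenant)


# A mock collection records which handle each call went through.
collection = MagicMock(name="collection")
IngestOperatorSketch(collection, [{"title": "doc"}], tenant="tenant-a").execute()

scoped = collection.with_tenant.return_value
collection.with_tenant.assert_called_once_with("tenant-a")
scoped.data.insert.assert_called_once_with(properties={"title": "doc"})
collection.data.insert.assert_not_called()
```

The final three assertions encode the intended contract: the write goes through the tenant-scoped handle, and the base handle is never written to.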
Why this matters
Weaviate multi-tenancy isolates objects by tenant within a collection. For a
multi-tenant collection, object operations must be performed through a tenant-scoped collection handle, obtained with collection.with_tenant(tenant).
If the Airflow operator accepts a tenant parameter but does not apply it to the
actual Weaviate collection operation, the provider does not honor the user's
multi-tenancy boundary.
This can lead to confusing and risky behavior:
- A user sets tenant on the operator and expects data to be written into that tenant.
- ... tenant scope.
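To make the isolation boundary concrete, here is a toy in-memory model. It is a sketch under loud assumptions: `ToyCollection` is hypothetical and only mimics the idea of a handle pinned to one tenant. It is not Weaviate, and a real server may reject unscoped requests on a multi-tenant collection rather than storing them separately as this toy does.

```python
from collections import defaultdict


class ToyCollection:
    """Toy in-memory model of a multi-tenant collection (not Weaviate itself)."""

    def __init__(self, store=None, tenant=None):
        # One bucket of objects per tenant; None is the unscoped bucket.
        self._store = store if store is not None else defaultdict(dict)
        self._tenant = tenant

    def with_tenant(self, tenant):
        # Return a handle that shares storage but is pinned to one tenant.
        return ToyCollection(self._store, tenant)

    def insert(self, uuid, properties):
        self._store[self._tenant][uuid] = properties

    def fetch(self, uuid):
        return self._store[self._tenant].get(uuid)

    def delete(self, uuid):
        # True only if the object existed in this handle's tenant bucket.
        return self._store[self._tenant].pop(uuid, None) is not None


base = ToyCollection()
tenant_a = base.with_tenant("tenant-a")
tenant_b = base.with_tenant("tenant-b")

tenant_a.insert("obj-1", {"title": "doc"})

found_in_a = tenant_a.fetch("obj-1")      # visible through tenant-a
found_in_b = tenant_b.fetch("obj-1")      # invisible through tenant-b
deleted_via_b = tenant_b.delete("obj-1")  # a delete scoped elsewhere is a no-op
still_in_a = tenant_a.fetch("obj-1")

# An unscoped write is not visible through the tenant-scoped handle.
base.insert("obj-2", {"title": "unscoped"})
unscoped_seen_by_a = tenant_a.fetch("obj-2")
```

In this toy, a write or delete issued through the wrong handle never touches the tenant the user configured, which is the kind of mismatch a tenant-scoped verification step would surface.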
Reproduction
I reproduced the issue with a small Airflow UI Dag.
The collection is multi-tenant, the ingest operator receives tenant="tenant-a", and the verification task reads from the tenant-scoped collection.
Before this fix, the ingest task completed successfully, but the tenant-scoped verification task failed because the expected object was not found in tenant-a.
Changes
This change makes the configured tenant flow through the provider consistently:
- Pass tenant from WeaviateIngestOperator to WeaviateHook.batch_data().
- Pass tenant from WeaviateDocumentIngestOperator to WeaviateHook.create_or_replace_document_objects().
- Apply collection.with_tenant(tenant) inside WeaviateHook.batch_data() before batch insertion.
- Add tenant support to delete_object() and _delete_objects() so cleanup and rollback operations stay within the same tenant scope.
Result
After the fix, the same Airflow UI reproduction Dag succeeds end to end:
- create_collection: success
- ingest_with_tenant: success
- verify_tenant_data: success
- cleanup_collection: success
Tests
I ran the relevant Weaviate provider tests with Breeze:
Result:
Was generative AI tooling used to co-author this PR?