fix: make properties from endpoint cache thread safe#840
fix: make properties from endpoint cache thread safe#840
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Testing This CDK VersionYou can test this version of the CDK using the following: # Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@maxi297/fix_properties_from_endpoint_cache#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch maxi297/fix_properties_from_endpoint_cacheHelpful ResourcesPR Slash CommandsAirbyte Maintainers can execute the following slash commands on your PR:
|
|
/autofix
|
📝 WalkthroughWalkthroughAdds thread-safety to the Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor T1 as Thread 1
actor T2 as Thread 2
participant C as PropertiesFromEndpoint
par Concurrent Access
T1->>C: Get cached properties (check 1)
T2->>C: Get cached properties (check 1)
and
Note over T1,T2: Both see cache uninitialized
end
par Lock Acquisition
T1->>C: Acquire lock
T2->>C: Wait for lock
and
T1->>C: Check cache again (check 2)
T1->>C: Populate cache
T1->>C: Release lock
end
T2->>C: Acquire lock
T2->>C: Check cache (already populated)
T2->>C: Return cached value
T2->>C: Release lock
Note over C: Double-checked locking<br/>prevents race conditions
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (1)
28-28: Good lock initialization, though Lock might suffice?The
RLock(reentrant lock) correctly enables thread safety. Since there's no recursive acquisition pattern inget_properties_from_endpoint, would a simplethreading.Lock()work just as well with slightly less overhead, or do you foresee a need for reentrancy? wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)
airbyte_cdk/sources/declarative/interpolation/interpolated_string.py (1)
InterpolatedString(13-79)airbyte_cdk/sources/declarative/retrievers/retriever.py (1)
Retriever(14-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: Check: source-intercom
- GitHub Check: Check: source-pokeapi
- GitHub Check: Check: destination-motherduck
- GitHub Check: Check: source-shopify
- GitHub Check: Check: source-hardcoded-records
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.13, Ubuntu)
- GitHub Check: Pytest (All, Python 3.12, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Manifest Server Docker Image Build
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/requesters/query_properties/properties_from_endpoint.py (2)
2-10: LGTM! Clean import additions.The
threadingimport and import reorganization support the thread-safety changes well.
34-44: Code is thread-safe as implemented.Good news: your double-checked locking pattern already handles the concern. Here's why:
_get_propertyis thread-safe: it only reads fromselfand performs read-only operations on the parameter.read_recordsis protected: While Airbyte CDK'sread_recordsisn't guaranteed to be thread-safe, your lock ensures only one thread calls it during initialization. Subsequent calls simply return the cached result without re-acquiring the lock—no synchronization needed there.- Immutable initialization parameter:
records_schema={}andstream_slice=Noneare safe read-only inputs.The pattern correctly follows the "one instance per thread or explicit synchronization" guidance that the CDK recommends. Nice implementation!
PyTest Results (Full)3 820 tests 3 808 ✅ 10m 44s ⏱️ Results for commit b0001b8. |
What
Given that multiple threads for the same stream starts at the same time, we do load the properties from endpoint cache multiple times
Summary by CodeRabbit