⚡️ Speed up function `get_annual_indicator_names` by 33% #14
Open
misrasaurabh1 wants to merge 4 commits into quantiacs:main from
The optimized code achieves a **~33% speedup** by replacing O(M) list scans with average-case O(1) set lookups, which reduces the membership testing from O(N*M) to O(N) overall (N being the total number of facts checked), and by using more efficient set operations.

**Key optimizations:**

1. **Set conversion for O(1) lookups**: Converts `GLOBAL_ANNUAL_US_GAAPS` (a list) to a set once at the beginning. This changes each `fact in GLOBAL_ANNUAL_US_GAAPS` lookup from an O(M) list scan to an average-case O(1) hash table lookup, where M is the size of the GAAP list (~23 elements).
2. **Set subset operation**: Replaces the generator expression `all(fact in GLOBAL_ANNUAL_US_GAAPS for fact in facts)` with `set(facts).issubset(global_annual_us_gaaps_set)`. This leverages optimized C-level set operations instead of a Python-level loop.
3. **Empty facts handling**: Explicitly handles the edge case where `facts` is empty to preserve the original logic (indicators with empty facts should be included, because `all()` on an empty iterable returns `True`).

**Why this works:** The original code spent ~69% of its time (2.02 ms out of 2.94 ms total) on the membership-checking line. With 1,240 indicators tested and potentially multiple facts per indicator, the O(N*M) cost of repeated list lookups became the bottleneck. Set operations are implemented in C and are highly optimized for exactly this use case.

**Test case performance:** The optimization shows consistent 15-45% improvements across all test scenarios. The largest gains (35-45%) occur in edge cases with empty facts or when the GAAPS list is cleared, which suggests that the set conversion overhead is minimal compared to the lookup savings.
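To make the change concrete, here is a minimal sketch of the before/after filtering logic. It is not the code from `qnt/data/secgov_fundamental.py`: the indicator table, the fact names, and the two helper functions are hypothetical stand-ins, and only `GLOBAL_ANNUAL_US_GAAPS` and the `all(...)` / `issubset(...)` expressions are taken from the description above.

```python
# Hypothetical stand-in for the real list (the PR says it holds ~23 entries).
GLOBAL_ANNUAL_US_GAAPS = ["us-gaap:Revenues", "us-gaap:Assets", "us-gaap:Liabilities"]

# Hypothetical indicator table: indicator name -> list of required facts.
INDICATORS = {
    "total_revenue": ["us-gaap:Revenues"],
    "equity": ["us-gaap:Assets", "us-gaap:Liabilities"],
    "no_facts": [],
    "not_annual": ["us-gaap:Revenues", "us-gaap:SomethingElse"],
}


def annual_names_original(indicators, allowed):
    """Original pattern: every `fact in allowed` scans the list, O(M) each."""
    return [
        name
        for name, facts in indicators.items()
        if all(fact in allowed for fact in facts)
    ]


def annual_names_optimized(indicators, allowed):
    """Optimized pattern: build the set once, then use a C-level subset test."""
    allowed_set = set(allowed)  # one-time O(M) conversion
    return [
        name
        for name, facts in indicators.items()
        # Explicit empty check mirrors all() returning True on an empty iterable.
        if not facts or set(facts).issubset(allowed_set)
    ]


if __name__ == "__main__":
    print(annual_names_original(INDICATORS, GLOBAL_ANNUAL_US_GAAPS))
    print(annual_names_optimized(INDICATORS, GLOBAL_ANNUAL_US_GAAPS))
```

Both functions return the same names in the same order. Note that `set().issubset(...)` is already `True` in Python, so the explicit empty check in this sketch mainly skips building an empty set; the sketch also assumes every fact is hashable (plain strings here).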
📄 33% (0.33x) speedup for `get_annual_indicator_names` in `qnt/data/secgov_fundamental.py`
⏱️ Runtime: 693 microseconds → 522 microseconds (best of 173 runs)
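As a sanity check on the headline number, the speedup can be recomputed from the two runtimes reported above. The snippet uses only figures quoted in this PR; the variable names are illustrative.

```python
# Runtimes quoted in the PR summary (microseconds, best of 173 runs).
original_us = 693
optimized_us = 522

# Speedup relative to the optimized runtime: how much more work per unit time.
speedup = original_us / optimized_us - 1
print(f"speedup: {speedup:.1%}")  # speedup: 32.8%

# Equivalent reduction in wall-clock time relative to the original runtime.
time_saved = 1 - optimized_us / original_us
print(f"time saved: {time_saved:.1%}")  # time saved: 24.7%

# Share of profiled time on the membership-check line (2.02 ms of 2.94 ms).
hot_line_share = 2.02 / 2.94
print(f"hot line share: {hot_line_share:.1%}")  # hot line share: 68.7%
```

The 32.8% figure rounds to the 33% in the title and truncates to 32%, which accounts for both numbers appearing on this page.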
✅ Correctness verification report:
⚙️ Existing Unit Tests and Runtime
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-get_annual_indicator_names-mgk4g4nl` and push.