The optimized code achieves a 15% speedup through two key optimizations:
**1. Efficient Fraction Detection**
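The original code used `text.count("/") == 1`, which always scans the entire string to count occurrences. The optimized version first checks `if "/" in text:`, which stops at the first occurrence, then calls `split("/")` and checks `len(splits) == 2`. When a slash is present, `in` returns as soon as it finds one instead of scanning to the end. When the string contains no slash, which is the common case, both forms still have to scan the whole string, so the gain there likely comes from the lower overhead of the `in` operator compared with a method call rather than from less scanning.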
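The two fraction checks described above can be sketched side by side. This is a minimal illustration, not the actual spaCy source: the function names `is_fraction_count` and `is_fraction_in` are hypothetical, and the `isdigit()` validation of numerator and denominator is an assumption about what happens after the split.

```python
def is_fraction_count(text: str) -> bool:
    """Original approach as described: count the slashes, then split."""
    if text.count("/") == 1:
        num, denom = text.split("/")
        # Assumed validation step: both halves must be digit strings.
        return num.isdigit() and denom.isdigit()
    return False


def is_fraction_in(text: str) -> bool:
    """Optimized approach as described: containment check, then split."""
    if "/" in text:
        splits = text.split("/")
        # Exactly two parts means exactly one slash.
        if len(splits) == 2:
            num, denom = splits
            return num.isdigit() and denom.isdigit()
    return False
```

Both functions should agree on every input; the second only changes how the "exactly one slash" condition is established.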
**2. Set-Based Tibetan Word Lookup**
The original code tested membership against `_num_words`, a list, which is an O(n) linear search. The optimized version converts the list to a set on first use and caches it as a function attribute, making subsequent lookups O(1) on average. This is particularly effective because the profiler shows 1,169 hits on the Tibetan word check.
**Performance Impact by Test Case:**
- **Invalid strings without "/"**: 34-72% faster (e.g., "hello", "abc123") because they no longer pay for the `count` call
- **Valid Tibetan numerals**: 17-28% faster due to O(1) set lookup vs O(n) list search
- **Large invalid strings**: Up to 66% faster when "/" is absent
- **Basic digit strings**: Minimal impact (slight variations due to measurement noise)
The optimizations are most beneficial for workloads with many invalid strings or frequent Tibetan numeral checks, while maintaining identical correctness for all test cases.
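A comparison of this kind can be reproduced in miniature with `timeit`. The sketch below is not the benchmark that produced the figures above: the vocabulary, the sample strings, and the function names are all hypothetical, and the timings it prints will vary by machine.

```python
import timeit

# Hypothetical stand-in vocabulary; not the real `_num_words` list.
WORDS_LIST = [f"word{i}" for i in range(100)]
WORDS_SET = set(WORDS_LIST)
SAMPLES = ["hello", "abc123", "word42", "word99"]


def lookup_in_list(samples):
    """Membership test against a list: linear scan per lookup."""
    return [s in WORDS_LIST for s in samples]


def lookup_in_set(samples):
    """Membership test against a set: hash lookup per sample."""
    return [s in WORDS_SET for s in samples]


def compare(number=10_000):
    """Check that both variants agree, then time each one."""
    assert lookup_in_list(SAMPLES) == lookup_in_set(SAMPLES)
    list_time = timeit.timeit(lambda: lookup_in_list(SAMPLES), number=number)
    set_time = timeit.timeit(lambda: lookup_in_set(SAMPLES), number=number)
    return list_time, set_time


if __name__ == "__main__":
    list_time, set_time = compare()
    print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

Asserting that both variants return the same results before timing them mirrors the correctness requirement stated above: an optimization only counts if the outputs are unchanged.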
📄 15% (0.15x) speedup for `like_num` in `spacy/lang/bo/lex_attrs.py`
⏱️ Runtime: 1.45 milliseconds → 1.26 milliseconds (best of 139 runs)
✅ Correctness verification report:
🌀 Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-like_num-mhmj0jb5` and push.