Cherry-pick upstream pathological-parse fixes onto v0.62.0 pin (FB-468)#1
Closed
moshap-firebolt wants to merge 6 commits into
Closed
Cherry-pick upstream pathological-parse fixes onto v0.62.0 pin (FB-468)#1moshap-firebolt wants to merge 6 commits into
moshap-firebolt wants to merge 6 commits into
Conversation
…/bench files apache#2350 and apache#2352 both added regression tests and benches in the same file regions. When cherry-picking both onto v0.62.0, the test bodies and the bench bodies for parse_compound_keyword_chain (apache#2350) and parse_prefix_keyword_call_chain (apache#2352) interleaved. Fix by separating them into independent function bodies and adding the missing group.bench_function / group.finish / closing braces for the apache#2350 function body that was truncated by the conflict.
Author
|
Closing — see firebolt-analytics/packdb#23192 for context. Fork's |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Background
Our packdb Cargo.toml currently pins this fork at
ec13296— exactly v0.62.0 release plus a single upstream cherry-pick of apache/datafusion-sqlparser-rs#2344 ("fix exponential parse time on compound chains"). The packdb dialects fuzzer hit a 158-byte input that takes 541 seconds to parse on the current pin:Upstream PR apache/datafusion-sqlparser-rs#2352 is the fix. Sibling PR #2350 covers a related but independent 2^N path in
parse_compound_expr.Summary
This branch is
v0.62.0+ four upstream commits cherry-picked from the two open upstream PRs, in order:Plus one local conflict-resolution commit (apache#2350 and apache#2352 both added regression tests / benches in the same file regions, which interleaved on cherry-pick):
The AST/JSON shape stays identical to v0.62.0 (no apache#2246 ORDER BY reshape, no other breaking changes), so the packdb-side translator continues to consume it without modification.
Test Plan
parse_compound_chain_no_exponential_blowup(existing, from Parser: fix exponential parse time on compound chains apache/datafusion-sqlparser-rs#2344)parse_compound_keyword_chain_no_exponential_blowup(Parser: fix exponential parse time on compound keyword chains apache/datafusion-sqlparser-rs#2350)parse_prefix_keyword_call_chain_no_exponential_blowup(Parser: fix exponential parse time on speculative prefix parsing apache/datafusion-sqlparser-rs#2352)parse_prefix_case_chain_no_exponential_blowup(Parser: fix exponential parse time on speculative prefix parsing apache/datafusion-sqlparser-rs#2352)dialects/smoke suite still passes both default mode (15/15) and--oracle(3 OK + 12 SKIPPED), confirming no AST shape drift.When upstream merges apache#2350 and apache#2352 and cuts 0.63, the packdb-side pin can revert to crates.io.
🤖 Generated with Claude Code
Note
Medium Risk
Touches core expression parsing (
parse_prefix, compound dot access) with behavior-sensitive error messages for some dialects; intended to be semantics-preserving aside from memoized failure paths.Overview
Fixes exponential parse time on pathological compound and prefix chains by memoizing failed
parse_prefixattempts and tightening how dotted field segments are parsed.Parser (
mod.rs): Adds position-keyedfailed_prefix_positionsandfailed_reserved_word_prefix_positionscaches (lightweightExprPrefixErrormarkers), cleared when the token stream resets.parse_prefixwraps the real work inparse_prefix_innerand short-circuits repeat failures; failed speculative reserved-word parses (e.g.CASE,current_time) are cached so chains likecase-case-...do not re-descend. In compound field access after., parsing usesparse_prefixinstead ofparse_subexpr, and plainWordfields not followed by(skip speculative prefix parsing to avoid.not-bkeyword chains blowing up.Tests & benches: Adds timeout-based regression tests for four hang patterns (compound
#chains,.not-bchains, nestedcurrent_time(calls,casechains) and Criterion benchmarks for the same shapes across dialects. Adjuststest_reserved_keywords_for_identifiersso dialects with expression-named function args expect the memoized error token (intervalvs)).Note: The bench
criterion_group!in the diff listsparse_compound_keyword_chainandparse_prefix_keyword_call_chainwithout a comma between them, which would fail to compile unless fixed elsewhere.Reviewed by Cursor Bugbot for commit c8d0a19. Bugbot is set up for automated code reviews on this repo. Configure here.