The generated TypeScript tree-sitter parser has STATE_COUNT 9788 (parser.c ~16 MB); tree-sitter generate takes ~5 min single-threaded. The official tree-sitter-typescript is ~2000 states. This only bites at commit/CI time (when a grammar change regenerates gate:treesitter); the dev loop is unaffected (npm run check was parallelized to ~19s and does not run generate).
Root cause (measured)
~116 LR_CONFLICT_CLOSURE tuples (src/gen-treesitter.ts), incl. 34 self-conflicts (['expr'], ['type'], ['type_member']). The dominant driver is the genuine structural ambiguity of the unified grammar: a TS type A.B<C>[] is also a valid expression, so tree-sitter GLR explores both type and expr (and binding/prop/member_name) at every { / [ / computed position.
Ruled out (already optimal / not the cause)
- Operator precedence —
gen-treesitter already emits prec.left/prec.right.
- Keyword lexer — already uses
word: extraction.
- nested-new self-conflict — removing it dropped only ~250 states.
Empirically-failed emission tweaks
- inline never-captured rules (
inline: [param, prop] + strip inlined symbols from conflicts): tree-sitter errored "Add a conflict for these rules: expr, member_name, binding_property" (stripping surfaced an undeclared conflict) and took 48 min to fail. param is referenced 82x, so inlining also duplicates its body.
- type-Pratt (wrap the
type rule’s left-recursive arms in prec.left, since Type is not a pratt rule and renders prec-0): STATE_COUNT 9788 → 9868 (worse), and the type/type_member conflicts remain — the ['type'] self-conflict is true type-vs-expr ambiguity, not left-recursion associativity, so precedence cannot resolve it.
The path that actually reaches ~2000 states
An external scanner that disambiguates type-vs-expr context (how the official grammar avoids these conflicts) — a major, multi-session refactor. The 5–48 min generate cycle (failures balloon to ~48 min) makes iteration expensive, so this needs a dedicated effort, not incremental tweaks.
Lower-effort follow-ups worth trying first
- Prune the append-only
LR_CONFLICT_CLOSURE (collect-conflicts.ts only adds, never removes) — a stale tuple still forces GLR splits.
supertypes: for member_name (and possibly type/expr).
- Stagger the LED precedence ladder + a prec wrapper on the C-cast nud.
(Each is a 5-min generate to verify; gate:treesitter ≥95.9% is the only regression guard since these are emission-only.)
The generated TypeScript tree-sitter parser has STATE_COUNT 9788 (parser.c ~16 MB);
tree-sitter generatetakes ~5 min single-threaded. The official tree-sitter-typescript is ~2000 states. This only bites at commit/CI time (when a grammar change regeneratesgate:treesitter); the dev loop is unaffected (npm run checkwas parallelized to ~19s and does not run generate).Root cause (measured)
~116
LR_CONFLICT_CLOSUREtuples (src/gen-treesitter.ts), incl. 34 self-conflicts (['expr'],['type'],['type_member']). The dominant driver is the genuine structural ambiguity of the unified grammar: a TS typeA.B<C>[]is also a valid expression, so tree-sitter GLR explores bothtypeandexpr(andbinding/prop/member_name) at every{/[/ computed position.Ruled out (already optimal / not the cause)
gen-treesitteralready emitsprec.left/prec.right.word:extraction.Empirically-failed emission tweaks
inline: [param, prop]+ strip inlined symbols fromconflicts): tree-sitter errored "Add a conflict for these rules: expr, member_name, binding_property" (stripping surfaced an undeclared conflict) and took 48 min to fail.paramis referenced 82x, so inlining also duplicates its body.typerule’s left-recursive arms inprec.left, sinceTypeis not a pratt rule and renders prec-0): STATE_COUNT 9788 → 9868 (worse), and thetype/type_memberconflicts remain — the['type']self-conflict is true type-vs-expr ambiguity, not left-recursion associativity, so precedence cannot resolve it.The path that actually reaches ~2000 states
An external scanner that disambiguates type-vs-expr context (how the official grammar avoids these conflicts) — a major, multi-session refactor. The 5–48 min generate cycle (failures balloon to ~48 min) makes iteration expensive, so this needs a dedicated effort, not incremental tweaks.
Lower-effort follow-ups worth trying first
LR_CONFLICT_CLOSURE(collect-conflicts.ts only adds, never removes) — a stale tuple still forces GLR splits.supertypes:formember_name(and possiblytype/expr).(Each is a 5-min generate to verify;
gate:treesitter≥95.9% is the only regression guard since these are emission-only.)