tree-sitter generate is ~5min for the TS grammar (9788 states vs official ~2000)

The generated TypeScript tree-sitter parser has STATE_COUNT 9788 (parser.c ~16 MB); `tree-sitter generate` takes ~5 min single-threaded. The official tree-sitter-typescript is ~2000 states. This only bites at commit/CI time (when a grammar change regenerates `gate:treesitter`); the dev loop is unaffected (`npm run check` was parallelized to ~19s and does not run generate).

## Root cause (measured)
~116 `LR_CONFLICT_CLOSURE` tuples (`src/gen-treesitter.ts`), incl. **34 self-conflicts** (`['expr']`, `['type']`, `['type_member']`). The dominant driver is the genuine **structural ambiguity** of the unified grammar: a TS type `A.B<C>[]` is also a valid expression, so tree-sitter GLR explores both `type` and `expr` (and `binding`/`prop`/`member_name`) at every `{` / `[` / computed position.

### Ruled out (already optimal / not the cause)
- Operator precedence — `gen-treesitter` already emits `prec.left`/`prec.right`.
- Keyword lexer — already uses `word:` extraction.
- nested-new self-conflict — removing it dropped only ~250 states.

### Empirically-failed emission tweaks
- **inline never-captured rules** (`inline: [param, prop]` + strip inlined symbols from `conflicts`): tree-sitter errored *"Add a conflict for these rules: expr, member_name, binding_property"* (stripping surfaced an undeclared conflict) and took **48 min** to fail. `param` is referenced 82x, so inlining also duplicates its body.
- **type-Pratt** (wrap the `type` rule’s left-recursive arms in `prec.left`, since `Type` is not a pratt rule and renders prec-0): STATE_COUNT 9788 → **9868 (worse)**, and the `type`/`type_member` conflicts remain — the `['type']` self-conflict is true type-vs-expr ambiguity, not left-recursion associativity, so precedence cannot resolve it.

## The path that actually reaches ~2000 states
An **external scanner** that disambiguates type-vs-expr context (how the official grammar avoids these conflicts) — a major, multi-session refactor. The 5–48 min generate cycle (failures balloon to ~48 min) makes iteration expensive, so this needs a dedicated effort, not incremental tweaks.

## Lower-effort follow-ups worth trying first
- Prune the **append-only** `LR_CONFLICT_CLOSURE` (collect-conflicts.ts only adds, never removes) — a stale tuple still forces GLR splits.
- `supertypes:` for `member_name` (and possibly `type`/`expr`).
- Stagger the LED precedence ladder + a prec wrapper on the C-cast nud.
(Each is a 5-min generate to verify; `gate:treesitter` ≥95.9% is the only regression guard since these are emission-only.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tree-sitter generate is ~5min for the TS grammar (9788 states vs official ~2000) #46

Root cause (measured)

Ruled out (already optimal / not the cause)

Empirically-failed emission tweaks

The path that actually reaches ~2000 states

Lower-effort follow-ups worth trying first

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

tree-sitter generate is ~5min for the TS grammar (9788 states vs official ~2000) #46

Description

Root cause (measured)

Ruled out (already optimal / not the cause)

Empirically-failed emission tweaks

The path that actually reaches ~2000 states

Lower-effort follow-ups worth trying first

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions