Reviewed src/emit-parser.ts (4223 lines) and src/emit-lexer.ts (687 lines) across four axes (matcher/perf, incremental·recovery·arena, lexer, engine-parity debt), excluding already-shipped optimizations (SoA token stream, columnar arena, multi-word bitmask dispatch, SECOND-token refinement, node surgery, run-adoption). Each item carries file:line + confidence, and distinguishes a real gap from a necessary tradeoff.
The review twice hit "looks like a gap, actually already fixed" (the 32-arm dispatch cliff is solved with multi-word masks emit-parser.ts:1197; the incremental "rebuild a whole slice on one keystroke" is already mitigated by node surgery emit-parser.ts:3228), so only code-backed real gaps are kept below.
Highest leverage (suggested priority)
- A2 — wire the engine-parity gates into CI (cheapest, removes the largest risk).
emit-parser-verify.ts and emit-reject-messages.ts are not in check.ts / package.json / .github/ (grep-confirmed zero matches); they run manually and depend on /tmp/ts-repo. Editing gen-parser.ts semantics has no mechanism — automated or human-enforced — that makes emit-parser.ts follow; it relies on a developer remembering to hand-sync both files and then manually running an un-wired gate against a corpus they must have cloned.
- A1 — static-analysis single-source.
analyze() (emit-parser.ts:57-621) is a near-verbatim rewrite of gen-parser.ts:39-759 — ~560 lines of duplicated pure functions, and it hides a genuinely divergent algorithm (see A3's isLeftRecursive). Extractable into src/grammar-analysis.ts shared by both engines (precedent already exists: grammar-utils.ts shares collectLiterals/isKeywordLiteral).
- C1 — arena monotonic growth.
edit() only appends, never reclaims; no free-list / compaction; only a full parse() resets the arena (which edit() never triggers). LSP-style long-lived sessions grow unbounded.
- B1 — DFA emitter dead path.
emitTokenScannerBody (token-dfa.ts:319) has zero external callers (confirmed), its "~1.3–1.6×" claim has no gate measurement, and the only measured DFA path (the interpreter) is net-negative on all 12 TS tokens.
A. Engine-parity debt (emit-parser ↔ gen-parser)
gen-parser.ts is the createParser runtime interpreter (correctness oracle); emit-parser.ts compiles the same grammar into a standalone parser whose CST must be byte-identical. The debt splits into two layers of different shape.
| # |
Title |
Location |
Type |
Confidence |
| A1 |
~560 lines of static analysis duplicated (binding-power, ledPrec, FIRST/SECOND, nullability, mixfix, access-tail…) — pure CstGrammar→data functions, no technical reason to write twice |
emit-parser.ts:57-621 ↔ gen-parser.ts:39-759 |
real debt (single-source) |
high |
| A2 |
the two parity gates are not automated: emit-parser-verify / emit-reject-messages are absent from check.ts/CI, run by hand, depend on /tmp/ts-repo; a gen-parser change has no mechanism propagating to emit-parser |
test/check.ts:17-52, .github/workflows/ci.yml:46 |
real debt (wire into CI) |
high |
| A3 |
isLeftRecursive is a different algorithm: gen-parser uses a left-corner transitive closure and rejects indirect/hidden LR at build time (gen-parser.ts:309-346); emit-parser does only the syntactic items[0]===self test (emit-parser.ts:121-127). They coincide on the current grammars, but A → opt(x) A … (recursion hidden behind a nullable prefix) would be routed through parseLeftRec by one engine and treated as an ordinary rule by the other → real CST divergence |
emit-parser.ts:121 ↔ gen-parser.ts:309-346 |
latent divergence (gate can't catch) |
high |
| A4 |
both-sides-identically-wrong risk: the gate proves emit≡interp, never interp≡correct. The largest surface is two hand-copied ~180-line SECOND-set fixpoints (each carrying a "MUST stay algorithm-identical" self-warning); a mis-derivation hand-copied into both produces the same wrong CST and the gate reports agree |
emit-parser.ts:342-520 ↔ gen-parser.ts:594-734 |
real risk |
medium |
Single-source feasibility (assessed):
- Layer A (static analysis) is eliminable, low-risk, high-value — they are pure functions of
CstGrammar; move them to a shared module. The only friction: emit's FIRST is the richer qualKeys variant, so unifying means lifting the interpreter to it too (a sound refinement, guarded by the existing conformance suite).
- Layer B (control loops) should NOT be unified — the interpreter runs object CSTs + a Map memo + zero incremental state; the emitter runs an SoA arena + a generation-stamped typed-array memo + adoption/recovery/EOF-relative spans. Dragging that into the interpreter destroys its value as a simple, independently-written oracle (an oracle sharing the suspect machinery can't catch bugs in it). The fix is verification, not unification: make the gate CI-blocking and keep the two hand-written loops, proving equivalence every CI run.
B. emit-lexer
| # |
Title |
Location |
Type |
Confidence |
| B1 |
DFA emitter is a wired-up-and-ready dead path: emitTokenScannerBody has zero external callers, emit-lexer always emits new RegExp; the "~1.3–1.6×" claim is unmeasured, and the only measured DFA path (interpreter) is net-negative on all 12 TS tokens (Ident 0.30×…) |
token-dfa.ts:319, emit-lexer.ts:104 |
real gap (deliver+measure or delete+drop the unsupported claim) |
high |
| B2 |
windowed re-lex is an emit-only orphan: the whole resync/findRestart/reconstructParens machinery has no counterpart in gen-lexer.ts, so there is no emit↔runtime differential oracle — only an emit-self-consistency guard |
emit-lexer.ts:251-674 |
real gap (missing independent oracle) |
medium |
| B3 |
resync retract+diag-truncate logic duplicated verbatim: the identical one-liner appears inside the loop and after it; the two can drift independently (extremely hard to catch by corpus — needs a resync exactly at the EOF boundary) |
emit-lexer.ts:362 and :613 |
real gap (de-dup) |
high |
| B4 |
non-ASCII whitespace regex fallback duplicated + per-position slow path: every non-whitespace cc>127 start char first fires an LX_WS.exec; the non-ASCII codepoints of \s are a fixed small set that can be baked into a charCode test |
emit-lexer.ts:372-376 and :379-383 |
real gap (bakeable) |
medium |
| B5 |
char-loop fast path and lexKwT re-scan the same bytes: the identifier hot path scans [pos+1,p) twice (once in the fast loop, once char-by-char in lexKwT); a length sentinel could short-circuit lexKwT when the run exceeds maxKw |
emit-lexer.ts:529-531 vs :179-181 |
real gap (needs a benchmark to confirm the fusion form) |
medium |
C. Incremental / recovery / arena
| # |
Title |
Location |
Type |
Confidence |
| C1 |
arena monotonic growth: edit() only appends (nodeN/kidN are reset only in full parse() at 3695-3696; editCore never resets), no free-list/compaction; self-noted at 3742-3743 "long edit sessions grow until then" |
emit-parser.ts:3695-3696 (reset), 3399-3429 (leak) |
real gap (design-known) |
high |
| C2 |
deletion-shaped splice could be in-place but takes end-allocation and leaks: the in-place branch only fires on f===removed (kid count unchanged); when kid count shrinks (n2k<nD, e.g. deleting a statement) it fits the original range yet takes the else end-allocation |
emit-parser.ts:3391 vs :3398 |
real gap (relax to n2k<=nD in-place) |
medium |
| C3 |
SURG_ELEM table is too narrow: building it requires a pure seq/group around exactly one */+ with no alt/opt/?/multiple reps; any list rule with an optional element or alternative (trailing comma, member list with modifiers, union body) gets no surgery, so edits inside it always take a full adoption re-parse |
emit-parser.ts:2062-2083 (build), 3311-3320 (use) |
real gap (relaxable: opts around the rep region survivable via the rowKC watermark argument) |
medium |
| C4 |
recovery second pass re-runs the entry rule from pos=0 every attempt + discards the adoptPath cache each attempt, up to 33 passes; adoption mitigates row rebuild but not the entry walk or cache rebuild |
emit-parser.ts:4069-4086, 4075/4095 |
real gap (keep cache across attempts + resume from the bar-free prefix) |
medium |
| C5 |
strict full-pass success path clears docPar and rebuilds docDiags with a full sort, instead of the surgery path's in-place shift (shiftDiags) |
emit-parser.ts:4046-4048 / 3110 |
minor gap |
medium |
| C6 |
surgery hit-rate in recovery state is markedly lower than in strict state: an edit near any recovery bar exits surgery (position-pure commutativity gate); "type-with-red-squiggles" sessions are affected |
emit-parser.ts:3272/3275/3342/3383 |
necessary tradeoff (note) |
high |
D. matcher / perf
| # |
Title |
Location |
Type |
Confidence |
| D1 |
inline alt has no predictive dispatch: rule-level longest-match has altMaskDispatch bitmask + SECOND refinement (emit-parser.ts:1190), but an alt inside a seq is still a naive shared-save try-each-arm; FIRST-disjoint inline alts could switch-dispatch, and 2-arm ones could "pick the arm from the first token, no save/restore" (grammar scale: typescript.ts 78 alt(, javascript.ts 43) |
emit-parser.ts:895 |
real gap |
medium |
Correction: I briefly suspected "packrat memo is a cold-parse burden" — retracted after reading parseRuleEntry (emit-parser.ts:2462-2552). The memo is already a per-rule typed-array (a lookup is two array loads, a store allocates nothing — not a Map), opened only for pratt/left-rec rules, and memoExt doubles as the infrastructure for incremental invalidation (intersected with an edit's damage window) and recovery cross-attempt reuse (barFreeWin). It is both cheap and selectively opened, and it is an incremental asset — not a gap.
Verified non-gaps (do not redo)
- 32-arm dispatch cliff: already solved with multi-word masks (
emit-parser.ts:1197, deliberately no arm-count ceiling).
- incremental "edit one item in a big list rebuilds a slice": already mitigated by node surgery (
emit-parser.ts:3228), full adoption only as the fallback.
- packrat memo: see the correction under D — typed-array, zero-alloc store, selectively opened, incremental/recovery infrastructure.
- most
trySurgery return -1 (F1/F2/F3/F5/F7: cross-kid descent stop, swallowing a non-element row, zero-width non-advance, >-splice detach, out-of-range pure insert, whole-file token count not closing) = necessary correctness gates.
collectErrRows is genuinely O(error paths) (walks only the rowRM spine, emit-parser.ts:3072).
- markup/indent/newline not emitted: intentional scope cut, explicit
return null falling back to createLexer (emit-lexer.ts:33-34), and emit-lexer-verify.ts errors if a grammar that should emit falls back.
- template open / first-char filter / punct chain / lexKwT perfect-hash: emit already bakes these, some better than the runtime.
Verification notes
- Personally grep-confirmed: the two parity gates are absent from
check.ts/package.json/.github/; the DFA emitter has zero external callers; arena nodeN=0/kidN=0 appear only in declarations and full parse().
- One of the four review axes (matcher/perf) hit a 503 server error mid-run; that axis was finished by hand reading
emit-parser.ts:895 (inline alt) and 2452-2560 (memo), so the D section is leaner — a full per-case perf sweep of matchInto/matchFn was not done. Re-run if an exhaustive pass is wanted.
Reviewed
src/emit-parser.ts(4223 lines) andsrc/emit-lexer.ts(687 lines) across four axes (matcher/perf, incremental·recovery·arena, lexer, engine-parity debt), excluding already-shipped optimizations (SoA token stream, columnar arena, multi-word bitmask dispatch, SECOND-token refinement, node surgery, run-adoption). Each item carriesfile:line+ confidence, and distinguishes a real gap from a necessary tradeoff.Highest leverage (suggested priority)
emit-parser-verify.tsandemit-reject-messages.tsare not incheck.ts/package.json/.github/(grep-confirmed zero matches); they run manually and depend on/tmp/ts-repo. Editinggen-parser.tssemantics has no mechanism — automated or human-enforced — that makesemit-parser.tsfollow; it relies on a developer remembering to hand-sync both files and then manually running an un-wired gate against a corpus they must have cloned.analyze()(emit-parser.ts:57-621) is a near-verbatim rewrite ofgen-parser.ts:39-759— ~560 lines of duplicated pure functions, and it hides a genuinely divergent algorithm (see A3'sisLeftRecursive). Extractable intosrc/grammar-analysis.tsshared by both engines (precedent already exists:grammar-utils.tssharescollectLiterals/isKeywordLiteral).edit()only appends, never reclaims; no free-list / compaction; only a fullparse()resets the arena (whichedit()never triggers). LSP-style long-lived sessions grow unbounded.emitTokenScannerBody(token-dfa.ts:319) has zero external callers (confirmed), its "~1.3–1.6×" claim has no gate measurement, and the only measured DFA path (the interpreter) is net-negative on all 12 TS tokens.A. Engine-parity debt (emit-parser ↔ gen-parser)
gen-parser.tsis thecreateParserruntime interpreter (correctness oracle);emit-parser.tscompiles the same grammar into a standalone parser whose CST must be byte-identical. The debt splits into two layers of different shape.CstGrammar→datafunctions, no technical reason to write twiceemit-parser.ts:57-621↔gen-parser.ts:39-759emit-parser-verify/emit-reject-messagesare absent fromcheck.ts/CI, run by hand, depend on/tmp/ts-repo; agen-parserchange has no mechanism propagating toemit-parsertest/check.ts:17-52,.github/workflows/ci.yml:46isLeftRecursiveis a different algorithm: gen-parser uses a left-corner transitive closure and rejects indirect/hidden LR at build time (gen-parser.ts:309-346); emit-parser does only the syntacticitems[0]===selftest (emit-parser.ts:121-127). They coincide on the current grammars, butA → opt(x) A …(recursion hidden behind a nullable prefix) would be routed throughparseLeftRecby one engine and treated as an ordinary rule by the other → real CST divergenceemit-parser.ts:121↔gen-parser.ts:309-346emit≡interp, neverinterp≡correct. The largest surface is two hand-copied ~180-line SECOND-set fixpoints (each carrying a "MUST stay algorithm-identical" self-warning); a mis-derivation hand-copied into both produces the same wrong CST and the gate reportsagreeemit-parser.ts:342-520↔gen-parser.ts:594-734Single-source feasibility (assessed):
CstGrammar; move them to a shared module. The only friction: emit's FIRST is the richer qualKeys variant, so unifying means lifting the interpreter to it too (a sound refinement, guarded by the existing conformance suite).B. emit-lexer
emitTokenScannerBodyhas zero external callers, emit-lexer always emitsnew RegExp; the "~1.3–1.6×" claim is unmeasured, and the only measured DFA path (interpreter) is net-negative on all 12 TS tokens (Ident 0.30×…)token-dfa.ts:319,emit-lexer.ts:104gen-lexer.ts, so there is no emit↔runtime differential oracle — only an emit-self-consistency guardemit-lexer.ts:251-674emit-lexer.ts:362and:613cc>127start char first fires anLX_WS.exec; the non-ASCII codepoints of\sare a fixed small set that can be baked into a charCode testemit-lexer.ts:372-376and:379-383lexKwTre-scan the same bytes: the identifier hot path scans[pos+1,p)twice (once in the fast loop, once char-by-char in lexKwT); a length sentinel could short-circuit lexKwT when the run exceeds maxKwemit-lexer.ts:529-531vs:179-181C. Incremental / recovery / arena
edit()only appends (nodeN/kidNare reset only in fullparse()at3695-3696;editCorenever resets), no free-list/compaction; self-noted at3742-3743"long edit sessions grow until then"emit-parser.ts:3695-3696(reset),3399-3429(leak)f===removed(kid count unchanged); when kid count shrinks (n2k<nD, e.g. deleting a statement) it fits the original range yet takes theelseend-allocationemit-parser.ts:3391vs:3398n2k<=nDin-place)SURG_ELEMtable is too narrow: building it requires a pure seq/group around exactly one*/+with noalt/opt/?/multiple reps; any list rule with an optional element or alternative (trailing comma, member list with modifiers, union body) gets no surgery, so edits inside it always take a full adoption re-parseemit-parser.ts:2062-2083(build),3311-3320(use)pos=0every attempt + discards theadoptPathcache each attempt, up to 33 passes; adoption mitigates row rebuild but not the entry walk or cache rebuildemit-parser.ts:4069-4086,4075/4095docParand rebuilds docDiags with a fullsort, instead of the surgery path's in-place shift (shiftDiags)emit-parser.ts:4046-4048/3110emit-parser.ts:3272/3275/3342/3383D. matcher / perf
altMaskDispatchbitmask + SECOND refinement (emit-parser.ts:1190), but analtinside a seq is still a naive shared-save try-each-arm; FIRST-disjoint inline alts could switch-dispatch, and 2-arm ones could "pick the arm from the first token, no save/restore" (grammar scale: typescript.ts 78alt(, javascript.ts 43)emit-parser.ts:895Verified non-gaps (do not redo)
emit-parser.ts:1197, deliberately no arm-count ceiling).emit-parser.ts:3228), full adoption only as the fallback.trySurgeryreturn -1(F1/F2/F3/F5/F7: cross-kid descent stop, swallowing a non-element row, zero-width non-advance,>-splice detach, out-of-range pure insert, whole-file token count not closing) = necessary correctness gates.collectErrRowsis genuinely O(error paths) (walks only the rowRM spine,emit-parser.ts:3072).return nullfalling back tocreateLexer(emit-lexer.ts:33-34), andemit-lexer-verify.tserrors if a grammar that should emit falls back.Verification notes
check.ts/package.json/.github/; the DFA emitter has zero external callers; arenanodeN=0/kidN=0appear only in declarations and fullparse().emit-parser.ts:895(inline alt) and2452-2560(memo), so the D section is leaner — a full per-case perf sweep ofmatchInto/matchFnwas not done. Re-run if an exhaustive pass is wanted.