Context
PR #41 added three opt-in IndentConfig knobs (commentExcept, rawBlock, flowColonSeparator) so a non-YAML indentation language (NMBL — a Pug-shaped HTML shorthand) can opt out of behaviors the indent mode had baked in for YAML. The PR is correct and merged. Its "field notes", plus a follow-up audit of src/gen-lexer.ts, show the YAML coupling is broader than those three knobs: YAML semantics ride on generic token flags (string, blockPattern) and on hardcoded literals in the lexer, with no opt-out except mis-declaring the grammar.
This issue tracks decoupling the generic indent / newline / flow core from the YAML profile. The seam is already named in the code: NewlineConfig is a genuinely generic line+flow core, and types.ts literally comments "indent = newline + indent stack + YAML block-scalar semantics." The goal is to make that comment true in code — so any future indentation client composes the core instead of fighting YAML.
Root cause
A non-YAML indentation grammar inherits YAML behaviors it never asked for, because those behaviors are derived from flags/literals that mean something else. Verified sites (all src/gen-lexer.ts unless noted):
(A) Flow-: key/value separator membership is derived from the string flag.
stringTokenNames = tokens.filter(t => t.string) (:263) feeds the flow-: carve-out (:987). So flagging a token string: true — whose legitimate jobs are string-region scoping and auto-close-delimiter derivation — silently enlists it into YAML's key: value separator emission. The adopter had to drop string: true to escape this, and lost auto-close delimiter derivation. flowColonSeparator: false (PR #41) only gates the carve-out wholesale; it does not let a token opt out individually, nor does it un-overload the flag.
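A minimal sketch of the overload in (A) next to an un-overloaded alternative. The TokenDef shape is simplified, the token names are invented for illustration, and flowKeyTokens is a hypothetical field name, not part of the current IndentConfig.

```typescript
// Simplified token shape; the real definition in types.ts carries more fields.
interface TokenDef {
  name: string;
  string?: boolean; // string-region scoping + auto-close delimiter derivation
}

// Hypothetical config fragment: `flowKeyTokens` is an assumed name.
interface IndentConfigSketch {
  flowKeyTokens?: string[];
}

// Current behavior as described in the audit: separator membership
// falls out of the general-purpose `string` flag.
function derivedFromStringFlag(tokens: TokenDef[]): string[] {
  return tokens.filter(t => t.string).map(t => t.name);
}

// Un-overloaded: membership comes only from explicit config, empty by default.
function declaredFlowKeyTokens(indent: IndentConfigSketch): string[] {
  return indent.flowKeyTokens ?? [];
}

// Invented NMBL-like tokens: one is a string region, neither is a YAML key.
const nmblTokens: TokenDef[] = [
  { name: "AttrString", string: true },
  { name: "TagName" },
];

const overloaded = derivedFromStringFlag(nmblTokens); // AttrString is enlisted
const explicit = declaredFlowKeyTokens({});           // nothing is enlisted
```

With the explicit form, the grammar can keep string: true for its legitimate jobs without joining the flow-: carve-out.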
(B) Plain-scalar continuation folding is derived from blockPattern.
plainScalarTokenNames / plainContinuationTokenName (:270, :277) are derived from any token with a blockPattern, and drive three fold sites (:715 block-context, :1224-1241 flow illegal-head, :1292-1327 flow multi-line merge). Any blockPattern token gets YAML plain-scalar folding, with no opt-out.
(C) keyValueSeparator is half-wired (parser ↔ highlighter split, latent).
keyValueSeparator is declared (types.ts:318) and honored by the highlighter generator in several places (gen-tm.ts:3314, 4928, 5188, 5393). The lexer never reads it and instead hardcodes : in seven key-separator decisions (:329, :386, :397, :538, :767, :985, :1149). These lexer sniffs (lineHasKeySeparator, startsBlockStructuralNode, the flow carve-out, compact-key pairing) make the same "is this a mapping-key line" decision the highlighter makes from keyValueSeparator — so the field name reads like a general knob but only half the pipeline obeys it. Latent today (the only indent grammar, yaml.ts:630, sets it to :, matching the hardcode), but a future indent grammar that sets a different separator gets a highlighter built around its separator and a parser still keyed on : — which violates the project's parser↔highlighter-consistency-by-construction guarantee. (compactIndicators is in the same family: read by the lexer at :359/:1155 but also hardcoded as -/? in the tab/value sniffs at :538/:767.)
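One way to picture the fix for (C): the sniff takes the separator as a parameter instead of hardcoding it. This is a simplified stand-in, not the real lineHasKeySeparator in src/gen-lexer.ts, whose signature and rules (quoting, flow context) are not shown here. The "separator must be followed by whitespace or end of line" rule is an assumption borrowed from YAML.

```typescript
// Hypothetical, simplified key-separator sniff parameterized by the
// configured separator (the value a grammar sets in keyValueSeparator).
function lineHasKeySeparator(line: string, separator: string): boolean {
  if (separator === "") return false; // nothing to look for
  let from = 0;
  while (true) {
    const at = line.indexOf(separator, from);
    if (at === -1) return false;
    // Assumed rule: the separator only counts before whitespace or end of line.
    const next = line.charAt(at + separator.length);
    if (next === "" || next === " " || next === "\t") return true;
    from = at + 1;
  }
}

const yamlLine = lineHasKeySeparator("key: value", ":");    // true
const urlLine = lineHasKeySeparator("http://example", ":"); // false
const eqWithColon = lineHasKeySeparator("key = value", ":"); // false
const eqWithEq = lineHasKeySeparator("key = value", "=");    // true
```

If the highlighter generator reads the same configured value, both halves of the pipeline answer "is this a mapping-key line" identically for any separator.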
(D) §6.1 tab-in-indentation errors are gated only on !!indent.
The YAML-spec tab errors throw at :680 and :765 via startsBlockStructuralNode (:383-400), which hardcodes YAML indicators -/?/:/&/!. Any indentation grammar inherits YAML tab rejection; it is not a knob.
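A rough sketch of what parameterizing (D) could look like. The function bodies are illustrative simplifications rather than the code at the cited lines: the real startsBlockStructuralNode checks more than the first character, and checkTabIndent is a hypothetical helper name.

```typescript
// Hypothetical: the indicator set is passed in instead of being hardcoded.
function startsBlockStructuralNode(
  rest: string,
  indicators: ReadonlySet<string>,
): boolean {
  return rest.length > 0 && indicators.has(rest.charAt(0));
}

// Hypothetical helper: tab rejection only applies when the grammar declares
// an indicator set. `undefined` models the neutral default (no tab errors).
function checkTabIndent(
  line: string,
  indicators: ReadonlySet<string> | undefined,
): void {
  if (indicators === undefined) return;
  const indent = line.match(/^[ \t]*/)?.[0] ?? "";
  const rest = line.slice(indent.length);
  if (indent.includes("\t") && startsBlockStructuralNode(rest, indicators)) {
    throw new Error("tab in indentation before a block structural node");
  }
}

// The YAML indicators named in the audit, declared by the grammar itself.
const yamlIndicators: ReadonlySet<string> = new Set(["-", "?", ":", "&", "!"]);

function throwsOn(line: string, set: ReadonlySet<string> | undefined): boolean {
  try {
    checkTabIndent(line, set);
    return false;
  } catch {
    return true;
  }
}
```

Under these assumptions, a tab-indented "- item" line is rejected only for a grammar that passes yamlIndicators; a grammar that declares nothing accepts it.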
(E) blockScalar and PR #41's rawBlock are two near-mirror loops for one concept — an "indentation-bounded verbatim region" (:827-905 vs the rawBlock loop). Shared behavior is the bare "capture following lines with col > parent as one token" skeleton; everything that decides body start/stop differs (see note under Non-goals).
Proposed direction
Priority order. PRIMARY items are the actual correctness defects and the real blocker for non-YAML adopters:
- Un-overload (A) and (B). Source the flow-: separator membership and the plain-scalar fold from their own explicit config, not from the general-purpose string / blockPattern flags. This is the direct fix for the two field notes and the only change that lets a non-YAML grammar keep semantically-correct string / blockPattern flags. (Un-overload, not "bolt an opt-out onto an overloaded flag.")
- Route the lexer's key-separator sniffs through keyValueSeparator (and ideally compactIndicators) so the lexer and gen-tm share one source of truth for the separator. Fixes (C); standalone latent bug worth fixing regardless.
- Flip the indent-core defaults to neutral (no §6.1 tab errors, no flow-: carve-out, no plain-fold unless declared) and have yaml.ts opt into the YAML behaviors via explicit fields; it already declares blockScalar / compactIndicators / tagScope explicitly, so this is the same pattern. Proof obligation: yaml.ts tokenizes byte-identically (existing YAML gates + scope-gap:yaml). (D)'s tab-error split additionally needs startsBlockStructuralNode's indicator set parameterized, not just a boolean gate, which makes it a larger sub-task.
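The neutral-defaults item can be sketched as a resolution step in which every YAML behavior is off unless declared. Only flowColonSeparator is an existing field (from PR #41); plainFoldTokens, tabErrorIndicators, the resolveNeutral helper, and the PlainScalar token name are hypothetical and chosen for illustration only.

```typescript
// Sketch of the opt-in surface. Only `flowColonSeparator` exists today.
interface IndentBehaviorSketch {
  flowColonSeparator?: boolean;   // exists since PR #41
  plainFoldTokens?: string[];     // hypothetical: tokens that get plain-scalar folding
  tabErrorIndicators?: string[];  // hypothetical: indicators that trigger tab errors
}

// Neutral resolution: anything not declared is off or empty.
function resolveNeutral(
  cfg: IndentBehaviorSketch,
): Required<IndentBehaviorSketch> {
  return {
    flowColonSeparator: cfg.flowColonSeparator ?? false,
    plainFoldTokens: cfg.plainFoldTokens ?? [],
    tabErrorIndicators: cfg.tabErrorIndicators ?? [],
  };
}

// A YAML-like grammar opts in field by field, with no named preset.
const yamlLike = resolveNeutral({
  flowColonSeparator: true,
  plainFoldTokens: ["PlainScalar"],
  tabErrorIndicators: ["-", "?", ":", "&", "!"],
});

// A non-YAML grammar declares nothing and inherits nothing.
const nmblLike = resolveNeutral({});
```

The byte-identity proof obligation then reduces to checking that the explicit YAML declaration resolves to the same behavior the lexer hardcodes today.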
Non-goals
- No named profile: "yaml" preset. With a single in-repo indent client it is premature, and it bundles/hides the coupling behind a switch rather than removing it. Explicit per-field declaration in yaml.ts achieves byte-identity without the indirection.
- No public merge of blockScalar and rawBlock. They share only the capture skeleton; their public knobs barely overlap (introducers/documentMarkers/indicatorScope/|N indicators/atRoot are lead-position + YAML-document-only; rawBlock's introChar/signature/glue are trailing-only). A unified type would make every field but token mode-exclusive dead weight and force adopters to read YAML vocabulary. At most extract the bare skeleton internally; keep two focused, mode-neutral-named fields.
Acceptance
- yaml.ts stays byte-identical (YAML gates + scope-gap:yaml).
- A non-YAML indent grammar can set string: true / a blockPattern without being enlisted into YAML separator/fold semantics.
- The lexer and gen-tm agree on the key/value separator for any keyValueSeparator value.
- No profile preset added.