Decouple the generic indent core from the YAML profile #44

@johnsoncodehk

Context

PR #41 added three opt-in IndentConfig knobs (commentExcept, rawBlock, flowColonSeparator) so a non-YAML indentation language (NMBL — a Pug-shaped HTML shorthand) can opt out of behaviors the indent mode had baked in for YAML. The PR is correct and merged. Its "field notes", plus a follow-up audit of src/gen-lexer.ts, show the YAML coupling is broader than those three knobs: YAML semantics ride on generic token flags (string, blockPattern) and on hardcoded literals in the lexer, with no opt-out except mis-declaring the grammar.

This issue tracks decoupling the generic indent / newline / flow core from the YAML profile. The seam is already named in the code: NewlineConfig is a genuinely generic line+flow core, and types.ts literally comments "indent = newline + indent stack + YAML block-scalar semantics." The goal is to make that comment true in code — so any future indentation client composes the core instead of fighting YAML.

Root cause

A non-YAML indentation grammar inherits YAML behaviors it never asked for, because those behaviors are derived from flags and literals whose declared purpose is something else. Verified sites (:N references are line numbers in src/gen-lexer.ts unless another file is named):

(A) Flow : key/value separator membership is derived from the string flag.
stringTokenNames = tokens.filter(t => t.string) (:263) feeds the flow-: carve-out (:987). So flagging a token string: true — whose legitimate jobs are string-region scoping and auto-close-delimiter derivation — silently enlists it into YAML's key: value separator emission. The adopter had to drop string: true to escape this, and lost auto-close delimiter derivation. flowColonSeparator: false (PR #41) only gates the carve-out wholesale; it does not let a token opt out individually, nor does it un-overload the flag.
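The overload in (A) can be sketched as follows. This is a minimal illustration, not the lexer's actual code: the `TokenDef` and `FlowConfig` shapes and the `flowSeparatorTokens` field are hypothetical, and only the `tokens.filter(t => t.string)` derivation is taken from the description above.

```typescript
// Hypothetical, simplified token shape; the real definition lives in types.ts.
interface TokenDef {
  name: string;
  string?: boolean; // string-region scoping + auto-close delimiter derivation
}

// Hypothetical explicit field; NOT an existing IndentConfig knob.
interface FlowConfig {
  flowSeparatorTokens?: string[];
}

// Today (per this issue): separator membership falls out of the general flag.
function overloadedMembership(tokens: TokenDef[]): string[] {
  return tokens.filter(t => t.string).map(t => t.name);
}

// Un-overloaded sketch: membership is only what the grammar declared.
function explicitMembership(config: FlowConfig): string[] {
  return config.flowSeparatorTokens ?? [];
}

// A non-YAML grammar that wants string: true for its legitimate jobs only.
const nmblTokens: TokenDef[] = [
  { name: "DoubleQuoted", string: true },
  { name: "Ident" },
];
```

With the overloaded derivation, `DoubleQuoted` is enlisted as a flow separator participant just by being a string token. With the explicit form, a grammar that declares nothing gets an empty membership list and can keep `string: true`.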

(B) Plain-scalar continuation folding is derived from blockPattern.
plainScalarTokenNames / plainContinuationTokenName (:270, :277) are derived from any token with a blockPattern, and drive three fold sites (:715 block-context, :1224-1241 flow illegal-head, :1292-1327 flow multi-line merge). Any blockPattern token gets YAML plain-scalar folding, with no opt-out.

(C) keyValueSeparator is half-wired (parser ↔ highlighter split, latent).
keyValueSeparator is declared (types.ts:318) and honored by the highlighter generator in several places (gen-tm.ts:3314, 4928, 5188, 5393). The lexer never reads it and instead hardcodes : in seven key-separator decisions (:329, :386, :397, :538, :767, :985, :1149). These lexer sniffs (lineHasKeySeparator, startsBlockStructuralNode, the flow carve-out, compact-key pairing) make the same "is this a mapping-key line" decision the highlighter makes from keyValueSeparator — so the field name reads like a general knob but only half the pipeline obeys it. Latent today (the only indent grammar, yaml.ts:630, sets it to :, matching the hardcode), but a future indent grammar that sets a different separator gets a highlighter built around its separator and a parser still keyed on : — which violates the project's parser↔highlighter-consistency-by-construction guarantee. (compactIndicators is in the same family: read by the lexer at :359/:1155 but also hardcoded as -/? in the tab/value sniffs at :538/:767.)
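To make the split in (C) concrete, here is a rough sketch of a key-separator sniff that takes the separator as a parameter instead of hardcoding `:`. The function body is a simplified stand-in written for illustration; the real `lineHasKeySeparator` in src/gen-lexer.ts is assumed to handle quoting, flow context, and other cases that are ignored here.

```typescript
// Simplified stand-in for the lexer sniff. The "followed by whitespace or
// end of line" rule is an assumption modeled on YAML's `key: value` form.
function lineHasKeySeparator(line: string, separator: string): boolean {
  if (separator.length === 0) return false;
  let from = 0;
  while (true) {
    const at = line.indexOf(separator, from);
    if (at === -1) return false;
    const next = line[at + separator.length];
    // The separator counts only when followed by whitespace or end of line.
    if (next === undefined || next === " " || next === "\t") return true;
    from = at + 1;
  }
}
```

Passing the grammar's `keyValueSeparator` at each call site would give the lexer and gen-tm a single source of truth: `lineHasKeySeparator("key = value", "=")` matches, while the same line checked against `":"` does not.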

(D) §6.1 tab-in-indentation errors are gated only on !!indent.
The YAML-spec tab errors throw at :680 and :765 via startsBlockStructuralNode (:383-400), which hardcodes YAML indicators -/?/:/&/!. Any indentation grammar inherits YAML tab rejection; it is not a knob.
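One way to picture the parameterization that (D) calls for is a tab check driven by a declared indicator set rather than by `!!indent`. The `TabPolicy` shape, the `structuralIndicators` field, and the simplified matching rule below are all hypothetical; the real `startsBlockStructuralNode` logic is assumed to be more involved.

```typescript
// Hypothetical knob: when omitted or empty, the core raises no tab errors.
interface TabPolicy {
  structuralIndicators?: string[];
}

// True when the leading whitespace contains a tab AND the first non-blank
// text starts with one of the declared indicators.
function tabIndentIsError(line: string, policy: TabPolicy): boolean {
  const indicators = policy.structuralIndicators ?? [];
  if (indicators.length === 0) return false;
  const body = line.replace(/^[ \t]*/, "");
  const indent = line.slice(0, line.length - body.length);
  if (indent.indexOf("\t") === -1 || body.length === 0) return false;
  return indicators.some(ind => body.indexOf(ind) === 0);
}

// The YAML profile would opt in with the indicator set named in this issue.
const yamlPolicy: TabPolicy = { structuralIndicators: ["-", "?", ":", "&", "!"] };
```

Under this sketch a grammar that declares no indicators never sees a tab error, while the YAML profile keeps rejecting a line such as a tab followed by `- item`.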

(E) blockScalar and PR #41's rawBlock are two near-mirror loops for one concept.
Both implement an "indentation-bounded verbatim region" (:827-905 vs the rawBlock loop). The shared behavior is the bare "capture following lines with col > parent as one token" skeleton; everything that decides body start/stop differs (see note under Non-goals).
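The shared skeleton mentioned in (E) might look roughly like the following. This is an illustrative sketch only: the function name is hypothetical, indentation is measured in leading spaces, and the blank-line handling (interior blank lines join the region, trailing ones do not) is an assumption, since the start/stop rules are exactly where blockScalar and rawBlock differ.

```typescript
// Returns the index one past the last line of the verbatim region that
// begins at `start`, given the parent's indent column.
function captureIndentedRegion(lines: string[], start: number, parentCol: number): number {
  let end = start;
  for (let i = start; i < lines.length; i++) {
    const line = lines[i] ?? "";
    // Assumption: blank lines are kept only if indented content follows them.
    if (line.trim() === "") continue;
    const col = line.length - line.replace(/^ */, "").length;
    if (col <= parentCol) break; // dedent ends the region
    end = i + 1;
  }
  return end;
}
```

For the lines `["key: |", "  a", "", "  b", "next"]` with `start = 1` and `parentCol = 0`, the region covers indices 1 through 3 and the function returns 4.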

Proposed direction

Priority order. PRIMARY items are the actual correctness defects and the real blocker for non-YAML adopters:

  1. Un-overload (A) and (B). Source the flow-: separator membership and the plain-scalar fold from their own explicit config, not from the general-purpose string / blockPattern flags. This is the direct fix for the two field notes and the only change that lets a non-YAML grammar keep semantically-correct string / blockPattern flags. (Un-overload, not "bolt an opt-out onto an overloaded flag.")
  2. Route the lexer's key-separator sniffs through keyValueSeparator (and ideally compactIndicators) so the lexer and gen-tm share one source of truth for the separator. Fixes (C); standalone latent bug worth fixing regardless.
  3. Flip the indent-core defaults to neutral (no §6.1 tab errors, no flow-: carve-out, no plain-fold unless declared) and have yaml.ts opt into the YAML behaviors via explicit fields — it already declares blockScalar / compactIndicators / tagScope explicitly, so this is the same pattern. Proof obligation: yaml.ts tokenizes byte-identically (existing YAML gates + scope-gap:yaml). (D)'s tab-error split additionally needs startsBlockStructuralNode's indicator set parameterized, not just a boolean gate — larger sub-task.
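Item 3 can be sketched as a neutral default set that a grammar overrides field by field. Apart from `flowColonSeparator` (named in PR #41), the field names here are hypothetical placeholders rather than existing IndentConfig members, and the boolean shape is a simplification (the tab-error gate in particular would need an indicator set, as noted above).

```typescript
// Hypothetical behavior switches for the indent core.
interface IndentBehaviors {
  tabErrors: boolean;          // §6.1 tab-in-indentation errors
  flowColonSeparator: boolean; // flow-: carve-out
  plainScalarFold: boolean;    // plain-scalar continuation folding
}

// Neutral core: nothing YAML-specific unless declared.
const NEUTRAL: IndentBehaviors = {
  tabErrors: false,
  flowColonSeparator: false,
  plainScalarFold: false,
};

// Each field falls back to the neutral default when the grammar is silent.
function resolveBehaviors(declared: Partial<IndentBehaviors>): IndentBehaviors {
  return {
    tabErrors: declared.tabErrors ?? NEUTRAL.tabErrors,
    flowColonSeparator: declared.flowColonSeparator ?? NEUTRAL.flowColonSeparator,
    plainScalarFold: declared.plainScalarFold ?? NEUTRAL.plainScalarFold,
  };
}

// The YAML profile opts in explicitly; a non-YAML grammar declares nothing.
const yamlBehaviors = resolveBehaviors({
  tabErrors: true,
  flowColonSeparator: true,
  plainScalarFold: true,
});
const nmblBehaviors = resolveBehaviors({});
```

The design point is that the YAML behaviors become visible declarations in yaml.ts rather than implicit defaults, which is also what keeps a named preset unnecessary.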

Non-goals

  • No named profile: "yaml" preset. With a single in-repo indent client it is premature, and it bundles/hides the coupling behind a switch rather than removing it. Explicit per-field declaration in yaml.ts achieves byte-identity without the indirection.
  • No public merge of blockScalar and rawBlock. They share only the capture skeleton; their public knobs barely overlap (introducers/documentMarkers/indicatorScope/|N indicators/atRoot are lead-position + YAML-document-only; rawBlock's introChar/signature/glue are trailing-only). A unified type would turn every field except token into mode-exclusive dead weight and force adopters to read YAML vocabulary. At most, extract the bare skeleton internally and keep two focused, mode-neutral-named fields.

Acceptance

  • yaml.ts stays byte-identical (YAML gates + scope-gap:yaml).
  • A non-YAML indent grammar can set string: true / a blockPattern without being enlisted into YAML separator/fold semantics.
  • The lexer and gen-tm agree on the key/value separator for any keyValueSeparator value.
  • No profile preset added.

Metadata

Labels: enhancement
