Use ClickHouse-native WASM syntax highlighting for SQL code blocks #6308

Open
alexey-milovidov wants to merge 6 commits into main from clickhouse-native-sql-highlighting

@alexey-milovidov alexey-milovidov commented May 31, 2026

What

Replaces Prism syntax highlighting for SQL code blocks with ClickHouse's own SQL lexer (src/Parsers/Lexer.cpp) compiled to WebAssembly — the same method and color scheme used in programs/server/play.html. Every other language continues to use Prism, unchanged.

How

  • src/components/CodeViewer/clickhouse-sql/lexerWasm.ts: Lexer.wasm embedded as base64 (extracted verbatim from play.html). Self-contained, no extra network request.
  • src/components/CodeViewer/clickhouse-sql/highlighter.ts — loads the WASM lexer once (cached), tokenizes SQL, and maps each token to a CSS class (q-kw, q-id, q-fn, q-num, q-str, q-qid, q-com, q-op, q-err) using the same TokenType enum, keyword set, and function-vs-identifier lookahead as play.html.
  • src/theme/CodeBlock/Content/String.js — swizzles Docusaurus's code-block string renderer. For language === sql it renders the WASM-highlighted block while reusing all existing chrome (copy button, word-wrap, line numbers, highlight-next-line magic comments); all other languages delegate to the original Prism renderer.
  • src/css/clickhouse-sql.scss — the exact light + dark --syntax-* palette from play.html (modelled after clickhouse-client).
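
The token-to-class mapping with function-vs-identifier lookahead described above might look roughly like the following. This is a minimal sketch, not the code from highlighter.ts: the `Token` shape, the `Kind` strings, the tiny `KEYWORDS` subset, and `classifyTokens` are all hypothetical, and the real implementation works from the lexer's TokenType enum and the full keyword set.

```typescript
// Hypothetical token shape; the real lexer exposes a TokenType enum instead.
type Kind = "word" | "number" | "string" | "quoted" | "comment" | "space" | "punct";
interface Token { kind: Kind; text: string; }

// Illustrative subset only; the PR syncs the full keyword set from play.html.
const KEYWORDS = new Set(["SELECT", "FROM", "WHERE", "GROUP", "BY", "ORDER", "LIMIT", "AS"]);

// Whitespace and comments are skipped when looking ahead for "(".
const skippable = (t: Token | undefined): boolean =>
  t !== undefined && (t.kind === "space" || t.kind === "comment");

// Map each token to one of the q-* classes named in the PR (null for whitespace).
function classifyTokens(tokens: Token[]): (string | null)[] {
  return tokens.map((tok, i) => {
    switch (tok.kind) {
      case "space": return null;
      case "number": return "q-num";
      case "string": return "q-str";
      case "quoted": return "q-qid";
      case "comment": return "q-com";
      case "punct": return "q-op";
      case "word": {
        if (KEYWORDS.has(tok.text.toUpperCase())) return "q-kw";
        // Lookahead: a bare word followed by "(" is treated as a function name.
        let j = i + 1;
        while (skippable(tokens[j])) j++;
        const next = tokens[j];
        return next !== undefined && next.text === "(" ? "q-fn" : "q-id";
      }
    }
  });
}
```

Under these assumptions, `count` in `SELECT count() FROM t` gets q-fn while `t` gets q-id.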

Behaviour

Highlighting runs client-side after the WASM module loads. SSR and the first client paint render identical plain markup, so there is no hydration mismatch — colors are applied once the lexer is ready. Scoped to static display SQL blocks; the Monaco (editable) and ClickUI (click_ui) editors are intentionally untouched (they carry their own tokenizers).

Validation

  • WASM pipeline (end-to-end): ran the real Lexer.wasm in Node against a multi-line query — full byte coverage, exact reconstruction, correct token classes.
  • SSR + routing: isolated renderToString test confirms SQL routes to the new highlighter with the full chrome, non-SQL delegates to the original, no SSR crash.
  • All new source files parse cleanly.
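
The "full coverage, exact reconstruction" check from the first bullet can be sketched as below. The `LexedToken` shape and `coversExactly` are hypothetical names, and the sketch uses JavaScript string offsets rather than the byte offsets the real lexer reports, so treat it as an illustration of the property being checked rather than the test that was actually run.

```typescript
// Hypothetical token record: half-open [begin, end) offsets plus the token text.
interface LexedToken { begin: number; end: number; text: string; }

// True when the tokens tile the source with no gaps or overlaps and
// concatenating their text reproduces the source exactly.
function coversExactly(source: string, tokens: LexedToken[]): boolean {
  let pos = 0;
  let rebuilt = "";
  for (const t of tokens) {
    // Each token must start where the previous one ended and be non-empty.
    if (t.begin !== pos || t.end <= t.begin || t.end > source.length) return false;
    // The reported text must match the slice it claims to cover.
    if (t.text !== source.slice(t.begin, t.end)) return false;
    rebuilt += t.text;
    pos = t.end;
  }
  return pos === source.length && rebuilt === source;
}
```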

A full local docusaurus build couldn't be run in the dev sandbox (it requires the network-heavy copy-clickhouse-repo-docs prep step before content loads). Recommend a quick local yarn start after the normal prep to eyeball the colors before merging.

Note: Lexer.wasm is a snapshot from play.html — if ClickHouse's lexer or keyword list changes upstream, re-extract the base64 and sync the keyword set in highlighter.ts.

🤖 Generated with Claude Code


Note

Low Risk
Markdown-only presentation changes with no runtime, build, or application logic impact.

Overview
Documentation examples across best-practices, cloud migration guides, data modeling, integrations, and getting-started pages are refactored so SQL queries and their output live in separate fenced blocks instead of one combined snippet.

Query blocks now end right after the statement (closing ``` before result tables, EXPLAIN trees, row counts, and timing lines). Output moves into dedicated ```response fences (sometimes with titles like title="Response"). A few pages also add missing closing fences, split multi-step flows into distinct query/response pairs, or rename mislabeled fences (e.g. sql to response where the content is client output).

This pattern aligns docs with query/response styling used elsewhere (e.g. snippets) and keeps result text out of sql-highlighted blocks so it is not tokenized as SQL.

Reviewed by Cursor Bugbot for commit e63deb0. Bugbot is set up for automated code reviews on this repo. Configure here.

Replace Prism highlighting for SQL with ClickHouse's own SQL lexer
(src/Parsers/Lexer.cpp) compiled to WebAssembly, using the same method
and color palette as programs/server/play.html.

- lexerWasm.ts: Lexer.wasm embedded as base64 (from play.html)
- highlighter.ts: loads the WASM lexer once, tokenizes SQL, maps tokens
  to q-* classes (keyword/identifier/function/number/string/quoted-id/
  comment/operator/error) with the same keyword set and fn-vs-id lookahead
- theme/CodeBlock/Content/String.js: swizzles the code-block string
  renderer; SQL uses the WASM highlighter reusing all existing chrome
  (copy button, word wrap, line numbers, highlight magic-comments), every
  other language delegates to the original Prism renderer
- clickhouse-sql.scss: the exact light/dark palette from play.html

Highlighting runs client-side after the WASM module loads; SSR and the
first client paint render identical plain markup, so there is no
hydration mismatch. Scoped to static display blocks; Monaco/ClickUI
editors are untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov alexey-milovidov requested a review from a team as a code owner May 31, 2026 02:43
vercel Bot commented May 31, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| clickhouse-docs | Ready | Preview, Comment | May 31, 2026 6:47am |
| clickhouse-docs-jp | Building | Preview, Comment | May 31, 2026 6:47am |

3 Skipped Deployments

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| clickhouse-docs-ko | Ignored | Preview | May 31, 2026 6:47am |
| clickhouse-docs-ru | Ignored | Preview | May 31, 2026 6:47am |
| clickhouse-docs-zh | Ignored | Preview | May 31, 2026 6:47am |

Comment thread src/components/CodeViewer/clickhouse-sql/highlighter.ts
- Use the light palette in the light theme: key the token colors on
  html[data-theme='light' | 'dark'] with !important so they reliably win
  over the legacy `code`/`code .token.*` overrides in custom.scss instead
  of relying on root-level CSS variables (which were being clobbered).
- Restore italic comments: `.ch-sql .q-com` now uses font-style: italic
  !important, overriding the global `code span { font-style: normal
  !important }` reset.
- Keywords/operators keep the default code color (bold for keywords), as
  in play.html.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
isEnabled={wordWrap.isEnabled}
/>
)}
<CopyButton className={styles.codeButton} code={code} />

CSS class codeButton referenced but never defined

Low Severity

styles.codeButton is passed as className to WordWrapButton and CopyButton, but no .codeButton class exists in the companion styles.module.css. This resolves to undefined at runtime, so no class is applied. The original Docusaurus CodeBlock/Content/String component defines a .codeButton class for button-specific styling. While the .buttonGroup button selector in this PR covers basic button styling, the missing class means SQL code block buttons may render slightly differently from buttons in non-SQL code blocks.


Reviewed by Cursor Bugbot for commit e64b6db. Configure here.

Docs code blocks often show a SQL statement followed by its textual
output (result tables, etc.), which is not valid SQL. The lexer stops at
that point, so the remainder was rendered with the error style (red,
wavy underline). Render it as plain text instead, and drop the now-unused
q-err class and its CSS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
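
The plain-text fallback described in the commit above could be sketched like this. `ClassifiedToken`, `escapeHtml`, and `renderWithPlainTail` are hypothetical names and the real renderer produces React elements rather than an HTML string; the point is only that anything after the last token the lexer produced is emitted unstyled.

```typescript
// Hypothetical classified token: [begin, end) offsets and an optional q-* class.
interface ClassifiedToken { begin: number; end: number; cls: string | null; }

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render tokens as <span> elements; whatever the lexer did not consume
// (for example a result table after the statement) is appended as plain text.
function renderWithPlainTail(source: string, tokens: ClassifiedToken[]): string {
  let html = "";
  let pos = 0;
  for (const t of tokens) {
    const text = escapeHtml(source.slice(t.begin, t.end));
    html += t.cls === null ? text : `<span class="${t.cls}">${text}</span>`;
    pos = t.end;
  }
  // No error class here: the unlexed remainder is simply escaped and appended.
  return html + escapeHtml(source.slice(pos));
}
```
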
Many ```sql code blocks across the docs contained not only SQL but also
the query output (result tables, "N rows in set. Elapsed: ..." status
lines, progress lines). Split those into separate blocks: SQL stays in
```sql, the output moves to ```response (the established convention).

- 382 blocks with raw interleaved output were split into alternating
  ```sql / ```response blocks.
- 18 blocks that were purely query output (mistagged ```sql) were
  retagged to ```response.
- Output embedded inside SQL comments (-- ...) is left untouched, since
  that is valid, intentionally copy-pasteable SQL.

Detection is marker-driven (box-drawing result tables, row-count/elapsed/
progress lines, Ok./Query id:), with ellipsis (...) truncation lines kept
inside their surrounding table so a result is never split apart. Verified
no ```sql block still contains output and no SQL ended up in a ```response
block; markdownlint introduces no new errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
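
The marker-driven detection from the commit above might be approximated as follows. The actual script is not part of this conversation, so the regular expressions, `Segment`, and `splitBlock` are assumptions chosen to match the markers the commit message lists (box-drawing tables, row-count/elapsed lines, Ok., Query id:), with ellipsis lines staying attached to the segment they follow.

```typescript
type Lang = "sql" | "response";
interface Segment { lang: Lang; lines: string[]; }

// Assumed output markers; the real script's patterns are not shown in the PR.
const OUTPUT_MARKERS: RegExp[] = [
  /^[┌└├│┏┗┃┡]/,                    // box-drawing result table rows
  /^\d+ rows? in set\. Elapsed:/,   // row-count / elapsed status line
  /^Ok\.$/,
  /^Query id: /,
];

function isOutput(line: string): boolean {
  const t = line.trim();
  return OUTPUT_MARKERS.some((re) => re.test(t));
}

// Split the lines of one fenced block into alternating sql/response segments.
function splitBlock(lines: string[]): Segment[] {
  const segments: Segment[] = [];
  for (const line of lines) {
    const last = segments[segments.length - 1];
    const t = line.trim();
    // Blank and ellipsis lines stay with the segment they follow,
    // so a truncated result table is never split apart.
    const sticky = t === "" || t === "..." || t === "…";
    const lang: Lang = sticky && last ? last.lang : isOutput(line) ? "response" : "sql";
    if (last && last.lang === lang) last.lines.push(line);
    else segments.push({ lang, lines: [line] });
  }
  return segments;
}
```

A sketch like this would still miss output that carries no marker, such as the Vertical key/value lines and the SHOW CREATE TABLE DDL raised later in the review.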
@alexey-milovidov alexey-milovidov requested a review from a team as a code owner May 31, 2026 04:31

```sql
--highlight-next-line
```

Orphaned highlight-next-line renders as visible SQL text

Medium Severity

The Docusaurus --highlight-next-line magic comment is now isolated in its own ```sql code block with no following line. This creates a visible code block showing the raw text --highlight-next-line to users, styled as SQL. The magic comment was originally meant to highlight the elapsed-time line but now has no target line within its block. This occurs in at least four places across the docs.


Reviewed by Cursor Bugbot for commit bec1108. Configure here.
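
A check for the orphaned magic comment reported above could be as small as the following. `hasOrphanedHighlight` is a hypothetical helper, and it only recognises the `--` comment form quoted in this review; it flags a block whose last non-blank line is the magic comment, since that comment then has no line left to highlight.

```typescript
// True when a highlight-next-line magic comment is the last non-blank line
// of a code block body, i.e. it has no target line within the block.
function hasOrphanedHighlight(blockLines: string[]): boolean {
  const nonBlank = blockLines.map((l) => l.trim()).filter((l) => l !== "");
  const last = nonBlank[nonBlank.length - 1];
  return last !== undefined && /^--\s*highlight-next-line$/.test(last);
}
```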

Comment thread docs/data-modeling/schema-design.md Outdated
Peak memory usage: 6.41 GiB.
```

```sql

Prose text placed inside SQL code blocks

Medium Severity

English prose sentences are included inside ```sql code blocks and will be syntax-highlighted by the WASM SQL lexer. In schema-design.md, the line "Our previous query improves the query response time by over 3x:" is inside a sql block. In dictionary/index.md, "Exploiting this in our earlier query, we can remove the JOIN:" is similarly misplaced. These should be outside the code fence as regular markdown text.


Reviewed by Cursor Bugbot for commit bec1108. Configure here.

Year: 2008
MostViewedQuestionTitle: How to find the index for a given item in a list?
MaxViewCount: 6316987
```

FORMAT Vertical output data incorrectly tagged as SQL

Medium Severity

Query output from FORMAT Vertical (key-value pairs like Year: 2008, type: QueryFinish, etc.) is placed in ```sql blocks instead of ```response blocks. The WASM SQL lexer will apply incorrect syntax coloring to this plaintext output. This occurs extensively across many doc files wherever FORMAT Vertical output was split, with the "Row N:" headers going into response and the field values going into sql.


Reviewed by Cursor Bugbot for commit bec1108. Configure here.

Fixes issues flagged in PR review on the previous split commit:

- FORMAT Vertical output (Row N: / aligned `key: value`, often with embedded
  multi-line SQL in the `query:` field) was fragmented into alternating
  sql/response blocks. Vertical output is always the tail of its block, so it
  is now split once: the query into ```sql, the whole output into ```response.
- Orphaned `--highlight-next-line` magic comments (whose highlight target moved
  into a response block) were rendering as visible SQL text. They are now
  dropped, and the surrounding output is merged into a single response block.
- Prose sentences left inside ```sql blocks (schema-design.md, dictionary/
  index.md) are moved out to regular markdown.
- A server exception message mistagged as ```sql (json/formats.md) is merged
  into the preceding ```response block.

Verified: no ```sql block contains query output (tables, status lines, or
Vertical key/value output), no orphaned magic comments remain, all fences are
balanced, and markdownlint reports no new errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.

There are 5 total unresolved issues (including 4 from previous reviews).

Reviewed by Cursor Bugbot for commit c0ee524. Configure here.

Query id: d97361fd-c050-478e-b831-369469f0784d
```

```sql

SHOW CREATE TABLE output mislabeled as executable SQL

Low Severity

The output of SHOW CREATE TABLE is split into two separate code blocks: a response block containing only the query ID, and then a ```sql block containing the CREATE TABLE definition. Since this CREATE TABLE text is output (not a command to execute), labeling it as sql causes the new WASM SQL highlighter to render it indistinguishably from an actual executable SQL statement, which could mislead readers into thinking it's a command to run. It also awkwardly separates the query ID from the rest of the same response. The whole output belongs in a single response block.


Reviewed by Cursor Bugbot for commit c0ee524. Configure here.

The DDL returned by SHOW CREATE TABLE is query output, but being valid SQL
with no output markers it was split into its own ```sql block (with the
query id isolated in a preceding ```response). Merge the query id and the
returned CREATE TABLE definition into one ```response block so the output
is not mislabeled as executable SQL. Only occurrence in the docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>