Use ClickHouse-native WASM syntax highlighting for SQL code blocks #6308
alexey-milovidov wants to merge 6 commits
Replace Prism highlighting for SQL with ClickHouse's own SQL lexer (src/Parsers/Lexer.cpp) compiled to WebAssembly, using the same method and color palette as programs/server/play.html.

- lexerWasm.ts: Lexer.wasm embedded as base64 (from play.html)
- highlighter.ts: loads the WASM lexer once, tokenizes SQL, maps tokens to q-* classes (keyword/identifier/function/number/string/quoted-id/comment/operator/error) with the same keyword set and fn-vs-id lookahead
- theme/CodeBlock/Content/String.js: swizzles the code-block string renderer; SQL uses the WASM highlighter reusing all existing chrome (copy button, word wrap, line numbers, highlight magic-comments), every other language delegates to the original Prism renderer
- clickhouse-sql.scss: the exact light/dark palette from play.html

Highlighting runs client-side after the WASM module loads; SSR and the first client paint render identical plain markup, so there is no hydration mismatch. Scoped to static display blocks; Monaco/ClickUI editors are untouched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
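The token-to-class mapping and the fn-vs-id lookahead described for highlighter.ts might look roughly like the sketch below. It is not the code from this PR: the `Token` shape, the `Kind` names, the tiny `KEYWORDS` subset, and the choice to skip whitespace during lookahead are all assumptions made for illustration; the real lexer exposes its own `TokenType` enum and a much larger keyword set.

```typescript
// Hypothetical token kinds; the real WASM lexer reports a numeric TokenType.
type Kind = "word" | "number" | "string" | "quoted" | "comment" | "operator" | "space";

interface Token {
  kind: Kind;
  text: string;
}

// Illustrative subset only, not the keyword set used by play.html.
const KEYWORDS = new Set(["SELECT", "FROM", "WHERE", "GROUP", "BY", "ORDER", "LIMIT", "AS"]);

// Returns the q-* class for token i, or null for tokens emitted as plain text.
function classify(tokens: Token[], i: number): string | null {
  const t = tokens[i];
  switch (t.kind) {
    case "number": return "q-num";
    case "string": return "q-str";
    case "quoted": return "q-qid";
    case "comment": return "q-com";
    case "operator": return "q-op";
    case "space": return null;
    case "word": {
      if (KEYWORDS.has(t.text.toUpperCase())) return "q-kw";
      // Lookahead (assumed rule): a bare word followed by "(" is a function call.
      let j = i + 1;
      while (j < tokens.length && tokens[j].kind === "space") j++;
      const next = tokens[j];
      return next !== undefined && next.text === "(" ? "q-fn" : "q-id";
    }
  }
}

function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Wraps each classified token in a span and leaves the rest as escaped text.
function toHtml(tokens: Token[]): string {
  return tokens
    .map((t, i) => {
      const cls = classify(tokens, i);
      const text = escapeHtml(t.text);
      return cls === null ? text : `<span class="${cls}">${text}</span>`;
    })
    .join("");
}
```

Keeping the lookahead inside `classify` means the same token list can be classified without a separate parsing pass, which is presumably why a lexer-only approach is enough for display highlighting.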
- Use the light palette in the light theme: key the token colors on
html[data-theme='light' | 'dark'] with !important so they reliably win
over the legacy `code`/`code .token.*` overrides in custom.scss instead
of relying on root-level CSS variables (which were being clobbered).
- Restore italic comments: `.ch-sql .q-com` now uses font-style: italic
!important, overriding the global `code span { font-style: normal
!important }` reset.
- Keywords/operators keep the default code color (bold for keywords), as
in play.html.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| isEnabled={wordWrap.isEnabled} | ||
| /> | ||
| )} | ||
| <CopyButton className={styles.codeButton} code={code} /> |
CSS class codeButton referenced but never defined
Low Severity
styles.codeButton is passed as className to WordWrapButton and CopyButton, but no .codeButton class exists in the companion styles.module.css. This resolves to undefined at runtime, so no class is applied. The original Docusaurus CodeBlock/Content/String component defines a .codeButton class for button-specific styling. While the .buttonGroup button selector in this PR covers basic button styling, the missing class means SQL code block buttons may render slightly differently from buttons in non-SQL code blocks.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit e64b6db.
Docs code blocks often show a SQL statement followed by its textual output (result tables, etc.), which is not valid SQL. The lexer stops at that point, so the remainder was rendered with the error style (red, wavy underline). Render it as plain text instead, and drop the now-unused q-err class and its CSS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
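As a rough illustration of the "render the remainder as plain text" behaviour, the sketch below turns lexer spans into render segments and appends whatever the lexer did not consume as an unclassified tail. The `Span` and `Segment` shapes are hypothetical, and the offsets are assumed to be JavaScript string indices (the real lexer works on bytes).

```typescript
// Hypothetical span reported by the lexer: [begin, end) plus a q-* class or null.
interface Span {
  begin: number;
  end: number;
  cls: string | null;
}

interface Segment {
  text: string;
  cls: string | null;
}

// Builds the segments to render. Anything after the last lexed span is
// emitted as plain text (cls === null) rather than with an error class.
function toSegments(source: string, spans: Span[]): Segment[] {
  const out: Segment[] = spans.map(s => ({
    text: source.slice(s.begin, s.end),
    cls: s.cls,
  }));
  const lexedUpTo = spans.length > 0 ? spans[spans.length - 1].end : 0;
  if (lexedUpTo < source.length) {
    out.push({ text: source.slice(lexedUpTo), cls: null });
  }
  return out;
}
```

Concatenating the `text` of every segment gives back the original block, so copy-to-clipboard output is unaffected by where the lexer stopped.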
Many ```sql code blocks across the docs contained not only SQL but also the query output (result tables, "N rows in set. Elapsed: ..." status lines, progress lines). Split those into separate blocks: SQL stays in ```sql, the output moves to ```response (the established convention).

- 382 blocks with raw interleaved output were split into alternating ```sql / ```response blocks.
- 18 blocks that were purely query output (mistagged ```sql) were retagged to ```response.
- Output embedded inside SQL comments (-- ...) is left untouched, since that is valid, intentionally copy-pasteable SQL.

Detection is marker-driven (box-drawing result tables, row-count/elapsed/progress lines, Ok./Query id:), with ellipsis (...) truncation lines kept inside their surrounding table so a result is never split apart. Verified no ```sql block still contains output and no SQL ended up in a ```response block; markdownlint introduces no new errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
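The marker-driven split could be approximated as follows. This is a sketch, not the script used for the commit: the regular expressions in `OUTPUT_MARKERS`, the `Block` type, and the rule that blank and ellipsis lines inherit the fence of the preceding line are assumptions chosen to mirror the description above. Progress lines are not covered here.

```typescript
type Fence = "sql" | "response";

interface Block {
  fence: Fence;
  lines: string[];
}

// Assumed marker patterns, loosely following the commit message.
const OUTPUT_MARKERS: RegExp[] = [
  /[┌┐└┘├┤│─┬┴┼]/,        // box-drawing result tables
  /^\d+ rows? in set\./,   // "N rows in set. Elapsed: ..."
  /^Ok\.$/,
  /^Query id:/,
];
const ELLIPSIS = /^\s*(\.\.\.|…)\s*$/;

function isOutput(line: string): boolean {
  const t = line.trim();
  if (t.startsWith("--")) return false; // output inside SQL comments stays SQL
  return OUTPUT_MARKERS.some(re => re.test(t));
}

function trimBlank(lines: string[]): string[] {
  let a = 0;
  let b = lines.length;
  while (a < b && lines[a].trim() === "") a++;
  while (b > a && lines[b - 1].trim() === "") b--;
  return lines.slice(a, b);
}

// Splits the lines of one fenced block into alternating sql/response blocks.
function splitBlock(lines: string[]): Block[] {
  const blocks: Block[] = [];
  for (const line of lines) {
    const last = blocks[blocks.length - 1];
    // Blank and "..." lines stay with the block they appear in, so a
    // truncated result table is never split apart.
    const neutral = line.trim() === "" || ELLIPSIS.test(line);
    const fence: Fence = neutral && last !== undefined
      ? last.fence
      : isOutput(line) ? "response" : "sql";
    if (last !== undefined && last.fence === fence) last.lines.push(line);
    else blocks.push({ fence, lines: [line] });
  }
  return blocks
    .map(b => ({ fence: b.fence, lines: trimBlank(b.lines) }))
    .filter(b => b.lines.length > 0);
}
```

A purely marker-based classifier like this cannot recognise output that has no markers at all, which matches the kinds of misclassification raised later in the review (Vertical key/value output, prose, and `SHOW CREATE TABLE` results).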
|
|
||
| ```sql | ||
| --highlight-next-line | ||
| ``` |
Orphaned highlight-next-line renders as visible SQL text
Medium Severity
The Docusaurus --highlight-next-line magic comment is now isolated in its own ```sql code block with no following line. This creates a visible code block showing the raw text --highlight-next-line to users, styled as SQL. The magic comment was originally meant to highlight the elapsed-time line but now has no target line within its block. This occurs in at least four places across the docs.
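A check for this condition might look like the sketch below, which reports magic-comment lines that have no target line inside the same block. The `MAGIC` pattern is an assumption based on the `--highlight-next-line` form quoted in this comment, and `orphanedMagicComments` is a hypothetical helper, not part of Docusaurus.

```typescript
// Assumed form of the magic comment as it appears in these docs.
const MAGIC = /^\s*--\s*highlight-next-line\s*$/;

// Returns the indices of magic-comment lines that have nothing to highlight:
// either they are the last line of the block or the next line is blank.
function orphanedMagicComments(blockLines: string[]): number[] {
  const orphans: number[] = [];
  blockLines.forEach((line, i) => {
    if (!MAGIC.test(line)) return;
    const target = blockLines[i + 1];
    if (target === undefined || target.trim() === "") orphans.push(i);
  });
  return orphans;
}
```

Running such a check over every fenced block after an automated split would flag blocks like the one shown in the diff above before they reach review.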
Additional Locations (2)
Reviewed by Cursor Bugbot for commit bec1108.
| Peak memory usage: 6.41 GiB. | ||
| ``` | ||
|
|
||
| ```sql |
Prose text placed inside SQL code blocks
Medium Severity
English prose sentences are included inside ```sql code blocks and will be syntax-highlighted by the WASM SQL lexer. In schema-design.md, the line "Our previous query improves the query response time by over 3x:" is inside a sql block. In dictionary/index.md, "Exploiting this in our earlier query, we can remove the JOIN:" is similarly misplaced. These should be outside the code fence as regular markdown text.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit bec1108.
| Year: 2008 | ||
| MostViewedQuestionTitle: How to find the index for a given item in a list? | ||
| MaxViewCount: 6316987 | ||
| ``` |
FORMAT Vertical output data incorrectly tagged as SQL
Medium Severity
Query output from FORMAT Vertical (key-value pairs like Year: 2008, type: QueryFinish, etc.) is placed in ```sql blocks instead of ```response blocks. The WASM SQL lexer will apply incorrect syntax coloring to this plaintext output. This occurs extensively across many doc files wherever FORMAT Vertical output was split, with the "Row N:" headers going into response and the field values going into sql.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit bec1108.
Fixes issues flagged in PR review on the previous split commit:

- FORMAT Vertical output (Row N: / aligned `key: value`, often with embedded multi-line SQL in the `query:` field) was fragmented into alternating sql/response blocks. Vertical output is always the tail of its block, so it is now split once: the query into ```sql, the whole output into ```response.
- Orphaned `--highlight-next-line` magic comments (whose highlight target moved into a response block) were rendering as visible SQL text. They are now dropped, and the surrounding output is merged into a single response block.
- Prose sentences left inside ```sql blocks (schema-design.md, dictionary/index.md) are moved out to regular markdown.
- A server exception message mistagged as ```sql (json/formats.md) is merged into the preceding ```response block.

Verified: no ```sql block contains query output (tables, status lines, or Vertical key/value output), no orphaned magic comments remain, all fences are balanced, and markdownlint reports no new errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
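The "split once" rule for Vertical output can be sketched as below: find the first `Row N:` header and treat everything from there to the end of the block as output. The `ROW_HEADER` pattern and the `splitVertical` helper are assumptions for illustration, relying on the statement above that Vertical output is always the tail of its block.

```typescript
// Assumed shape of a Vertical row header line, e.g. "Row 1:".
const ROW_HEADER = /^Row \d+:$/;

interface VerticalSplit {
  sql: string[];
  response: string[];
}

// Splits a block at the first "Row N:" line. Returns null when the block
// contains no Vertical output, so the caller can leave it untouched.
function splitVertical(lines: string[]): VerticalSplit | null {
  const start = lines.findIndex(l => ROW_HEADER.test(l.trim()));
  if (start === -1) return null;
  const sql = lines.slice(0, start);
  // Drop blank lines between the query and its output.
  while (sql.length > 0 && sql[sql.length - 1].trim() === "") sql.pop();
  return { sql, response: lines.slice(start) };
}
```

Because the split happens only at the first header, multi-line SQL embedded in a `query:` field stays inside the response instead of being pulled back out as a separate sql block.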
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 5 total unresolved issues (including 4 from previous reviews).
Reviewed by Cursor Bugbot for commit c0ee524.
| Query id: d97361fd-c050-478e-b831-369469f0784d | ||
| ``` | ||
|
|
||
| ```sql |
SHOW CREATE TABLE output mislabeled as executable SQL
Low Severity
The output of SHOW CREATE TABLE is split into two separate code blocks: a response block containing only the query ID, and then a ```sql block containing the CREATE TABLE definition. Since this CREATE TABLE text is output (not a command to execute), labeling it as sql causes the new WASM SQL highlighter to render it indistinguishably from an actual executable SQL statement, which could mislead readers into thinking it's a command to run. It also awkwardly separates the query ID from the rest of the same response. The whole output belongs in a single response block.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit c0ee524.
The DDL returned by SHOW CREATE TABLE is query output, but being valid SQL with no output markers it was split into its own ```sql block (with the query id isolated in a preceding ```response). Merge the query id and the returned CREATE TABLE definition into one ```response block so the output is not mislabeled as executable SQL. Only occurrence in the docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


What

Replaces Prism syntax highlighting for SQL code blocks with ClickHouse's own SQL lexer (`src/Parsers/Lexer.cpp`) compiled to WebAssembly, the same method and color scheme used in `programs/server/play.html`. Every other language continues to use Prism, unchanged.

How

- `src/components/CodeViewer/clickhouse-sql/lexerWasm.ts`: `Lexer.wasm` embedded as base64 (extracted verbatim from play.html). Self-contained, no extra network request.
- `src/components/CodeViewer/clickhouse-sql/highlighter.ts`: loads the WASM lexer once (cached), tokenizes SQL, and maps each token to a CSS class (`q-kw`, `q-id`, `q-fn`, `q-num`, `q-str`, `q-qid`, `q-com`, `q-op`, `q-err`) using the same `TokenType` enum, keyword set, and function-vs-identifier lookahead as play.html.
- `src/theme/CodeBlock/Content/String.js`: swizzles Docusaurus's code-block string renderer. For `language === sql` it renders the WASM-highlighted block while reusing all existing chrome (copy button, word-wrap, line numbers, `highlight-next-line` magic comments); all other languages delegate to the original Prism renderer.
- `src/css/clickhouse-sql.scss`: the exact light + dark `--syntax-*` palette from play.html (modelled after `clickhouse-client`).

Behaviour

Highlighting runs client-side after the WASM module loads. SSR and the first client paint render identical plain markup, so there is no hydration mismatch; colors are applied once the lexer is ready. Scoped to static display SQL blocks; the Monaco (`editable`) and ClickUI (`click_ui`) editors are intentionally untouched (they carry their own tokenizers).

Validation

- `Lexer.wasm` in Node against a multi-line query: full byte coverage, exact reconstruction, correct token classes.
- `renderToString` test confirms SQL routes to the new highlighter with the full chrome, non-SQL delegates to the original, no SSR crash.

🤖 Generated with Claude Code
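The "full byte coverage, exact reconstruction" check listed under Validation could be expressed along the lines of the sketch below. It is an illustration rather than the test that was actually run: `TokenSpan` is a hypothetical shape, and offsets are treated as string indices instead of bytes.

```typescript
// Hypothetical token span: half-open range [begin, end) into the source.
interface TokenSpan {
  begin: number;
  end: number;
}

// True when the spans are contiguous, non-empty, start at 0, end at the
// source length, and concatenating their slices rebuilds the source exactly.
function coversExactly(source: string, spans: TokenSpan[]): boolean {
  let pos = 0;
  let rebuilt = "";
  for (const s of spans) {
    if (s.begin !== pos || s.end <= s.begin) return false; // gap, overlap, or empty token
    rebuilt += source.slice(s.begin, s.end);
    pos = s.end;
  }
  return pos === source.length && rebuilt === source;
}
```

A check of this kind only makes sense for input that is entirely valid SQL; blocks with trailing query output are expected to leave an unlexed remainder.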
Note
Low Risk
Markdown-only presentation changes with no runtime, build, or application logic impact.
Overview
Documentation examples across best-practices, cloud migration guides, data modeling, integrations, and getting-started pages are refactored so SQL queries and their output live in separate fenced blocks instead of one combined snippet.
Query blocks now end right after the statement (closing the fence before result tables, `EXPLAIN` trees, row counts, and timing lines). Output moves into dedicated `response` fences (sometimes with titles like `title="Response"`). A few pages also add missing closing fences, split multi-step flows into distinct query/response pairs, or rename mislabeled fences (e.g. `sql` → `response` where the content is client output).

This pattern aligns docs with query/response styling used elsewhere (e.g. snippets) and keeps result text out of `sql`-highlighted blocks so it is not tokenized as SQL.

Reviewed by Cursor Bugbot for commit e63deb0. Bugbot is set up for automated code reviews on this repo.