Use json preprocessor for new-line delimited JSON files #19862

Merged
RobinMalfait merged 2 commits into tailwindlabs:main from thecrypticace:fix/jsonl-slow-scanning
Mar 26, 2026

Conversation

@thecrypticace
Contributor

Summary

This specializes the .jsonl and .ndjson file extensions so they're preprocessed like JSON instead of being handled by the standard scanner. This prevents them from creating thousands of sub machines and reduces scanning time (see #17125, where this was done for .json files).

It seems reasonable to handle newline-delimited JSON files as well; otherwise, scanning these files can take quite a long time.

It's quite unlikely that these files will contain classes, so an alternative would be to add them to the binary extensions list so that they get ignored entirely.
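
For readers who want a picture of the routing described in this summary, here is a minimal sketch. It is not the crate's code: `Strategy` and `strategy_for_extension` are hypothetical names made up for illustration, and the only thing taken from the PR is that `jsonl` and `ndjson` now follow the same path as `json`.

```rust
/// Hypothetical stand-in for the two paths a file can take (not a type from the oxide crate).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Strategy {
    /// Run the JSON preprocessor on the content first.
    JsonPreprocessor,
    /// Leave the content as-is for the standard scanner.
    StandardScanner,
}

/// Pick a strategy from a file extension given without the leading dot.
/// Hypothetical helper: the real function in this PR is `pre_process_input`,
/// whose exact signature is not shown in this conversation.
fn strategy_for_extension(extension: &str) -> Strategy {
    match extension {
        // `json` was already specialized (see #17125); this PR adds the two
        // newline-delimited variants alongside it.
        "json" | "jsonl" | "ndjson" => Strategy::JsonPreprocessor,
        // Everything else in this sketch falls through untouched.
        _ => Strategy::StandardScanner,
    }
}
```

Under this sketch, `strategy_for_extension("ndjson")` returns `Strategy::JsonPreprocessor`, while an extension not listed here, such as `"txt"`, returns `Strategy::StandardScanner`.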

Test plan

I ran manual tests inside the oxide crate against some large-ish JSONL files (5MB–15MB). These changes bring down scanning time from 2s–3s on my M3 Max (via cargo test --release …) to less than 20ms.

I also ran tests through a full CLI build pipeline on a low-spec Linux box. This change brought scanning time down from ~90s to ~300ms for a single ~15MB file.

@thecrypticace thecrypticace requested a review from a team as a code owner March 26, 2026 16:32
@coderabbitai
Contributor

coderabbitai bot commented Mar 26, 2026

Walkthrough

The pull request updates the CHANGELOG to document a performance improvement for scanning JSONL and NDJSON files. The corresponding code change modifies the pre_process_input function in the scanner module to recognize jsonl and ndjson file extensions, treating them with the same JSON processing logic previously reserved for .json files. This extends the JSON preprocessor's application to additional file formats that use JSON-based structures.

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: using the JSON preprocessor for newline-delimited JSON files (.jsonl and .ndjson extensions). |
| Description check | ✅ Passed | The description is directly related to the changeset, explaining the motivation for treating .jsonl and .ndjson files as JSON and providing concrete performance improvements from testing. |


Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
crates/oxide/src/scanner/mod.rs (1)

477-477: Add JSONL/NDJSON regression tests for this new extension routing.

This mapping change is correct, but there’s no visible test coverage for newline-delimited inputs (e.g. multiple JSON objects separated by \n). Please add scanner/preprocessor tests for both .jsonl and .ndjson to lock in this behavior.
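
A rough sketch of what such a regression check could look like is below. Everything in it is a stand-in: `toy_json_process` and `toy_pre_process_input` are hypothetical toys (the toy transformation of blanking out structural punctuation is an assumption, not the crate's actual `Json` preprocessor behavior), and the fixtures are invented. The point is only the shape of the assertion: for newline-delimited input, routing by `jsonl` or `ndjson` should give the same result as calling the JSON processor directly.

```rust
/// Toy JSON preprocessor: replaces structural punctuation with spaces.
/// Assumption for illustration only; the real preprocessor may differ.
fn toy_json_process(content: &[u8]) -> Vec<u8> {
    content
        .iter()
        .map(|&byte| match byte {
            b'{' | b'}' | b'[' | b']' | b'"' | b':' | b',' => b' ',
            other => other,
        })
        .collect()
}

/// Hypothetical router mirroring the extension mapping described in this PR.
fn toy_pre_process_input(content: &[u8], extension: &str) -> Vec<u8> {
    match extension {
        "json" | "jsonl" | "ndjson" => toy_json_process(content),
        _ => content.to_vec(),
    }
}

/// Invented fixtures covering the cases named in the review: several objects
/// separated by '\n', a trailing newline, and an empty line.
const FIXTURES: [&str; 3] = [
    "{\"class\":\"flex\"}\n{\"class\":\"p-4\"}",
    "{\"class\":\"flex\"}\n{\"class\":\"p-4\"}\n",
    "{\"class\":\"flex\"}\n\n{\"class\":\"p-4\"}\n",
];

/// True when every fixture routed through `extension` matches the output of
/// calling the toy JSON processor directly.
fn routes_like_json(extension: &str) -> bool {
    FIXTURES.iter().all(|input| {
        toy_pre_process_input(input.as_bytes(), extension)
            == toy_json_process(input.as_bytes())
    })
}
```

In a real test these helpers would be replaced by the crate's own entry points, with `routes_like_json("jsonl")` and `routes_like_json("ndjson")` asserted inside `#[test]` functions.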

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/oxide/src/scanner/mod.rs` at line 477, Add regression tests that
exercise the new extension routing for "jsonl" and "ndjson" by creating
scanner/preprocessor unit tests which call the code path that leads to
Json.process; feed newline-delimited input (multiple JSON objects separated by
'\n') and assert the output matches the expected sequence of parsed JSON objects
(or that the Json.process result is identical to calling the JSON processor
directly). Ensure tests cover both ".jsonl" and ".ndjson" extension routing and
include cases with trailing newline and empty lines to lock in behavior for
Json.process.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 51e2e8e3-892d-43ba-a051-c31a638da07f

📥 Commits

Reviewing files that changed from the base of the PR and between df6209a and 8dc9031.

📒 Files selected for processing (2)
  • CHANGELOG.md
  • crates/oxide/src/scanner/mod.rs

Member

@RobinMalfait RobinMalfait left a comment


Love it, thanks!

@RobinMalfait RobinMalfait merged commit d7fc281 into tailwindlabs:main Mar 26, 2026
9 checks passed


2 participants