Use json preprocessor for new-line delimited JSON files#19862
Use json preprocessor for new-line delimited JSON files#19862RobinMalfait merged 2 commits intotailwindlabs:mainfrom
json preprocessor for new-line delimited JSON files#19862Conversation
Otherwise scanning these files can take quite a long time
WalkthroughThe pull request updates the CHANGELOG to document a performance improvement for scanning JSONL and NDJSON files. The corresponding code change modifies the 🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
crates/oxide/src/scanner/mod.rs (1)
477-477: Add JSONL/NDJSON regression tests for this new extension routing.This mapping change is correct, but there’s no visible test coverage for newline-delimited inputs (e.g. multiple JSON objects separated by
\n). Please add scanner/preprocessor tests for both.jsonland.ndjsonto lock in this behavior.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@crates/oxide/src/scanner/mod.rs` at line 477, Add regression tests that exercise the new extension routing for "jsonl" and "ndjson" by creating scanner/preprocessor unit tests which call the code path that leads to Json.process; feed newline-delimited input (multiple JSON objects separated by '\n') and assert the output matches the expected sequence of parsed JSON objects (or that the Json.process result is identical to calling the JSON processor directly). Ensure tests cover both ".jsonl" and ".ndjson" extension routing and include cases with trailing newline and empty lines to lock in behavior for Json.process.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@crates/oxide/src/scanner/mod.rs`:
- Line 477: Add regression tests that exercise the new extension routing for
"jsonl" and "ndjson" by creating scanner/preprocessor unit tests which call the
code path that leads to Json.process; feed newline-delimited input (multiple
JSON objects separated by '\n') and assert the output matches the expected
sequence of parsed JSON objects (or that the Json.process result is identical to
calling the JSON processor directly). Ensure tests cover both ".jsonl" and
".ndjson" extension routing and include cases with trailing newline and empty
lines to lock in behavior for Json.process.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 51e2e8e3-892d-43ba-a051-c31a638da07f
📒 Files selected for processing (2)
CHANGELOG.mdcrates/oxide/src/scanner/mod.rs
Summary
This specializes the
.jsonland.ndjsonfile extensions so they're preprocessed like JSON instead of by the standard scanner. This prevents them from creating thousands of sub machines and reduces scanning time (see #17125 where this was done for.jsonfiles).It seems reasonable to handle new-line delimited JSON files as well otherwise scanning these files can take quite a long time.
It's quite unlikely that these will contain classes so, alternatively, these could go in the binary extensions list so they get ignored entirely.
Test plan
I ran manual tests inside the
oxidecrate against some large-ish JSONL files (5MB–15MB). These changes bring down scanning time from 2s–3s on my M3 Max (viacargo test --release …) to less than 20ms.I also ran tests through a full CLI build pipeline on a low-spec linux box. This change brought scanning time down from ~90s to ~300ms for a single ~15MB file.