
chore: explicitly set unicode-segmentation version to 1.13 #6463

Merged
Mallets merged 1 commit into main from mallets/dep-update-unicode-segmentation on May 22, 2026


@Mallets (Contributor) commented May 22, 2026

Description

Bump unicode-segmentation from "1.12" to "1.13" in the workspace Cargo.toml.

Cargo.lock already resolves to 1.13.2, so this change only raises the minimum version requirement to match what is already locked; the dependency version used at build time does not change.
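Cargo reads a bare version string such as "1.13" as a caret requirement, i.e. >=1.13.0 and <2.0.0, so the locked 1.13.2 satisfies both the old "1.12" and the new "1.13" specifier and Cargo has no reason to re-resolve. The sketch below illustrates that check with a small hand-written matcher. `parse_version` and `caret_matches` are hypothetical helpers, not Cargo's API (Cargo itself relies on the `semver` crate), and they deliberately ignore pre-release versions and the stricter rules Cargo applies to `0.x` requirements.

```rust
/// A parsed `major.minor.patch` version (pre-release and build metadata are not handled).
/// Deriving `Ord` compares fields in declaration order, i.e. lexicographically.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Hypothetical helper: parses "1.13.2", "1.13", or "1"; missing components default to 0.
fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>());
    let major = parts.next()?.ok()?;
    let minor = match parts.next() {
        Some(r) => r.ok()?,
        None => 0,
    };
    let patch = match parts.next() {
        Some(r) => r.ok()?,
        None => 0,
    };
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(Version { major, minor, patch })
}

/// Hypothetical helper: checks `version` against a caret requirement such as "1.13".
/// Only requirements with major >= 1 are handled; returns `None` otherwise.
fn caret_matches(requirement: &str, version: &str) -> Option<bool> {
    let req = parse_version(requirement)?;
    let v = parse_version(version)?;
    if req.major == 0 {
        return None; // Cargo's 0.x caret rules are out of scope for this sketch
    }
    // Caret rule for major >= 1: at least the requirement, same major version.
    Some(v >= req && v.major == req.major)
}

fn main() {
    // The locked 1.13.2 satisfies both the old and the new requirement.
    for req in ["1.12", "1.13"] {
        println!("{req} accepts 1.13.2: {:?}", caret_matches(req, "1.13.2"));
    }
    // The new requirement does reject a 1.12.x resolution.
    println!("1.13 accepts 1.12.0: {:?}", caret_matches("1.13", "1.12.0"));
}
```

The practical effect of the bump is the last case: a fresh resolution can no longer pick a 1.12.x release that lacks the changes described under "Why".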

Why

unicode-segmentation 1.13.2 introduces an ASCII fast path for unicode_word_indices and unicode_words (unicode-rs/unicode-segmentation#147).
Quickwit's UnicodeSegmenterTokenizer calls text.unicode_word_indices() directly, so this improvement benefits tokenization performance on ASCII-heavy text, which is the common case for most deployments.
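For context, `unicode_word_indices` (from the crate's `UnicodeSegmentation` trait) yields `(byte_offset, &str)` pairs for each word-like segment. The std-only sketch below is a deliberately simplified stand-in for ASCII input only; it is not the crate's fast path from unicode-rs/unicode-segmentation#147, and `ascii_word_indices` is a hypothetical name. It treats maximal runs of ASCII alphanumerics as words and ignores the UAX #29 mid-word rules, so it splits "can't" and "3.14" where the crate keeps them whole. It only illustrates one plausible reason ASCII input is cheap to segment: every character is a single byte, so classification is a byte test with no UTF-8 decoding or Unicode property lookup.

```rust
/// Hypothetical, simplified stand-in for `unicode_word_indices` on ASCII-only input.
/// Returns `None` for non-ASCII text, where a real implementation would need full
/// UAX #29 segmentation. Mid-word punctuation rules are deliberately ignored.
fn ascii_word_indices(text: &str) -> Option<Vec<(usize, &str)>> {
    if !text.is_ascii() {
        return None; // outside the scope of this sketch
    }
    let mut words = Vec::new();
    let mut start: Option<usize> = None;
    for (i, &b) in text.as_bytes().iter().enumerate() {
        // In ASCII every char is one byte, so every byte offset is a char boundary
        // and slicing `text` at `i` is always valid.
        if b.is_ascii_alphanumeric() {
            if start.is_none() {
                start = Some(i); // a new word begins here
            }
        } else if let Some(s) = start.take() {
            words.push((s, &text[s..i])); // a word ends before this byte
        }
    }
    if let Some(s) = start {
        words.push((s, &text[s..])); // word running to the end of the input
    }
    Some(words)
}

fn main() {
    let words = ascii_word_indices("Fast path: ASCII logs, 2026 edition").unwrap();
    // Each entry is (byte offset, word), the same shape `unicode_word_indices` yields.
    println!("{words:?}");
}
```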

Other notable changes in the 1.13.x series:

  • Increased #[inline] opportunities (15-40% general perf improvement)
  • Unicode 17.0.0 support

How was this PR tested?

Version-only change in Cargo.toml. Cargo.lock is unchanged, so the resolved dependency is identical to what was already being built.

…oml. Cargo.lock already points to 1.13.2, which introduces significant performance improvement
@Mallets self-assigned this May 22, 2026
@Mallets marked this pull request as ready for review May 22, 2026 13:54
@Mallets requested a review from a team as a code owner May 22, 2026 13:54
@Mallets merged commit dd020e0 into main May 22, 2026
9 checks passed
@Mallets deleted the mallets/dep-update-unicode-segmentation branch May 22, 2026 14:11

Labels

None yet

Projects

None yet


2 participants