[v1.37] Tokenization tutorial + query-profile docs: Java and TypeScript snippets #411

Open
g-despot wants to merge 6 commits into main from 1-37/ramining
@g-despot
Contributor

Summary

Adds the Java v6 and TypeScript ports for the v1.37 tokenization tutorial (Examples 4–6) and the query-profile how-to page. Both pages were Python-only at the time of PR #398 and PR #402 because the Java and TS clients didn't yet expose the v1.37 surface; that is now resolved (Java 6.2.1-SNAPSHOT, TS weaviate/typescript-client#429). Also bumps the test compose stack to Weaviate 1.37.2, the version where the /v1/tokenize stopwordPresets request shape became Map<string, []string>, matching what both clients send naturally.

Tokenization tutorial (docs/weaviate/tutorials/tokenization.md)

  • Examples 4 (accent folding), 5 (custom stopwords), 6 (tokenize endpoint) now have full Python / Java v6 / TypeScript tabs
  • New TS snippets at _includes/code/tutorials/tokenization/{accent_folding,custom_stopwords,tokenize_endpoint}.ts mirroring the Python markers
  • Combined Java test class at _includes/code/java-v6/src/test/java/TokenizationTest.java (5 @Test methods covering all three examples plus the forProperty variant)
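
To make the behavior behind Examples 4 and 5 concrete, here is a minimal Python sketch of accent folding followed by stopword removal. It is an illustration only, not Weaviate's tokenizer: the helper names `fold_accents` and `tokenize`, the whitespace split, and the sample stopword set are all assumptions made for this example.

```python
from __future__ import annotations

import unicodedata


def fold_accents(text: str) -> str:
    """Drop combining marks after NFKD decomposition (illustrative only)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str,
             stopwords: frozenset[str] = frozenset(),
             fold: bool = False) -> list[str]:
    """Lowercase, split on whitespace, then filter out stopwords.

    This is a simplified stand-in; a real analyzer has more tokenization modes.
    """
    if fold:
        text = fold_accents(text)
    tokens = text.lower().split()
    return [t for t in tokens if t not in stopwords]


# Hypothetical custom stopword list for the example.
custom_stopwords = frozenset({"au"})
print(tokenize("Café au lait", stopwords=custom_stopwords, fold=True))
# prints ['cafe', 'lait']
```

With folding disabled the first token stays `café`, so a query for `cafe` would not match it; that mismatch is the case accent folding is meant to remove.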

Query profiling (docs/weaviate/search/query-profile.md)

  • Three FilteredTextBlock sections wrapped in language tabs (Python / TS / Java v6)
  • New _includes/code/howto/search.profile.ts and _includes/code/java-v6/src/test/java/SearchProfileTest.java. The latter waits 3 s after insertMany so that, with ASYNC_INDEXING=true, the HNSW graph has time to build; without the wait, nearVector returns no objects and the server skips populating queryProfile.shards.
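
For reference, the fixed 3 s wait could instead be written as a bounded poll. The sketch below is a generic Python helper, not code from this PR: `wait_until`, `index_ready`, and the timeout values are hypothetical names and numbers chosen for illustration, and the readiness check is simulated rather than calling Weaviate.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout_s: float = 10.0,
               interval_s: float = 0.25) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval_s)
    # One last check so a condition that flips right at the deadline still counts.
    return condition()


# Stand-in for "the index is ready": becomes True on the third poll.
# In a real test this would be a hypothetical check such as
# "nearVector returns at least one object".
polls = {"count": 0}


def index_ready() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


ready = wait_until(index_ready, timeout_s=2.0, interval_s=0.01)
print(ready, polls["count"])
```

A fixed sleep is simpler and is what the test uses today; a poll like this trades a little code for less sensitivity to slow CI machines.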

Test harness

  • tests/test_typescript.py: new test_tokenization and test_search_profile parametrize blocks
  • tests/test_java_v6.py: TokenizationTest and SearchProfileTest added
  • Compose files bumped from 1.37.1 to 1.37.2

Client dependency notes

  • _includes/code/java-v6/pom.xml pinned to client6:6.2.1-SNAPSHOT — needs the camelCase TextAnalyzer @SerializedName fix that's not in released 6.2.0. Will switch back to a release version once client6:6.2.1 ships.
  • _includes/code/package.json adds weaviate-client (version managed via weaviate/typescript-client#429 SNAPSHOT until released).

Test plan

  • uv run pytest -m java_v6 -k "Tokenization or SearchProfile" -v — 2 passed (TokenizationTest's 5 sub-tests + SearchProfileTest's 3 sub-tests all green)
  • uv run pytest -m ts -k "tokenization or search_profile" -v — 4 passed (accent_folding.ts, custom_stopwords.ts, tokenize_endpoint.ts, search.profile.ts)
  • npx tsx _includes/code/tutorials/tokenization/{accent_folding,custom_stopwords,tokenize_endpoint}.ts — direct runs match the Python output verbatim
  • mvn test -Dtest=TokenizationTest and -Dtest=SearchProfileTest — green
  • Live page render of /weaviate/tutorials/tokenization and /weaviate/search/query-profile — Python / TypeScript / Java v6 tabs render and code is highlighted correctly

Note on commits

This branch is currently based on digital-ocean (one commit). The Add DigitalOcean deployment type commit will collapse out of this PR's diff automatically once that branch merges to main; if you'd prefer, I can rebase onto main once the DigitalOcean PR lands.

🤖 Generated with Claude Code

@orca-security-eu orca-security-eu Bot left a comment

Orca Security Scan Summary

| Status | Check | High | Medium | Low | Info |
|---|---|---|---|---|---|
| Passed | Infrastructure as Code | 0 | 0 | 0 | 0 |
| Passed | SAST | 0 | 0 | 0 | 0 |
| Passed | Secrets | 0 | 0 | 0 | 0 |
| Failed | Vulnerabilities | 2 | 0 | 0 | 0 |

☢️ The following Vulnerabilities (CVEs) have been detected

| Severity | Package | File | CVE ID | Installed version | Fixed version |
|---|---|---|---|---|---|
| high | tar | ./yarn.lock | CVE-2026-24842 | 6.2.1 | 7.5.7 |
| high | tar | ./yarn.lock | CVE-2026-26960 | 6.2.1 | 7.5.8 |
