refactor: upgrade HBase and replace custom hbase-shaded-endpoint #3021

Open
vaijosh wants to merge 5 commits into apache:master from vaijosh:Hbase-265-upgrade
vaijosh commented May 11, 2026

Title

fixes #3016

feat(hbase): upgrade to HBase 2.6.5 and replace custom shaded endpoint with official Apache artifacts

Background

This PR modernizes HugeGraph’s HBase integration by replacing the custom hbase-shaded-endpoint dependency with official Apache HBase 2.6.5 artifacts, and adds a reproducible Docker-based local test environment for HBase backend development/verification.

What changed

1) HBase dependency upgrade (hugegraph-server/hugegraph-hbase/pom.xml)

  • Added hbase.version property: 2.6.5
  • Replaced:
    • com.baidu.hugegraph:hbase-shaded-endpoint:2.0.6
  • With:
    • org.apache.hbase:hbase-endpoint:${hbase.version}
    • org.apache.hbase:hbase-shaded-client:${hbase.version}
  • Added exclusions on hbase-endpoint to avoid pulling heavyweight server/hadoop transitive components not needed by HugeGraph runtime.
  • Kept dependency order (hbase-endpoint before hbase-shaded-client) to preserve AggregationClient/LongColumnInterpreter compatibility.

2) Dockerized HBase standalone environment (new files under docker/hbase/)

  • Added Dockerfile to build HBase 2.6.5 image from official Apache tarballs.
  • Added SHA512 verification with strict default behavior and mirror fallback:
    • primary: downloads.apache.org
    • fallback: archive.apache.org
  • Added robust checksum parsing to support Apache .sha512 formats.
  • Added entrypoint.sh that starts ZooKeeper + Master + RegionServer and blocks until service readiness.
  • Added hbase-site.xml tuned for local standalone/pseudo-distributed usage and HugeGraph defaults.
  • Added docker-compose.hbase.yml with ports, healthcheck, persistent volumes, and overridable download URLs.
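
The "blocks until service readiness" behavior described for entrypoint.sh can be illustrated with a small retry loop. This is a minimal sketch and not the PR's actual script: the function name `wait_for`, the attempt count, the delay, and the probe command in the usage note are all assumptions made for illustration.

```shell
# Hypothetical helper (not taken from the PR's entrypoint.sh).
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0 or ATTEMPTS tries have been made.
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0            # probe succeeded: treat the service as ready
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1                # gave up: the caller decides how to fail
}
```

A call such as `wait_for 60 2 nc -z localhost 2181` would poll ZooKeeper's default client port, assuming `nc` is available in the image. Passing the probe as a command keeps the loop independent of how readiness is actually detected.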

3) End-to-end usage and troubleshooting docs (docker/HBASE.md)

  • Added full guide for:
    • starting/stopping HBase in Docker
    • HugeGraph server config/init for HBase backend
    • API sanity checks (schema + vertex + gremlin)
    • troubleshooting common issues and cleanup

4) Dependency allowlist update (install-dist/scripts/dependency/known-dependencies.txt)

  • Removed: hbase-shaded-endpoint-2.0.6.jar
  • Added:
    • hbase-endpoint-2.6.5.jar
    • hbase-shaded-client-2.6.5.jar
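
The allowlist change above can be spot-checked with a few `grep` calls. The sketch below is only an illustration: the function name `check_hbase_allowlist` is hypothetical, and it assumes known-dependencies.txt lists one jar file name per line.

```shell
# Hypothetical check (not part of the PR): succeed only if the allowlist
# contains the two new HBase jars and no longer contains the old one.
# Assumes one jar file name per line in the given file.
check_hbase_allowlist() {
  local file="$1"
  grep -qxF 'hbase-endpoint-2.6.5.jar' "$file" &&
    grep -qxF 'hbase-shaded-client-2.6.5.jar' "$file" &&
    ! grep -qxF 'hbase-shaded-endpoint-2.0.6.jar' "$file"
}
```

For example, `check_hbase_allowlist install-dist/scripts/dependency/known-dependencies.txt && echo "allowlist OK"` could be run from the repository root. It does not replace check_dependencies.sh, which compares the full dependency set.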

Why

  • Align HugeGraph HBase integration with official Apache HBase artifacts.
  • Remove reliance on custom shaded endpoint packaging.
  • Provide a consistent and secure local HBase test setup for contributors and CI-like reproduction.
  • Reduce build/setup friction with documented and reproducible steps.

Impact

  • Scope is limited to HBase module dependencies, Docker test tooling, and dependency metadata/docs.
  • Existing non-HBase backends are not directly affected.

How to verify

docker compose -f docker/hbase/docker-compose.hbase.yml build --no-cache hbase
docker compose -f docker/hbase/docker-compose.hbase.yml up -d
docker compose -f docker/hbase/docker-compose.hbase.yml ps

mvn clean install -pl hugegraph-server/hugegraph-hbase -am -DskipTests

bash install-dist/scripts/dependency/check_dependencies.sh

HBase upgrade verification (HBase backend version 2.0.6, client library version 2.6.5)

  1. Create HBase containers using version 2.0.6 and create a graph.
  2. Apply the patch and start the HugeGraph server, keeping the HBase 2.0.6 container as it is.
  3. Run graph queries against the data populated under 2.0.6 and verify that they succeed (the HBase client is updated but can still retrieve data from HBase 2.0.6).

Fresh install verification (HBase backend and client version 2.6.5)

  1. Create an HBase 2.6.5 container.
  2. Apply the patch and start the HugeGraph server.
  3. Create a sample graph and run some queries.

Notes

SHA512 verification remains enforced by default during Docker image build.
ALLOW_UNVERIFIED_DOWNLOAD=true is intended only for trusted/restricted test environments.
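
How the flag is wired into the Dockerfile is not shown in this description, so the following is only a sketch of one way a strict-by-default gate could behave. The function name `check_digest` and the exact messages are hypothetical; only the `ALLOW_UNVERIFIED_DOWNLOAD` variable name comes from the note above.

```shell
# Hypothetical gate (an assumption, not the PR's Dockerfile logic).
# Usage: check_digest EXPECTED_HEX ACTUAL_HEX
# Passes when the digests match; otherwise passes only if the caller has
# explicitly opted in with ALLOW_UNVERIFIED_DOWNLOAD=true.
check_digest() {
  local expected="$1" actual="$2"
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    return 0
  fi
  if [ "${ALLOW_UNVERIFIED_DOWNLOAD:-false}" = "true" ]; then
    echo "WARNING: checksum not verified (ALLOW_UNVERIFIED_DOWNLOAD=true)" >&2
    return 0
  fi
  echo "ERROR: SHA512 checksum missing or mismatched" >&2
  return 1
}
```

Treating an empty expected digest as a failure means a checksum file that could not be fetched or parsed stops the build unless the override is set.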

…int with official artifacts apache#3016

- Added hbase-shaded-client and hbase-endpoint dependencies in place of the custom hbase-shaded-endpoint library.
- Added Docker files and HBASE.md with instructions for the HBase backend.
dosubot bot added the size:XL and dependencies labels May 11, 2026
@vaijosh vaijosh changed the title [Improve] Upgrade HBase version and replace custom hbase-shaded-endpint with official artifacts #3016 [Improve] Upgrade HBase version and replace custom hbase-shaded-endpint with official artifacts May 12, 2026
@vaijosh vaijosh changed the title [Improve] Upgrade HBase version and replace custom hbase-shaded-endpint with official artifacts [Improve] Upgrade HBase version and replace custom hbase-shaded-endpint with official artifacts #3016 May 12, 2026
@vaijosh vaijosh changed the title [Improve] Upgrade HBase version and replace custom hbase-shaded-endpint with official artifacts #3016 3016: [Improve] Upgrade HBase version and replace custom hbase-shaded-endpint with official artifacts #3016 May 12, 2026
@imbajin imbajin requested a review from Copilot May 15, 2026 07:34
Copilot AI left a comment

Pull request overview

This PR modernizes the HBase backend by replacing the long-pinned com.baidu.hugegraph:hbase-shaded-endpoint:2.0.6 with the official Apache hbase-endpoint + hbase-shaded-client 2.6.5 artifacts, and ships a Docker-based local HBase test environment plus an end-to-end usage guide so contributors can reproduce HBase-backend validation.

Changes:

  • Upgrade HBase client to 2.6.5 (official Apache artifacts) with transitive exclusions and a dependency-allowlist update.
  • Add a self-contained Docker setup (Dockerfile, entrypoint.sh, hbase-site.xml, docker-compose.hbase.yml) for a standalone HBase 2.6.5 cluster.
  • Add docker/HBASE.md documenting build, run, API sanity checks, and troubleshooting.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 14 comments.

| File | Description |
| --- | --- |
| hugegraph-server/hugegraph-hbase/pom.xml | Switch HBase deps to official 2.6.5 with transitive exclusions and ordering comment. |
| install-dist/scripts/dependency/known-dependencies.txt | Replace old shaded-endpoint jar with new endpoint/shaded-client jars. |
| docker/hbase/Dockerfile | Build standalone HBase 2.6.5 image with SHA512 verification + mirror fallback. |
| docker/hbase/entrypoint.sh | Start ZK/master/regionserver and wait for readiness, then tail logs. |
| docker/hbase/hbase-site.xml | Standalone/pseudo-distributed HBase config tuned for HugeGraph defaults. |
| docker/hbase/docker-compose.hbase.yml | Compose service with ports, volumes, healthcheck, build args. |
| docker/HBASE.md | End-to-end Docker/HBase backend setup and troubleshooting guide. |

@imbajin imbajin changed the title 3016: [Improve] Upgrade HBase version and replace custom hbase-shaded-endpint with official artifacts #3016 refactor: upgrade HBase and replace custom hbase-shaded-endpint #3016 May 15, 2026
@imbajin imbajin changed the title refactor: upgrade HBase and replace custom hbase-shaded-endpint #3016 refactor: upgrade HBase and replace custom hbase-shaded-endpint May 15, 2026
vaijosh added 2 commits May 15, 2026 17:04
apache#3021
Addressed review comments in this update:
- docker/HBASE.md
  - fixed Quick Start step title to match the actual command (image build)
  - aligned manual API examples with the default local server endpoint base (/graphs)
  - clarified idempotency wording around check_exist behavior
- docker/hbase/entrypoint.sh
  - fixed log glob pattern to match runtime-generated hbase-* log files
  - replaced invalid exec+|| fallback with explicit log-file existence handling
- docker/hbase/hbase-site.xml
  - set hbase.rootdir to explicit file:///tmp/hbase for deterministic local-FS mode
- docker/hbase/Dockerfile
  - switched to stable archive URL as primary source
  - fetch checksum from the actually downloaded source first
  - hardened checksum parsing for grouped SHA512 formats
  - removed stale cleanup path
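
The "grouped SHA512 formats" item refers to the fact that checksum files are not always in plain `sha512sum` layout. As a rough sketch of what such parsing can look like (not the Dockerfile's actual code), the hypothetical helper `normalize_sha512` below reduces either layout named in its comments to a bare lowercase digest.

```shell
# Hypothetical helper (an illustration, not the PR's Dockerfile logic).
# Usage: normalize_sha512 "RAW_CHECKSUM_FILE_TEXT"
# Handles two layouts, assumed here for illustration:
#   1) "<128 hex chars>  FILE"            (sha512sum style)
#   2) "FILE: ABCD1234 EF567890 ..."      (grouped, upper-case, may wrap lines)
# Prints the 128-character lowercase hex digest, or nothing if none is found.
normalize_sha512() {
  printf '%s' "$1" \
    | tr -d ' \t\r\n' \
    | tr 'A-F' 'a-f' \
    | grep -oE '[0-9a-f]{128}' \
    | head -n 1
}
```

The result can then be compared with the first field of `sha512sum FILE`. An empty result should be treated as a verification failure rather than silently skipped.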
Replace custom hbase-shaded-endpoint with a streamlined hbase-endpoint.
This reduces the runtime footprint by excluding heavyweight transitive
dependencies not required by the HugeGraph HBase client.

Key exclusions and rationale:
- Server logic: hbase-server (coprocessors run on RS, not client).
- Batch/Async: hbase-mapreduce, hbase-asyncfs, and hbase-replication.
- Hadoop stack: hadoop-client/auth/common/hdfs. HugeGraph uses the
  ZooKeeper registry directly and avoids the YARN/MapReduce stack.
- Legacy logging: log4j 1.x, slf4j-log4j12, and redundant slf4j-api
  versions were purged to eliminate vulnerabilities and conflicts.
- Native/Compression: snappy-java (handled server-side).

Updated known-dependencies.txt to reflect the minimal allowlist.
Improved pom.xml comments to document exclusion rationales and
addressed automated review feedback regarding dependency management.
codecov Bot commented May 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.25%. Comparing base (66e5339) to head (bc64ce7).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@              Coverage Diff              @@
##             master    #3021       +/-   ##
=============================================
+ Coverage     35.90%   93.25%   +57.35%     
+ Complexity      338       65      -273     
=============================================
  Files           803        9      -794     
  Lines         68040      267    -67773     
  Branches       8905       22     -8883     
=============================================
- Hits          24429      249    -24180     
+ Misses        40991        8    -40983     
+ Partials       2620       10     -2610     

imbajin commented May 17, 2026

The HBase upgrade direction looks reasonable, but I don't think this is ready to merge until the dependency/release-materials check and HBase runtime verification are completed.

This PR replaces the old custom hbase-shaded-endpoint:2.0.6 dependency with official Apache HBase 2.6.5 artifacts and adds several new/changed transitive jars in known-dependencies.txt:

hugegraph-hbase
        -> hbase-endpoint:2.6.5
        -> hbase-shaded-client:2.6.5
        -> hbase-client/common/zookeeper/protocol/... 2.6.5
        -> zookeeper 3.8.6
        -> other transitive jars

For dependency changes, please also check the release-compliance materials and update them as needed, following the existing HugeGraph release-docs style:

install-dist/scripts/dependency/known-dependencies.txt
install-dist/release-docs/LICENSE
install-dist/release-docs/NOTICE
install-dist/release-docs/licenses/*

This does not necessarily mean adding large new LICENSE / NOTICE sections. The exact update should be based on the actual newly added or changed jars and their license/notice requirements, consistent with how this repo already records third-party dependencies. But we should not merge with only the dependency allowlist updated if the corresponding release materials still describe the old HBase 2.0.6 dependency set.

Please also confirm the new HBase 2.6.5 runtime path with enough verification before merge. Since this PR adds a Dockerized HBase 2.6.5 environment, it would be helpful to include the exact commands/results used to verify that the new version works end to end, for example:

docker compose -f docker/hbase/docker-compose.hbase.yml build --no-cache hbase
docker compose -f docker/hbase/docker-compose.hbase.yml up -d
mvn clean install -pl hugegraph-server/hugegraph-hbase -am -DskipTests
init HugeGraph with backend=hbase
start HugeGraph server
run schema + vertex + Gremlin sanity checks
run count/aggregation path that depends on AggregationClient / LongColumnInterpreter

In short:

Before merge:
  1. audit the new/changed HBase dependency set
  2. update release docs/licenses only where needed
  3. confirm the HBase 2.6.5 Docker/runtime verification is actually green

vaijosh added 2 commits May 17, 2026 22:17
Implemented review comments
- Updated the new dependencies in the LICENSE and NOTICE files
- Added licenses for the new libraries under install-dist/release-docs/licenses
- Updated install-hbase.sh and hbase-site.xml to fix the failures; verified locally using the CI steps.
vaijosh commented May 18, 2026

> confirm the HBase 2.6.5 Docker/runtime verification is actually green

Many thanks @imbajin for the review and valuable suggestions!

I have updated the PR based on your feedback:

  1. License & Notice Updates: Updated the files under release-docs/ (specifically licenses/, NOTICE, and LICENSE).
  2. HBase 2.6.5 Verification: Confirmed that the Docker/runtime verification is green and fully functional. I verified this via two methods:
    • Manual Check: Followed the steps outlined in HBASE.md.
    • CI Pipeline Check: Verified locally using the exact steps executed by the CI pipeline.

I have attached the successful local API test execution logs for reference.
api-test-execution.log
