Skip to content

refactor: untrack generated python proto, regenerate in build#5077

Open
Ma77Ball wants to merge 14 commits into
apache:mainfrom
Ma77Ball:feat/changeProtobufGenerate
Open

refactor: untrack generated python proto, regenerate in build#5077
Ma77Ball wants to merge 14 commits into
apache:mainfrom
Ma77Ball:feat/changeProtobufGenerate

Conversation

@Ma77Ball
Copy link
Copy Markdown
Contributor

@Ma77Ball Ma77Ball commented May 15, 2026

What changes were proposed in this PR?

Stop version-tracking the betterproto-generated Python files under amber/src/main/python/proto/. Add a genPythonProto sbt task that runs bin/python-proto-gen.sh as a dependency of WorkflowExecutionService/Compile/compile, with a graceful skip when protoc is missing. Wire protoc 3.19.4 (matching PB.protocVersion) and betterproto[compiler]==2.0.0b7 into the three image build stages (worker,master, web-app) and the three CI jobs that touch Python proto (python, python-state-materialization-mac, amber-integration). The 14th tracked file (proto/org/apache/texera/web/__init__.py) had no corresponding .proto source and is gone after regen; grep confirms no callers.

Any related issues, documentation, or discussions?

Closes: #4102

How was this PR tested?

locally: sbt clean WorkflowExecutionService/compile regenerates the tree and pytest -m "not integration" passes; skip path verified with protoc absent. All three Docker images build end-to-end. CI covers the workflow YAML on
push.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.7 in compliance with ASF

@github-actions github-actions Bot added feature engine dependencies Pull requests that update a dependency file python ci changes related to CI docs Changes related to documentations dev labels May 15, 2026
@Ma77Ball
Copy link
Copy Markdown
Contributor Author

/request-review @Yicong-Huang

@github-actions github-actions Bot requested a review from Yicong-Huang May 15, 2026 11:18
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 43.83%. Comparing base (0f5f791) to head (32c268a).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5077      +/-   ##
============================================
+ Coverage     43.66%   43.83%   +0.17%     
  Complexity     2218     2218              
============================================
  Files          1049     1051       +2     
  Lines         40580    40705     +125     
  Branches       4324     4324              
============================================
+ Hits          17719    17843     +124     
  Misses        21766    21766              
- Partials       1095     1096       +1     
Flag Coverage Δ *Carryforward flag
access-control-service 39.53% <ø> (ø)
agent-service 33.76% <ø> (ø)
amber 43.94% <ø> (ø) Carriedforward from 0f5f791
computing-unit-managing-service 1.38% <ø> (ø)
config-service 19.35% <ø> (ø)
file-service 32.18% <ø> (ø)
frontend 35.15% <ø> (ø)
python 90.79% <ø> (+0.28%) ⬆️
workflow-compiling-service 58.39% <ø> (ø)

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Yicong-Huang
Copy link
Copy Markdown
Contributor

Yicong-Huang commented May 15, 2026

Thanks, that's a good direction for change! Let inline comments.

Please make pr description concise. The "how you test" section is basically saying "ci" can verify. I don't get too much new info reading it.

Comment thread amber/build.sbt Outdated
Comment thread bin/computing-unit-master.dockerfile Outdated
@Yicong-Huang Yicong-Huang changed the title refactor: untrack generated python proto, regenerate in build [Hackathon] refactor: untrack generated python proto, regenerate in build May 15, 2026
@Ma77Ball Ma77Ball changed the title [Hackathon] refactor: untrack generated python proto, regenerate in build refactor: untrack generated python proto, regenerate in build May 15, 2026
@Ma77Ball Ma77Ball requested a review from Yicong-Huang May 22, 2026 08:11
@Ma77Ball
Copy link
Copy Markdown
Contributor Author

Ma77Ball commented May 22, 2026

@Yicong-Huang, I addressed the comments. Please review when available.

Copy link
Copy Markdown
Contributor

@Yicong-Huang Yicong-Huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the effort to embed the script. however, letting python protobuf generation depends on sbt/scala/jdk is also not ideal. Can we have a better solution so that it only depends on python and a protoc lib? Sorry if this was confusing before.

Comment thread .github/workflows/build.yml Outdated
if [ -f amber/requirements.txt ]; then uv pip install --system --index-strategy unsafe-best-match -r amber/requirements.txt; fi
if [ -f amber/operator-requirements.txt ]; then uv pip install --system --index-strategy unsafe-best-match -r amber/operator-requirements.txt; fi
if [ -f amber/dev-requirements.txt ]; then uv pip install --system --index-strategy unsafe-best-match -r amber/dev-requirements.txt; fi
- name: Install protoc 3.19.4
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is it possible to refer to some dependency file to declare version? so that we don't need to update CI file when need to bump protoc version.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pinned in bin/protoc-version.txt.

Comment thread .github/workflows/build.yml Outdated
Comment on lines +640 to +648
- name: Set up JDK 17
uses: actions/setup-java@v5
with:
distribution: temurin
java-version: 17
- name: Setup sbt launcher
uses: sbt/setup-sbt@508b753e53cb6095967669e0911487d2b9bc9f41 # v1.1.22
- name: Generate Python proto bindings
run: sbt "WorkflowExecutionService/genPythonProto"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, can we not rely on sbt to generate python protobuf? it is unnecessary to install JDK just for this purpose.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The python and python-state-materialization-mac jobs now call python3 bin/gen-python-proto.py directly

Comment thread bin/computing-unit-master.dockerfile Outdated
Comment on lines +51 to +54
# protoc 3.19.4 (matches PB.protocVersion in amber/build.sbt) and the
# betterproto plugin are required by the genPythonProto sbt task so the
# generated amber/src/main/python/proto/ tree is populated before the
# dist is packaged.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are hard coding this 3.19.4 version in too many places. let's define it at one place and all refer to it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single source of truth is bin/protoc-version.txt.

Comment thread bin/computing-unit-worker.dockerfile Outdated
&& unzip -o /tmp/protoc.zip -d /usr/local \
&& chmod +x /usr/local/bin/protoc \
&& rm /tmp/protoc.zip \
&& pip3 install --no-cache-dir 'betterproto[compiler]==2.0.0b7'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

betterproto[compiler]==2.0.0b7 this is also hard coded. can we refer to requirements.txt instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pin lives only in amber/dev-requirements.txt.

Comment thread AGENTS.md Outdated
Comment on lines +90 to +94
`amber/src/main/python/proto/` is gitignored and regenerated by the
`genPythonProto` sbt task in [`amber/build.sbt`](amber/build.sbt), which
runs as part of `sbt amber/compile` (or `sbt amber/genPythonProto`
directly). Requires `protoc` on PATH (pin to `3.19.4` to match
`PB.protocVersion`); skipped with a warning when `protoc` is missing.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's remove this from Agent.md. too many details.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci changes related to CI dependencies Pull requests that update a dependency file dev docs Changes related to documentations engine feature python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Untrack generated python protobuf files

3 participants