fix(ci): stop re-labeling unrelated PRs for old Field defaults#3329
Conversation
Co-authored-by: openhands <openhands@all-hands.dev>
Python API breakage checks — ✅ PASSEDResult: ✅ PASSED |
REST API breakage checks (OpenAPI) — ✅ PASSEDResult: ✅ PASSED |
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
Verified the API breakage checker as a workflow user would: it still reports PyPI-baseline Field(default=...) changes but no longer treats pre-existing base-branch defaults as PR-introduced label triggers.
Does this PR achieve its stated goal?
Yes. Against origin/main, the old checker emitted only field_default_changes with the existing LLM.model default diff, which is the input that previously caused unrelated PRs to be labeled. The PR checker on the same base emitted field_default_changes_count=1 but field_default_changes_since_base_count=0; after simulating a real PR-introduced default change, it emitted field_default_changes_since_base_count=1, confirming true new defaults are still detected.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed and installed the uv environment. |
| CI Status | 🟡 Observed 24 successful, 5 skipped, and 5 in-progress checks; Python API was ✅ SUCCESS and the PR had no release-note-required label. |
| Functional Verification | ✅ API breakage script behavior matched the PR goal for both false-positive and true-positive scenarios. |
Functional Verification
Test 1: Pre-existing PyPI-baseline default change should not trigger since-base label data
Step 1 — Reproduce / establish baseline without the fix:
Ran the main-branch checker against origin/main:
git checkout --detach origin/main
ACP_VERSION_CHECK_BASE_REF=origin/main SDK_API_BREAKAGE_REPORT_PATH=/tmp/qa-main-report.json uv run python .github/scripts/check_sdk_api_breakage.pyObserved summary:
baseline_exit=0
report_keys= ['field_default_changes']
field_default_changes_count= 1
field_default_changes_since_base_count= missing
change= openhands.sdk.llm.llm.LLM.model 'claude-sonnet-4-20250514' -> 'gpt-5.5'
This confirms the old workflow input had only the PyPI-baseline default change available, so unrelated PRs could not distinguish “already on base” from “introduced by this PR.”
Step 2 — Apply the PR's changes:
Checked out openhands/fix-release-note-label-attribution at 11e272d6f9aed5d15e3cabe35b22ebd25ad66c42.
Step 3 — Re-run with the fix in place:
Ran the same checker command against the same base:
ACP_VERSION_CHECK_BASE_REF=origin/main SDK_API_BREAKAGE_REPORT_PATH=/tmp/qa-pr-report.json uv run python .github/scripts/check_sdk_api_breakage.pyObserved summary:
pr_exit=0
report_keys= ['field_default_changes', 'field_default_changes_since_base']
field_default_changes_count= 1
field_default_changes_since_base_count= 0
pypi_change= openhands.sdk.llm.llm.LLM.model 'claude-sonnet-4-20250514' -> 'gpt-5.5'
since_base_changes= []
This shows the checker still reports compatibility information against the latest released baseline, but exposes an empty since-base list for label/comment sync on an unrelated PR.
Test 2: A true PR-introduced default change is still detected
Step 1 — Establish baseline:
The unchanged PR branch in Test 1 reported field_default_changes_since_base_count=0 against origin/main.
Step 2 — Apply a real local default change:
Temporarily changed the SDK LLM.model default from gpt-5.5 to qa-pr-model, then ran:
ACP_VERSION_CHECK_BASE_REF=origin/main SDK_API_BREAKAGE_REPORT_PATH=/tmp/qa-pr-introduced-report.json uv run python .github/scripts/check_sdk_api_breakage.pyObserved summary:
report_exists= True
keys= ['field_default_changes', 'field_default_changes_since_base']
field_default_changes_count= 1
field_default_changes_since_base_count= 1
since_base= [{'package': 'openhands.sdk', 'object_path': 'openhands.sdk.llm.llm.LLM.model', 'old_default': "'gpt-5.5'", 'new_default': "'qa-pr-model'"}]
This confirms the new since-base comparison still catches a default change introduced by the changeset, so label/comment automation has the right signal for real behavioral changes. The temporary file change was restored afterward and git status --short was clean.
Live PR observation
labels= []
has_release_note_required= False
python_api_checks= [('COMPLETED', 'SUCCESS')]
status_summary= {('IN_PROGRESS', ''): 5, ('COMPLETED', 'SKIPPED'): 5, ('COMPLETED', 'SUCCESS'): 24}
This matches the intended user-facing outcome: the API breakage check succeeded and the unrelated PR was not labeled release-note-required for the pre-existing LLM.model default change.
Issues Found
None.
This review was generated by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The API breakage checker now preserves PyPI-baseline reporting while preventing old base-branch Field default changes from driving release-note label/comment attribution.
Does this PR achieve its stated goal?
Yes. I ran the actual .github/scripts/check_sdk_api_breakage.py workflow script against the old main behavior and the PR behavior with ACP_VERSION_CHECK_BASE_REF=main: main produced only field_default_changes, which is the false-positive signal the old workflow used, while the PR produced the same PyPI-baseline diff plus field_default_changes_since_base: []. I also verified a real PR-introduced default change still populates field_default_changes_since_base, and the live PR currently has no release-note-required label while the workflow comment explains the default change was already on the base branch.
| Phase | Result |
|---|---|
| Environment Setup | ✅ make build completed; dependencies installed without running tests/linters |
| CI Status | ✅ 31 passing, 2 pending OpenHands automation checks, 4 skipped, 0 failing observed |
| Functional Verification | ✅ Before/after CLI execution confirms false-positive attribution is removed and true PR-introduced changes still surface |
Functional Verification
Test 1: Existing base-branch default change no longer drives label attribution
Step 1 — Reproduce / establish baseline without the fix:
Ran the actual pre-fix script from origin/main in an isolated worktree:
git worktree add --force /tmp/qa-pr3329-base origin/main
cd /tmp/qa-pr3329-base
ACP_VERSION_CHECK_BASE_REF=main SDK_API_BREAKAGE_REPORT_PATH=/tmp/qa-pr3329-base-report.json /home/runner/work/software-agent-sdk/software-agent-sdk/pr-repo/.venv/bin/python .github/scripts/check_sdk_api_breakage.pyReport excerpt:
{
"field_default_changes": [
{
"package": "openhands.sdk",
"object_path": "openhands.sdk.llm.llm.LLM.model",
"old_default": "'claude-sonnet-4-20250514'",
"new_default": "'gpt-5.5'"
}
]
}Exit code was 0. This confirms the old behavior exposed only the PyPI-baseline default diff, which the old workflow used as the label/comment signal even when that diff was already on main.
Step 2 — Apply the PR's changes:
Checked out the PR branch at 11e272d6f9aed5d15e3cabe35b22ebd25ad66c42 in the repository workspace.
Step 3 — Re-run with the fix in place:
Ran:
cd /home/runner/work/software-agent-sdk/software-agent-sdk/pr-repo
ACP_VERSION_CHECK_BASE_REF=main SDK_API_BREAKAGE_REPORT_PATH=/tmp/qa-pr3329-pr-report.json uv run python .github/scripts/check_sdk_api_breakage.pyReport excerpt:
{
"field_default_changes": [
{
"package": "openhands.sdk",
"object_path": "openhands.sdk.llm.llm.LLM.model",
"old_default": "'claude-sonnet-4-20250514'",
"new_default": "'gpt-5.5'"
}
],
"field_default_changes_since_base": []
}Exit code was 0. This shows the PR still reports the PyPI-baseline compatibility information, but the new label-driving view is empty for an unrelated branch on top of main, which is the stated bug fix.
Test 2: Legitimate PR-introduced default changes still surface
Step 1 — Establish baseline:
The PR run above shows field_default_changes_since_base: [] when no SDK default changed relative to the PR base.
Step 2 — Apply a realistic local PR change:
In a temporary worktree at the PR commit, I changed the public LLM.model default from gpt-5.5 to qa-model-verify, then used the PR commit SHA as ACP_VERSION_CHECK_BASE_REF.
Step 3 — Run the same script with the introduced change:
Ran:
ACP_VERSION_CHECK_BASE_REF=11e272d6f9aed5d15e3cabe35b22ebd25ad66c42 SDK_API_BREAKAGE_REPORT_PATH=/tmp/qa-pr3329-introduced-report.json /home/runner/work/software-agent-sdk/software-agent-sdk/pr-repo/.venv/bin/python .github/scripts/check_sdk_api_breakage.pyReport excerpt:
{
"field_default_changes": [
{
"object_path": "openhands.sdk.llm.llm.LLM.model",
"old_default": "'claude-sonnet-4-20250514'",
"new_default": "'qa-model-verify'"
}
],
"field_default_changes_since_base": [
{
"object_path": "openhands.sdk.llm.llm.LLM.model",
"old_default": "'gpt-5.5'",
"new_default": "'qa-model-verify'"
}
]
}This confirms the fix does not hide real PR-introduced default changes; it separates the release-baseline report from the base-ref attribution signal.
Live PR state
Ran:
gh pr view 3329 --repo OpenHands/software-agent-sdk --json labels,commentsObserved: labels were [], and the github-actions API-breakage comment says the LLM.model default change differs from the latest released baseline but was already present on the base branch, so this PR was not auto-marked with release-note-required.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
Verdict: PASS
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable, but I found a couple of important edge cases in the new base-ref comparison that should be addressed before relying on this for label sync.
Risk Assessment: 🟡 MEDIUM — CI/reporting-only change, but incorrect failure handling could suppress required release-note labels.
Review generated by an AI agent (OpenHands) on behalf of the user.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26170991825
|
[Automatic Post]: It has been a while since there was any activity on this PR. @enyst, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. This comment was created by an AI agent (OpenHands) on behalf of the user. |
|
@OpenHands this pr has a conflict, merge main into it and pay attention to preserve both if you can. Then /iterate since we have already an AI reviewer agent comments. Post directly on github here. |
|
I'm on it! enyst can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
|
Merged main into this branch and pushed a0e1eb6. Preserved both sides of the conflict: the PR's base-ref Field default comparison logic and main's
Validation run locally:
Continuing with /iterate on the updated SHA. This comment was created by an AI agent (OpenHands) on behalf of the user. |
|
✅ Review complete. This review was performed through OpenHands Cloud Automation. You can log in and view the conversation here. |
all-hands-bot
left a comment
There was a problem hiding this comment.
✅ QA Report: PASS
The API-breakage workflow behavior was exercised with real script execution, and the PR prevents stale Field(default=...) changes from driving labels while preserving detection of PR-introduced default changes.
Does this PR achieve its stated goal?
Yes. The goal was to keep the PyPI/released-baseline compatibility view while using a PR-base comparison for release-note-required label/comment decisions. Manual execution confirmed the old behavior would label an unrelated PR when the only default change was already on main, while the PR behavior writes field_default_changes_since_base: [] for that case and still reports a non-empty base-ref list when the PR itself changes a default.
| Phase | Result |
|---|---|
| Environment Setup | ✅ uv run created/used the project environment and imported griffe successfully. |
| CI Status | 🟡 Observed 31 passing, 4 skipped, 2 pending (pr-review, this qa-changes run); no PR labels present. |
| Functional Verification | ✅ Real script execution plus before/after simulation verified false-positive prevention and true-positive retention. |
Functional Verification
Test 1: Run the API-breakage script on this PR branch
Step 1 — Establish current PR behavior:
Ran ACP_VERSION_CHECK_BASE_REF=main SDK_API_BREAKAGE_REPORT_PATH=/tmp/qa-pr-report.json uv run python .github/scripts/check_sdk_api_breakage.py on commit a0e1eb60fed37fbbb82c5d27f6f0f375d5cbc922.
RC=0
Checking openhands-sdk (openhands.sdk)
Comparing openhands-sdk 1.23.1 against 1.23.1
No breaking changes detected
...
---REPORT---
{
"field_default_changes": [],
"field_default_changes_since_base": []
}
This shows the script runs successfully as a real CI user would invoke it, emits the new base-ref report field, and provides no label-triggering default changes for this PR.
Test 2: Reproduce the stale-default false positive and verify the fix
Step 1 — Reproduce baseline without the fix:
Created a temporary git repo with a released baseline default (claude-sonnet-4-20250514), a main commit that already changed it to gpt-5.5, and an unrelated feature branch. Then ran the pre-PR origin/main API-breakage script against the released baseline vs the unrelated branch.
--- before PR / unrelated feature report ---
{
"field_default_changes": [
{
"package": "openhands.sdk",
"object_path": "openhands.sdk.Config.model",
"old_default": "'claude-sonnet-4-20250514'",
"new_default": "'gpt-5.5'"
}
]
}
old workflow label decision: True
This confirms the old behavior exposed only the released-baseline difference, so an unrelated PR would be treated as needing release-note-required.
Step 2 — Apply the PR's changes:
Using the same temporary repo and branch layout, ran the PR version of .github/scripts/check_sdk_api_breakage.py with main as the base-ref comparison.
Step 3 — Re-run with the fix in place:
--- after PR / unrelated feature report ---
{
"field_default_changes": [
{
"package": "openhands.sdk",
"object_path": "openhands.sdk.Config.model",
"old_default": "'claude-sonnet-4-20250514'",
"new_default": "'gpt-5.5'"
}
],
"field_default_changes_since_base": []
}
new workflow label decision: False
This verifies the PR keeps the compatibility/reporting signal but removes the stale change from the label/comment decision input.
Test 3: Verify true PR-introduced default changes still trigger
From the same simulated main, created a feature branch that changed the default from gpt-5.5 to gpt-6.0, then ran the PR script's base-ref comparison.
--- after PR / PR-introduced default-change report ---
{
"field_default_changes": [
{
"package": "openhands.sdk",
"object_path": "openhands.sdk.Config.model",
"old_default": "'gpt-5.5'",
"new_default": "'gpt-6.0'"
}
],
"field_default_changes_since_base": [
{
"package": "openhands.sdk",
"object_path": "openhands.sdk.Config.model",
"old_default": "'gpt-5.5'",
"new_default": "'gpt-6.0'"
}
]
}
new workflow label decision: True
This confirms the fix does not over-correct: genuine PR-introduced default changes remain visible to the workflow decision.
Issues Found
None.
This review was created by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
There was a problem hiding this comment.
The fix is correct and the two concerns raised in the previous review are both resolved.
What changed and why it works:
_collect_field_default_changes_since_refcompares the current HEAD against the PR base ref usingfield_defaults_only=Trueandemit_diagnostics=False, so the second pass is silent and only collectsFieldDefaultChangedata — no spurious::erroror::warningannotations from structural changes.- On failure (archive failure, griffe load error, or compute error), the function returns
Nonerather than an empty list.Nonepropagates as a sentinel intomain(), which omitsfield_default_changes_since_basefrom the JSON report entirely. The workflow then falls back toreport.field_default_changes(the PyPI-baseline comparison) for label sync, so therelease-note-requiredlabel is never incorrectly removed when the base-ref comparison cannot run. - When the base-ref comparison succeeds with an empty list
[], it correctly signals "no PR-introduced changes", preventing the label from being added (or removing it if set by a prior run).
Test coverage is thorough: the true-positive case (PR introduces the default change), the false-positive case (default already changed on main before an unrelated branch was cut), the failure path (None on a missing ref), and the quiet-mode guarantee (no diagnostic output for structural-only changes in the base-ref pass).
Minor note (no action needed): _git_ref_candidates uses dict.fromkeys to deduplicate two strings that are virtually always distinct. A plain (f'origin/{ref}', ref) tuple would be equally correct and slightly more direct — but this is a trivial style point, not worth a follow-up.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation. View conversation
all-hands-bot
left a comment
There was a problem hiding this comment.
🟢 Taste Rating: Good taste. LGTM — this keeps the PyPI-baseline compatibility report while using the base-ref comparison for label/comment attribution, with tests covering both PR-introduced changes and the unrelated-follow-up false positive.
Risk Assessment: 🟡 MEDIUM — CI/reporting-only change, but it directly affects release-note label automation; the scoped fallback behavior and cross-tests make it acceptable.
Review generated by an AI agent (OpenHands) on behalf of the user.
Was this automated review useful? React with 👍 or 👎 to this review to help us measure review quality.
Workflow run: https://github.com/OpenHands/software-agent-sdk/actions/runs/26516478436
|
Since my last summary, there were no additional code changes or GitHub actions. Final status remains:
No extraneous changes remain to revert. |
Summary
Field(default=...)changes against the PR base refrelease-note-requiredlabel sync and PR comment text, so follow-up PRs are not re-labeled for already-merged unreleased defaultsmainbefore an unrelated branch was cutImportant
Testing
uv run pytest tests/cross/test_check_sdk_api_breakage.pyACP_VERSION_CHECK_BASE_REF=main SDK_API_BREAKAGE_REPORT_PATH=/tmp/sdk-api-breakage-report.json uv run python .github/scripts/check_sdk_api_breakage.pyLLM.modeldefault change but reportsfield_default_changes_since_base: []on top ofmainuv run pre-commit run --files AGENTS.md .github/scripts/check_sdk_api_breakage.py .github/workflows/api-breakage.yml tests/cross/test_check_sdk_api_breakage.pyThis PR was created by an AI agent (OpenHands) on behalf of the user.
@enyst can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.13-nodejs22-slimgolang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:a0e1eb6-pythonRun
All tags pushed for this build
About Multi-Architecture Support
a0e1eb6-python) is a multi-arch manifest supporting both amd64 and arm64a0e1eb6-python-amd64) are also available if needed