Fail viable/strict update when no green commit found #16650
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16650

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job, 1 Unrelated Failure

As of commit 427c297 with merge base e725fe4:
- NEW FAILURE - The following job has failed:
- CANCELLED JOB - The following job was cancelled. Please retry:
- BROKEN TRUNK - The following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Pull request overview
This PR updates the viable/strict workflow to fail when no green commit is found instead of silently succeeding. This improves visibility by showing a RED status in the dashboard and providing detailed failure information.
Changes:
- Add step ID to capture outputs from the update-viablestrict action
- Add conditional step to generate CI failure summary when no green commit found
- Exit with error code to mark workflow as failed
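
For illustration, here is a minimal sketch of what the failure path of that conditional step might look like, assuming the update step's `latest_viable_sha` output is passed into the shell as an environment variable; the variable name `LATEST_VIABLE_SHA` and the exact summary wording are illustrative, not taken from the PR:

```bash
# Hypothetical body of the "Summarize CI failures" step; assumes LATEST_VIABLE_SHA
# is wired to steps.update.outputs.latest_viable_sha via the step's env: block.
if [ "${LATEST_VIABLE_SHA}" = "None" ]; then
  {
    echo "## viable/strict update failed"
    echo "No green commit found between viable/strict and main."
  } >> "$GITHUB_STEP_SUMMARY"
  exit 1  # a non-zero exit marks the workflow run as failed (RED on the dashboard)
fi
```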
```bash
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name)"' 2>/dev/null | \
while IFS='|' read -r run_id sha title workflow; do
```
Copilot AI (Jan 16, 2026):
The pipe character used as delimiter in the jq output could conflict with pipe characters that might exist in workflow names (even after gsub on display_title). Consider using a more unique delimiter like tab (\t) or null character to avoid potential parsing issues.
Suggested change:
```bash
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)\t\(.head_sha | .[0:7])\t\(.display_title | gsub("\\|"; "-") | .[0:50])\t\(.name)"' 2>/dev/null | \
while IFS=$'\t' read -r run_id sha title workflow; do
```
```bash
failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs" \
  --jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1)
```
Copilot AI (Jan 16, 2026):
Making an API call for each failed workflow run inside a loop could result in many sequential API requests, potentially hitting rate limits or causing slow execution. Consider batching the data collection or limiting the number of runs processed in the first query (e.g., reduce per_page from 50 to 10-20 for the outer query).
```bash
gh api repos/pytorch/executorch/actions/runs?per_page=50 \
  --jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name)"' 2>/dev/null | \
```
Copilot AI (Jan 16, 2026):
The query fetches the 50 most recent workflow runs regardless of whether they're relevant to the viable/strict update decision. Consider filtering by branch (e.g., main/master) or adding a time constraint to ensure the displayed failures are actually relevant to the viable/strict update process.
```bash
failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs" \
  --jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1)
if [ -n "$failed_job" ]; then
  echo "| ${sha} | ${title} | ${workflow} / ${failed_job} |" >> $GITHUB_STEP_SUMMARY
```
Copilot AI (Jan 16, 2026):
If a workflow run has multiple failed jobs, only the first one is shown due to head -1 on line 51. This could be misleading when debugging. Consider either documenting this limitation in a comment or showing a count of total failed jobs for each run.
Suggested change:
```bash
failed_jobs=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs" \
  --jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null)
if [ -n "$failed_jobs" ]; then
  first_failed_job=$(printf '%s\n' "$failed_jobs" | head -1)
  failed_count=$(printf '%s\n' "$failed_jobs" | wc -l | tr -d ' ')
  if [ "$failed_count" -gt 1 ]; then
    suffix=" (+$((failed_count - 1)) more)"
  else
    suffix=""
  fi
  echo "| ${sha} | ${title} | ${workflow} / ${first_failed_job}${suffix} |" >> $GITHUB_STEP_SUMMARY
```
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.
```bash
# Fallback: just show recent failed runs globally
gh api repos/pytorch/executorch/actions/runs?per_page=20 \
  --jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name)"' 2>/dev/null | \
  while IFS='|' read -r run_id sha title workflow; do
    failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs?per_page=100" \
      --jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1)
    if [ -n "$failed_job" ]; then
      echo "| ${sha} | ${title} | ${workflow} / ${failed_job} |" >> $GITHUB_STEP_SUMMARY
    fi
  done
```
Copilot AI (Jan 16, 2026):
The fallback section queries recent failed runs without filtering by branch or commit range. This means it will show failures from all branches, which may not be relevant to understanding why viable/strict couldn't be updated. Consider adding a branch filter (e.g., filtering for the main branch) or adding a note in the output that these are global failures and may not be related to the viable/strict update issue.
| echo "| Commit | Title | Failed Job |" >> $GITHUB_STEP_SUMMARY | ||
| echo "|--------|-------|------------|" >> $GITHUB_STEP_SUMMARY |
Copilot AI (Jan 16, 2026):
The table header is always printed (lines 42-43) but if no failed runs are found for any commits, the table will be empty. This creates a table with only headers and no rows, which looks incomplete. Consider adding a fallback message or check to indicate when no failures were found to display.
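
One way to address this, sketched below: buffer the table rows first and only emit the header when at least one row exists. The `rows` variable, the placeholder lookup, and the empty-state wording are illustrative, not the PR's code; `COMMITS` is assumed to hold the unit-separated commit list built earlier in the script. Reading from a here-string rather than piping into the loop also keeps the accumulated variable visible after the loop ends.

```bash
rows=""
while IFS=$'\x1f' read -r sha title; do
  [ -z "$sha" ] && continue
  # ... look up the failed job for $sha here and set $failed_job ...
  rows+="| ${sha:0:7} | ${title} | ${failed_job} |"$'\n'
done <<< "$COMMITS"  # here-string keeps the loop in the current shell, so $rows survives

if [ -n "$rows" ]; then
  {
    echo "| Commit | Title | Failed Job |"
    echo "|--------|-------|------------|"
    printf '%s' "$rows"
  } >> "$GITHUB_STEP_SUMMARY"
else
  echo "No failed runs found for the commits between viable/strict and main." >> "$GITHUB_STEP_SUMMARY"
fi
```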
```bash
failed_run=$(gh api "repos/pytorch/executorch/actions/runs?head_sha=${sha}&per_page=100" \
  --jq '.workflow_runs[] | select(.conclusion == "failure" or .conclusion == "cancelled") | "\(.id)|\(.name)"' 2>/dev/null | head -1)

if [ -n "$failed_run" ]; then
  run_id=$(echo "$failed_run" | cut -d'|' -f1)
  workflow_name=$(echo "$failed_run" | cut -d'|' -f2)
```
Copilot AI (Jan 16, 2026):
The pipe delimiter used to separate fields can break parsing if workflow names or job names contain pipe characters. While this is unlikely in practice, it could cause field misalignment in the output table. For more robust parsing, consider using a less common delimiter (such as a tab character or ASCII unit separator) or sanitizing the workflow and job names in the jq filter using gsub, similar to how commit titles are sanitized on line 52.
```bash
else
  # Fallback: just show recent failed runs globally
  gh api repos/pytorch/executorch/actions/runs?per_page=20 \
    --jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name)"' 2>/dev/null | \
```
Copilot AI (Jan 16, 2026):
Similar to the issue in the main branch above, the pipe delimiter can break parsing if any of the fields (workflow name, display_title, or job name) contain pipe characters. Consider using a more robust delimiter or sanitizing all fields in the jq filter, not just display_title.
Suggested change:
```bash
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name | gsub("\\|"; "-"))"' 2>/dev/null | \
```
```bash
gh api repos/pytorch/executorch/actions/runs?per_page=100 \
  --jq '.workflow_runs[] | select(.conclusion == "failure") | .name' 2>/dev/null \
  | sort | uniq -c | sort -rn | head -10 >> $GITHUB_STEP_SUMMARY || echo "Failed to fetch data" >> $GITHUB_STEP_SUMMARY
```
Copilot AI (Jan 16, 2026):
Similar to the fallback section above, this query fetches all failed workflow runs without filtering by branch or time range. This could include failures from feature branches or old failures that are not relevant to the viable/strict update. Consider filtering by branch (e.g., event=push, branch=main) or adding a created filter to only show recent failures that would actually block the viable/strict update.
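
A hedged sketch of that kind of scoping: the workflow-runs endpoint accepts `branch` and `created` query parameters, and `gh api --method GET -f ...` adds them to the query string with proper encoding. The 24-hour window mirrors what a later revision of this PR adopts, but the exact parameters here are an assumption, not the PR's final code:

```bash
# Aggregate failure counts, limited to main-branch runs from the last 24 hours.
yesterday=$(date -u -d '24 hours ago' '+%Y-%m-%dT%H:%M:%SZ')
gh api --method GET "repos/pytorch/executorch/actions/runs" \
  -f branch=main \
  -f created=">=${yesterday}" \
  -f per_page=100 \
  --jq '.workflow_runs[] | select(.conclusion == "failure") | .name' 2>/dev/null \
  | sort | uniq -c | sort -rn | head -10 >> "$GITHUB_STEP_SUMMARY"
```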
Force-pushed from aa73858 to 056c049 (Compare)
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.
```bash
# Filter by branch=main and use unit separator
gh api "repos/pytorch/executorch/actions/runs?branch=main&per_page=20" \
  --jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)\u001f\(.head_sha | .[0:7])\u001f\(.display_title | .[0:50])\u001f\(.name)"' 2>/dev/null | \
```
Copilot AI (Jan 16, 2026):
The jq string slicing syntax .[0:7] and .[0:50] is incorrect for strings in jq. Use [:7] and [:50] instead. The current syntax will cause jq to fail when processing these fields.
```bash
# Get commits between viable/strict and main (up to 20)
# Use ASCII unit separator (0x1f) to avoid issues with special chars in names
COMMITS=$(gh api "repos/pytorch/executorch/compare/${VIABLE_SHA}...${MAIN_SHA}" \
  --jq '.commits | reverse | .[0:20] | .[] | "\(.sha)\u001f\(.commit.message | split("\n")[0] | .[0:50])"' 2>/dev/null || echo "")
```
Copilot AI (Jan 16, 2026):
The jq string slicing syntax .[0:50] is incorrect for strings in jq. Use [:50] instead. The current syntax will cause jq to fail when processing commit messages.
Suggested change:
```bash
--jq '.commits | reverse | .[0:20] | .[] | "\(.sha)\u001f\(.commit.message | split("\n")[0] | .[:50])"' 2>/dev/null || echo "")
```
```bash
failed_run=$(gh api "repos/pytorch/executorch/actions/runs?head_sha=${sha}&per_page=100" \
  --jq '.workflow_runs[] | select(.conclusion == "failure" or .conclusion == "cancelled") | "\(.id)\u001f\(.name)"' 2>/dev/null | head -1)
```
Copilot AI (Jan 16, 2026):
The query selects both 'failure' and 'cancelled' runs, but the section header and error message only mention 'failures'. This could be confusing when cancelled runs appear in the output. Consider either filtering only failures or updating the messaging to clarify that cancelled runs are included.
| echo "### Summary by job (main branch, last 24h)" >> $GITHUB_STEP_SUMMARY | ||
| echo '```' >> $GITHUB_STEP_SUMMARY | ||
| # Filter by main branch and created in last 24 hours | ||
| yesterday=$(date -u -d '24 hours ago' '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null || date -u -v-24H '+%Y-%m-%dT%H:%M:%SZ') |
Copilot AI (Jan 16, 2026):
The fallback date command uses macOS-specific syntax (-v-24H) which won't work on the ubuntu-22.04 runner specified in line 15. Since the job runs on Ubuntu, the GNU date syntax will always succeed and the macOS fallback will never execute. Consider removing the unnecessary fallback.
Suggested change:
```bash
yesterday=$(date -u -d '24 hours ago' '+%Y-%m-%dT%H:%M:%SZ')
```
Add CI failure summary and exit with error when no green commit is found, making the workflow show as RED in the dashboard instead of always succeeding silently. The summary includes:
- Per-commit table showing commit SHA, title, and failed job
- Aggregate count of failures by workflow name

This makes it easier to identify which CI jobs are blocking viable/strict updates.
Handle cancelled workflow runs and jobs in addition to failures. Also add per_page=100 to jobs API call for workflows with many jobs.
Instead of showing random recent failures globally, now query commits between viable/strict and main branches and show their specific failures. This gives a clearer picture of what's blocking viable/strict from advancing.
- Use tab delimiter instead of pipe to avoid parsing issues with names containing pipe characters
- Sanitize all fields (workflow names, job names, titles) with gsub
- Filter fallback query by branch=main instead of showing all branches
- Filter summary by job to main branch and last 24 hours
- Add note in fallback explaining these are main branch failures
- Only print table headers when there are failures to show
- Add empty state message when no failures found
Replace tab delimiter with ASCII unit separator (0x1f) which is guaranteed not to appear in workflow names, job names, or commit titles. Simplify sanitization to just use tr for pipe characters.
- Use correct jq string slicing syntax (.[:N] instead of .[0:N])
- Remove cancelled filter since we only mention failures in messaging
- Remove unnecessary macOS date fallback (runs on Ubuntu)
Force-pushed from 8786994 to 427c297 (Compare)
Note to self: land after the branch cut
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.
```yaml
        clickhouse-password: ${{ secrets.CLICKHOUSE_VIABLESTRICT_PASSWORD }}

    - name: Summarize CI failures
      if: steps.update.outputs.latest_viable_sha == 'None'
```
Copilot AI (Jan 16, 2026):
The string comparison == 'None' suggests the output is the literal string 'None' rather than an empty string or null. This is an unusual pattern for GitHub Actions outputs. Consider verifying this matches the actual output format from the upstream action, or use a more conventional empty check like == '' if appropriate.
Suggested change:
```yaml
      if: steps.update.outputs.latest_viable_sha == ''
```
| echo "| Commit | Title | Failed Job |" >> $GITHUB_STEP_SUMMARY | ||
| echo "|--------|-------|------------|" >> $GITHUB_STEP_SUMMARY | ||
| while IFS=$'\x1f' read -r sha title; do |
Copilot AI (Jan 16, 2026):
Using $'\x1f' (hexadecimal escape) is inconsistent with the \u001f (Unicode escape) used in jq queries. While both represent the ASCII unit separator (0x1F), mixing notation styles reduces readability. Consider standardizing on one approach throughout the script.
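
For reference, the two notations do name the same byte; a quick, purely illustrative check:

```bash
printf '\x1f' | od -An -tx1      # -> 1f
jq -rn '"\u001f"' | od -An -tx1  # -> 1f 0a (the trailing 0a is jq's newline)
```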
| [ -z "$sha" ] && continue | ||
| short_sha="${sha:0:7}" | ||
| # Sanitize title for markdown table | ||
| title=$(echo "$title" | tr '|' '-') |
Copilot AI (Jan 16, 2026):
Sanitization using tr '|' '-' only replaces pipe characters. Other markdown special characters (backticks, brackets, asterisks, underscores) in commit messages could break the table formatting or create unintended markdown rendering. Consider more comprehensive escaping or using a safer delimiter pattern.
| if [ -n "$failed_run" ]; then | ||
| IFS=$'\x1f' read -r run_id workflow_name <<< "$failed_run" | ||
| # Sanitize workflow name for markdown table | ||
| workflow_name=$(echo "$workflow_name" | tr '|' '-') |
Copilot AI (Jan 16, 2026):
Sanitization using tr '|' '-' only replaces pipe characters. Other markdown special characters (backticks, brackets, asterisks, underscores) in workflow names could break the table formatting or create unintended markdown rendering. Consider more comprehensive escaping or using a safer delimiter pattern.
```bash
while IFS=$'\x1f' read -r run_id sha title workflow; do
  # Sanitize for markdown table
  title=$(echo "$title" | tr '|' '-')
  workflow=$(echo "$workflow" | tr '|' '-')
  failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs?per_page=100" \
    --jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1 | tr '|' '-')
```
Copilot AI (Jan 16, 2026):
Sanitization using tr '|' '-' only replaces pipe characters. Other markdown special characters (backticks, brackets, asterisks, underscores) could break the table formatting or create unintended markdown rendering. Consider more comprehensive escaping or using a safer delimiter pattern.
Suggested change:
```bash
sanitize_md() {
  local input="$1"
  # Replace table separators and escape other markdown-special characters
  input=${input//|/-}
  input=${input//\`/\'}
  input=${input//\[/\\[}
  input=${input//]/\\]}
  input=${input//\*/\\*}
  input=${input//_/\\_}
  printf '%s' "$input"
}
while IFS=$'\x1f' read -r run_id sha title workflow; do
  # Sanitize for markdown table
  title=$(sanitize_md "$title")
  workflow=$(sanitize_md "$workflow")
  failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs?per_page=100" \
    --jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1)
  failed_job=$(sanitize_md "$failed_job")
```
```bash
failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs?per_page=100" \
  --jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1 | tr '|' '-')
```
Copilot AI (Jan 16, 2026):
This API call is made inside a loop for each failed run, potentially causing many sequential API requests. Consider batching or caching job information, or at minimum adding rate limit awareness to prevent hitting GitHub API limits when many failures exist.
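
A minimal sketch of what rate-limit awareness could look like here, assuming a simple guard before the per-run lookups is acceptable; the `/rate_limit` endpoint and the `.resources.core.remaining` field exist in the GitHub API, but the threshold and the skip message are arbitrary choices, not part of this PR:

```bash
remaining=$(gh api rate_limit --jq '.resources.core.remaining' 2>/dev/null || echo 0)
if [ "${remaining:-0}" -lt 50 ]; then
  # Not enough budget for one jobs request per failed run; degrade gracefully.
  echo "Skipping per-run job details: API rate limit nearly exhausted." >> "$GITHUB_STEP_SUMMARY"
else
  failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs?per_page=100" \
    --jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1)
fi
```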
Add CI failure summary and exit with error when no green commit is found, making the workflow show as RED in the dashboard instead of always succeeding silently.

The summary includes:
- Per-commit table showing commit SHA, title, and failed job
- Aggregate count of failures by workflow name

This makes it easier to identify which CI jobs are blocking viable/strict updates.
Tested locally using this script:
https://gist.github.com/mergennachin/a3f310d5b718a5279d6af41d836a4dd9
output:
https://gist.github.com/mergennachin/006dfc4e969d72c378825cb750738151