Skip to content

Conversation

@mergennachin
Copy link
Contributor

@mergennachin mergennachin commented Jan 16, 2026

Add CI failure summary and exit with error when no green commit is
found, making the workflow show as RED in the dashboard instead of
always succeeding silently.

The summary includes:

  • Per-commit table showing commit SHA, title, and failed job
  • Aggregate count of failures by workflow name

This makes it easier to identify which CI jobs are blocking
viable/strict updates.

Tested locally using this script:

https://gist.github.com/mergennachin/a3f310d5b718a5279d6af41d836a4dd9

output:

https://gist.github.com/mergennachin/006dfc4e969d72c378825cb750738151

Copilot AI review requested due to automatic review settings January 16, 2026 15:31
@pytorch-bot
Copy link

pytorch-bot bot commented Jan 16, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/16650

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Cancelled Job, 1 Unrelated Failure

As of commit 427c297 with merge base e725fe4 (image):

NEW FAILURE - The following job has failed:

CANCELLED JOB - The following job was cancelled. Please retry:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 16, 2026
@github-actions
Copy link

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the viable/strict workflow to fail when no green commit is found instead of silently succeeding. This improves visibility by showing a RED status in the dashboard and providing detailed failure information.

Changes:

  • Add step ID to capture outputs from the update-viablestrict action
  • Add conditional step to generate CI failure summary when no green commit found
  • Exit with error code to mark workflow as failed

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 47 to 48
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name)"' 2>/dev/null | \
while IFS='|' read -r run_id sha title workflow; do
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pipe character used as delimiter in the jq output could conflict with pipe characters that might exist in workflow names (even after gsub on display_title). Consider using a more unique delimiter like tab (\t) or null character to avoid potential parsing issues.

Suggested change
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name)"' 2>/dev/null | \
while IFS='|' read -r run_id sha title workflow; do
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)\t\(.head_sha | .[0:7])\t\(.display_title | gsub("\\|"; "-") | .[0:50])\t\(.name)"' 2>/dev/null | \
while IFS=$'\t' read -r run_id sha title workflow; do

Copilot uses AI. Check for mistakes.
Comment on lines 50 to 51
failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs" \
--jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1)
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making an API call for each failed workflow run inside a loop could result in many sequential API requests, potentially hitting rate limits or causing slow execution. Consider batching the data collection or limiting the number of runs processed in the first query (e.g., reduce per_page from 50 to 10-20 for the outer query).

Copilot uses AI. Check for mistakes.
Comment on lines 46 to 47
gh api repos/pytorch/executorch/actions/runs?per_page=50 \
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name)"' 2>/dev/null | \
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query fetches the 50 most recent workflow runs regardless of whether they're relevant to the viable/strict update decision. Consider filtering by branch (e.g., main/master) or adding a time constraint to ensure the displayed failures are actually relevant to the viable/strict update process.

Copilot uses AI. Check for mistakes.
Comment on lines 50 to 53
failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs" \
--jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1)
if [ -n "$failed_job" ]; then
echo "| ${sha} | ${title} | ${workflow} / ${failed_job} |" >> $GITHUB_STEP_SUMMARY
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a workflow run has multiple failed jobs, only the first one is shown due to head -1 on line 51. This could be misleading when debugging. Consider either documenting this limitation in a comment or showing a count of total failed jobs for each run.

Suggested change
failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs" \
--jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1)
if [ -n "$failed_job" ]; then
echo "| ${sha} | ${title} | ${workflow} / ${failed_job} |" >> $GITHUB_STEP_SUMMARY
failed_jobs=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs" \
--jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null)
if [ -n "$failed_jobs" ]; then
first_failed_job=$(printf '%s\n' "$failed_jobs" | head -1)
failed_count=$(printf '%s\n' "$failed_jobs" | wc -l | tr -d ' ')
if [ "$failed_count" -gt 1 ]; then
suffix=" (+$((failed_count - 1)) more)"
else
suffix=""
fi
echo "| ${sha} | ${title} | ${workflow} / ${first_failed_job}${suffix} |" >> $GITHUB_STEP_SUMMARY

Copilot uses AI. Check for mistakes.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 74 to 83
# Fallback: just show recent failed runs globally
gh api repos/pytorch/executorch/actions/runs?per_page=20 \
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name)"' 2>/dev/null | \
while IFS='|' read -r run_id sha title workflow; do
failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs?per_page=100" \
--jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1)
if [ -n "$failed_job" ]; then
echo "| ${sha} | ${title} | ${workflow} / ${failed_job} |" >> $GITHUB_STEP_SUMMARY
fi
done
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback section queries recent failed runs without filtering by branch or commit range. This means it will show failures from all branches, which may not be relevant to understanding why viable/strict couldn't be updated. Consider adding a branch filter (e.g., filtering for the main branch) or adding a note in the output that these are global failures and may not be related to the viable/strict update issue.

Copilot uses AI. Check for mistakes.
Comment on lines 42 to 43
echo "| Commit | Title | Failed Job |" >> $GITHUB_STEP_SUMMARY
echo "|--------|-------|------------|" >> $GITHUB_STEP_SUMMARY
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table header is always printed (lines 42-43) but if no failed runs are found for any commits, the table will be empty. This creates a table with only headers and no rows, which looks incomplete. Consider adding a fallback message or check to indicate when no failures were found to display.

Copilot uses AI. Check for mistakes.
Comment on lines 59 to 64
failed_run=$(gh api "repos/pytorch/executorch/actions/runs?head_sha=${sha}&per_page=100" \
--jq '.workflow_runs[] | select(.conclusion == "failure" or .conclusion == "cancelled") | "\(.id)|\(.name)"' 2>/dev/null | head -1)

if [ -n "$failed_run" ]; then
run_id=$(echo "$failed_run" | cut -d'|' -f1)
workflow_name=$(echo "$failed_run" | cut -d'|' -f2)
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pipe delimiter used to separate fields can break parsing if workflow names or job names contain pipe characters. While this is unlikely in practice, it could cause field misalignment in the output table. For more robust parsing, consider using a less common delimiter (such as a tab character or ASCII unit separator) or sanitizing the workflow and job names in the jq filter using gsub, similar to how commit titles are sanitized on line 52.

Copilot uses AI. Check for mistakes.
else
# Fallback: just show recent failed runs globally
gh api repos/pytorch/executorch/actions/runs?per_page=20 \
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name)"' 2>/dev/null | \
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the issue in the main branch above, the pipe delimiter can break parsing if any of the fields (workflow name, display_title, or job name) contain pipe characters. Consider using a more robust delimiter or sanitizing all fields in the jq filter, not just display_title.

Suggested change
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name)"' 2>/dev/null | \
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)|\(.head_sha | .[0:7])|\(.display_title | gsub("\\|"; "-") | .[0:50])|\(.name | gsub("\\|"; "-"))"' 2>/dev/null | \

Copilot uses AI. Check for mistakes.
Comment on lines 89 to 91
gh api repos/pytorch/executorch/actions/runs?per_page=100 \
--jq '.workflow_runs[] | select(.conclusion == "failure") | .name' 2>/dev/null \
| sort | uniq -c | sort -rn | head -10 >> $GITHUB_STEP_SUMMARY || echo "Failed to fetch data" >> $GITHUB_STEP_SUMMARY
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the fallback section above, this query fetches all failed workflow runs without filtering by branch or time range. This could include failures from feature branches or old failures that are not relevant to the viable/strict update. Consider filtering by branch (e.g., event=push, branch=main) or adding a created filter to only show recent failures that would actually block the viable/strict update.

Copilot uses AI. Check for mistakes.
@mergennachin mergennachin force-pushed the viablestrict-fail-on-no-green branch from aa73858 to 056c049 Compare January 16, 2026 17:09
@mergennachin mergennachin marked this pull request as ready for review January 16, 2026 18:05
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


# Filter by branch=main and use unit separator
gh api "repos/pytorch/executorch/actions/runs?branch=main&per_page=20" \
--jq '.workflow_runs[] | select(.conclusion == "failure") | "\(.id)\u001f\(.head_sha | .[0:7])\u001f\(.display_title | .[0:50])\u001f\(.name)"' 2>/dev/null | \
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jq string slicing syntax .[0:7] and .[0:50] is incorrect for strings in jq. Use [:7] and [:50] instead. The current syntax will cause jq to fail when processing these fields.

Copilot uses AI. Check for mistakes.
# Get commits between viable/strict and main (up to 20)
# Use ASCII unit separator (0x1f) to avoid issues with special chars in names
COMMITS=$(gh api "repos/pytorch/executorch/compare/${VIABLE_SHA}...${MAIN_SHA}" \
--jq '.commits | reverse | .[0:20] | .[] | "\(.sha)\u001f\(.commit.message | split("\n")[0] | .[0:50])"' 2>/dev/null || echo "")
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jq string slicing syntax .[0:50] is incorrect for strings in jq. Use [:50] instead. The current syntax will cause jq to fail when processing commit messages.

Suggested change
--jq '.commits | reverse | .[0:20] | .[] | "\(.sha)\u001f\(.commit.message | split("\n")[0] | .[0:50])"' 2>/dev/null || echo "")
--jq '.commits | reverse | .[0:20] | .[] | "\(.sha)\u001f\(.commit.message | split("\n")[0] | [:50])"' 2>/dev/null || echo "")

Copilot uses AI. Check for mistakes.
Comment on lines 66 to 67
failed_run=$(gh api "repos/pytorch/executorch/actions/runs?head_sha=${sha}&per_page=100" \
--jq '.workflow_runs[] | select(.conclusion == "failure" or .conclusion == "cancelled") | "\(.id)\u001f\(.name)"' 2>/dev/null | head -1)
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query selects both 'failure' and 'cancelled' runs, but the section header and error message only mention 'failures'. This could be confusing when cancelled runs appear in the output. Consider either filtering only failures or updating the messaging to clarify that cancelled runs are included.

Copilot uses AI. Check for mistakes.
echo "### Summary by job (main branch, last 24h)" >> $GITHUB_STEP_SUMMARY
echo '```' >> $GITHUB_STEP_SUMMARY
# Filter by main branch and created in last 24 hours
yesterday=$(date -u -d '24 hours ago' '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null || date -u -v-24H '+%Y-%m-%dT%H:%M:%SZ')
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fallback date command uses macOS-specific syntax (-v-24H) which won't work on the ubuntu-22.04 runner specified in line 15. Since the job runs on Ubuntu, the GNU date syntax will always succeed and the macOS fallback will never execute. Consider removing the unnecessary fallback.

Suggested change
yesterday=$(date -u -d '24 hours ago' '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null || date -u -v-24H '+%Y-%m-%dT%H:%M:%SZ')
yesterday=$(date -u -d '24 hours ago' '+%Y-%m-%dT%H:%M:%SZ')

Copilot uses AI. Check for mistakes.
Add CI failure summary and exit with error when no green commit is
found, making the workflow show as RED in the dashboard instead of
always succeeding silently.

The summary includes:
- Per-commit table showing commit SHA, title, and failed job
- Aggregate count of failures by workflow name

This makes it easier to identify which CI jobs are blocking
viable/strict updates.
Handle cancelled workflow runs and jobs in addition to failures.
Also add per_page=100 to jobs API call for workflows with many jobs.
Instead of showing random recent failures globally, now query commits
between viable/strict and main branches and show their specific failures.
This gives a clearer picture of what's blocking viable/strict from advancing.
- Use tab delimiter instead of pipe to avoid parsing issues with names
  containing pipe characters
- Sanitize all fields (workflow names, job names, titles) with gsub
- Filter fallback query by branch=main instead of showing all branches
- Filter summary by job to main branch and last 24 hours
- Add note in fallback explaining these are main branch failures
- Only print table headers when there are failures to show
- Add empty state message when no failures found
Replace tab delimiter with ASCII unit separator (0x1f) which is
guaranteed not to appear in workflow names, job names, or commit
titles. Simplify sanitization to just use tr for pipe characters.
- Use correct jq string slicing syntax (.[:N] instead of .[0:N])
- Remove cancelled filter since we only mention failures in messaging
- Remove unnecessary macOS date fallback (runs on Ubuntu)
@mergennachin
Copy link
Contributor Author

Note to self: land after the branch cut

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

clickhouse-password: ${{ secrets.CLICKHOUSE_VIABLESTRICT_PASSWORD }}

- name: Summarize CI failures
if: steps.update.outputs.latest_viable_sha == 'None'
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string comparison == 'None' suggests the output is the literal string 'None' rather than an empty string or null. This is an unusual pattern for GitHub Actions outputs. Consider verifying this matches the actual output format from the upstream action, or use a more conventional empty check like == '' if appropriate.

Suggested change
if: steps.update.outputs.latest_viable_sha == 'None'
if: steps.update.outputs.latest_viable_sha == ''

Copilot uses AI. Check for mistakes.
echo "| Commit | Title | Failed Job |" >> $GITHUB_STEP_SUMMARY
echo "|--------|-------|------------|" >> $GITHUB_STEP_SUMMARY
while IFS=$'\x1f' read -r sha title; do
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using $'\x1f' (hexadecimal escape) is inconsistent with the \u001f (Unicode escape) used in jq queries. While both represent the ASCII unit separator (0x1F), mixing notation styles reduces readability. Consider standardizing on one approach throughout the script.

Copilot uses AI. Check for mistakes.
[ -z "$sha" ] && continue
short_sha="${sha:0:7}"
# Sanitize title for markdown table
title=$(echo "$title" | tr '|' '-')
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanitization using tr '|' '-' only replaces pipe characters. Other markdown special characters (backticks, brackets, asterisks, underscores) in commit messages could break the table formatting or create unintended markdown rendering. Consider more comprehensive escaping or using a safer delimiter pattern.

Copilot uses AI. Check for mistakes.
if [ -n "$failed_run" ]; then
IFS=$'\x1f' read -r run_id workflow_name <<< "$failed_run"
# Sanitize workflow name for markdown table
workflow_name=$(echo "$workflow_name" | tr '|' '-')
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanitization using tr '|' '-' only replaces pipe characters. Other markdown special characters (backticks, brackets, asterisks, underscores) in workflow names could break the table formatting or create unintended markdown rendering. Consider more comprehensive escaping or using a safer delimiter pattern.

Copilot uses AI. Check for mistakes.
Comment on lines +95 to +100
while IFS=$'\x1f' read -r run_id sha title workflow; do
# Sanitize for markdown table
title=$(echo "$title" | tr '|' '-')
workflow=$(echo "$workflow" | tr '|' '-')
failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs?per_page=100" \
--jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1 | tr '|' '-')
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanitization using tr '|' '-' only replaces pipe characters. Other markdown special characters (backticks, brackets, asterisks, underscores) could break the table formatting or create unintended markdown rendering. Consider more comprehensive escaping or using a safer delimiter pattern.

Suggested change
while IFS=$'\x1f' read -r run_id sha title workflow; do
# Sanitize for markdown table
title=$(echo "$title" | tr '|' '-')
workflow=$(echo "$workflow" | tr '|' '-')
failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs?per_page=100" \
--jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1 | tr '|' '-')
sanitize_md() {
local input="$1"
# Replace table separators and escape other markdown-special characters
input=${input//|/-}
input=${input//\`/\'}
input=${input//\[/\\[}
input=${input//]/\\]}
input=${input//*/\\*}
input=${input//_/\\_}
printf '%s' "$input"
}
while IFS=$'\x1f' read -r run_id sha title workflow; do
# Sanitize for markdown table
title=$(sanitize_md "$title")
workflow=$(sanitize_md "$workflow")
failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs?per_page=100" \
--jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1)
failed_job=$(sanitize_md "$failed_job")

Copilot uses AI. Check for mistakes.
Comment on lines +75 to +76
failed_job=$(gh api "repos/pytorch/executorch/actions/runs/${run_id}/jobs?per_page=100" \
--jq '.jobs[] | select(.conclusion == "failure") | .name' 2>/dev/null | head -1 | tr '|' '-')
Copy link

Copilot AI Jan 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API call is made inside a loop for each failed run, potentially causing many sequential API requests. Consider batching or caching job information, or at minimum adding rate limit awareness to prevent hitting GitHub API limits when many failures exist.

Copilot uses AI. Check for mistakes.
@mergennachin mergennachin merged commit 65bb3b0 into main Jan 20, 2026
171 of 174 checks passed
@mergennachin mergennachin deleted the viablestrict-fail-on-no-green branch January 20, 2026 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants