fix: refresh OSS health snapshots monthly #531

Open
tym83 wants to merge 2 commits into main from fix/oss-health-monthly-refresh

@tym83
Contributor

@tym83 tym83 commented May 7, 2026

Summary

  • refresh OSS Health data snapshots for DevStats, OSS Insight, OpenSSF, and Telemetry
  • fix the monthly OSS Health workflow by running Python scripts via python3 instead of relying on executable bits
  • include telemetry in the monthly OSS Health refresh so all four pages update from one scheduled PR
  • make the telemetry-only workflow use a PR branch instead of pushing directly to protected main
  • fix OpenSSF last-updated parsing by using the English status page and stripping HTML tags before matching

Root Cause

  • The May 1 scheduled update-oss-health run failed with Permission denied when make tried to execute ./hack/update_oss_health.py.
  • The telemetry workflow attempted to push directly to main and was rejected by repository rules requiring PRs and DCO.
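
The first root cause can be reproduced in isolation. The following is a minimal sketch, assuming a POSIX system; the script name `update_demo.py` and its contents are hypothetical stand-ins, not the real `hack/update_oss_health.py`. It illustrates why invoking a script through the interpreter sidesteps a missing executable bit.

```python
import os
import stat
import subprocess
import sys
import tempfile
from typing import Tuple


def run_both_ways(script_body: str) -> Tuple[bool, str]:
    """Return (direct_exec_denied, interpreter_stdout) for a script without +x."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for a script checked in without the executable bit.
        path = os.path.join(tmp, "update_demo.py")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("#!/usr/bin/env python3\n" + script_body)
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # rw-------, no execute bit

        # Direct execution, which is what a Makefile does when it calls ./script.py.
        try:
            subprocess.run([path], check=True)
            denied = False
        except PermissionError:
            denied = True

        # Running through the interpreter only needs read permission on the file.
        result = subprocess.run(
            [sys.executable, path], check=True, capture_output=True, text=True
        )
        return denied, result.stdout


if __name__ == "__main__":
    denied, out = run_both_ways('print("ok")\n')
    print(f"direct execution denied: {denied}; via interpreter: {out.strip()}")
```

Calling the interpreter explicitly keeps the Makefile target independent of file modes, which can be lost when a file is committed or checked out without the executable bit.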

Verification

  • GITHUB_TOKEN= make update-oss-health
  • python3 -m py_compile hack/update_oss_health.py hack/fetch_telemetry.py
  • HUGO_ENV=production hugo --gc --minify --destination /tmp/cozystack-site-public-oss-health --cacheDir /tmp/hugo-cache

Summary by CodeRabbit

  • Chores

    • Refreshed OSS Health and telemetry snapshots (metrics, timestamps, app lists).
    • Updated automation targets to run the OSS health and telemetry refresh steps.
  • Bug Fixes

    • Improved OpenSSF scraping to more reliably detect the "last updated" timestamp.
  • Documentation

    • Clarified script usage notes and added inline workflow documentation for monthly refresh/backfill.

Signed-off-by: tym83 <6355522@gmail.com>
@tym83 tym83 requested review from kvaps and lllamnyp as code owners May 7, 2026 05:48
@netlify

netlify Bot commented May 7, 2026

Deploy Preview for cozystack ready!

Name Link
🔨 Latest commit 55c4f41
🔍 Latest deploy log https://app.netlify.com/projects/cozystack/deploys/69fc50dcea3cf70008407f67
😎 Deploy Preview https://deploy-preview-531--cozystack.netlify.app

@coderabbitai
Contributor

coderabbitai Bot commented May 7, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d1db1967-2687-4d0b-8083-79895392b24b

📥 Commits

Reviewing files that changed from the base of the PR and between e36d8b1 and 55c4f41.

📒 Files selected for processing (1)
  • hack/update_oss_health.py

📝 Walkthrough

This PR refactors the OSS health and telemetry data refresh infrastructure. GitHub Actions workflows now use dedicated branches (update-telemetry, update-oss-health) with conditional PR creation. The Makefile invokes both telemetry and health update scripts. Python scripts fix an OpenSSF URL path and improve date extraction. Data snapshots are refreshed across all periods.

Changes

OSS Health & Telemetry Refresh Pipeline

| Layer / File(s) | Summary |
| --- | --- |
| Fetch Telemetry Workflow & PR Management: .github/workflows/fetch-telemetry.yml | Workflow is now manual-only via workflow_dispatch. Branch automation creates/recreates update-telemetry, commits with UTC timestamp, and conditionally creates PR to main if one doesn't exist. |
| OSS Health Workflow & Change Detection: .github/workflows/update-oss-health.yaml | Workflow adds changed output: exits early if no staged changes, otherwise creates update-oss-health branch and gates PR creation on changed==true. |
| Build Configuration Updates: Makefile | update-services passes --pkgdir extra flag; update-oss-health target now runs both hack/update_oss_health.py and hack/fetch_telemetry.py. |
| Data Processing Scripts: hack/fetch_telemetry.py, hack/update_oss_health.py | Docstring updated for monthly workflow usage. OpenSSF scraper switches to non-localized URL path and strips HTML tags before parsing "last updated on" timestamp. |
| DevStats Snapshots: data/oss-health/devstats.json, static/oss-health-data/devstats.json | Month/quarter/year periods refreshed with updated issue counts, date ranges, language metrics, and top contributors/PR authors. |
| OpenSSF Snapshots: data/oss-health/openssf.json, static/oss-health-data/openssf.json | Badge and check timestamps updated; badge_last_updated_at populated with concrete value. |
| OSS Insight Snapshots: data/oss-health/ossinsight.json, static/oss-health-data/ossinsight.json | Month/quarter/year periods refreshed with updated issue counts, date ranges, commits/PR metrics, and top contributors/authors. |
| Summary & Telemetry Snapshots: data/oss-health/summary.json, static/oss-health-data/summary.json, static/oss-health-data/telemetry.json | Summary timestamps updated. Telemetry reorganized from April-only to May-focused periods with updated app lists and date ranges. |
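
The change-detection gate described for the OSS Health workflow can be sketched as follows. This is an illustration only: the real workflow is YAML and its exact commands are not shown in this conversation, so the use of `git diff --cached --quiet` and the helper names `has_staged_changes` and `write_changed_output` are assumptions. Only the `changed` output name comes from the walkthrough. Appending `key=value` lines to the file named by `GITHUB_OUTPUT` is the standard way for a step to publish outputs in GitHub Actions.

```python
import os
import subprocess


def has_staged_changes(repo_dir: str) -> bool:
    """True if the git index holds staged changes (assumed detection method)."""
    result = subprocess.run(
        ["git", "diff", "--cached", "--quiet"], cwd=repo_dir, check=False
    )
    # With --quiet, git exits 0 for "no differences" and 1 for "differences".
    if result.returncode not in (0, 1):
        raise RuntimeError(f"git diff failed with exit code {result.returncode}")
    return result.returncode == 1


def write_changed_output(changed: bool, output_path: str) -> None:
    """Append changed=true|false in the key=value format GitHub Actions reads."""
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"changed={'true' if changed else 'false'}\n")


if __name__ == "__main__":
    # GITHUB_OUTPUT is set by the Actions runner; the fallback file is for local runs.
    output_path = os.environ.get("GITHUB_OUTPUT", "github_output.txt")
    write_changed_output(has_staged_changes("."), output_path)
```

A later step can then be conditioned on the step's `changed` output being `true`, so that branch and PR creation are skipped when the refresh produced no new data.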

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Telemetry flows and health stats bloom,
Workflows dance through GitHub's room,
Branches sprouting, PRs bloom bright,
May's new data shines so light!
Hopping forward, refresh complete! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: refresh OSS health snapshots monthly' directly and clearly summarizes the main change: establishing a monthly refresh cadence for OSS health data snapshots. It is concise, specific, and accurately reflects the primary objective of the pull request. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the project's OSS health and telemetry data, refreshing metrics for commits, contributors, and issues across multiple JSON data files. Key changes include updating the Makefile to automate telemetry fetching, switching the OpenSSF status URL to English, and improving the parsing of the OpenSSF last updated date by stripping HTML tags. Feedback was provided to enhance the robustness of the HTML parsing logic in hack/update_oss_health.py by unescaping entities and normalizing whitespace to ensure consistent regex matching.

Comment thread hack/update_oss_health.py Outdated
Comment on lines +360 to +361
plain_text = re.sub(r"<[^>]+>", " ", page_text)
match = re.search(r"last updated on\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC)", plain_text, re.IGNORECASE)

medium

The HTML stripping and regex matching for the OpenSSF last updated date could be more robust. Normalizing whitespace after stripping tags and unescaping HTML entities (like &nbsp;) ensures the regex matches correctly even if the source formatting varies or contains non-breaking spaces.

Suggested change
- plain_text = re.sub(r"<[^>]+>", " ", page_text)
- match = re.search(r"last updated on\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC)", plain_text, re.IGNORECASE)
+ plain_text = " ".join(unescape(re.sub(r"<[^>]*>", " ", page_text)).split())
+ match = re.search(r"last updated on\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC)", plain_text, re.IGNORECASE)
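
For context, the suggested approach can be exercised as a standalone function. This is a minimal sketch, not the code in `hack/update_oss_health.py`: the function name `extract_last_updated` and the sample HTML (including its timestamp) are invented for illustration, and `unescape` is assumed to be `html.unescape` from the standard library, which the suggestion would need to import.

```python
import re
from html import unescape
from typing import Optional

# Same timestamp pattern as in the reviewed lines.
LAST_UPDATED_RE = re.compile(
    r"last updated on\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC)", re.IGNORECASE
)


def extract_last_updated(page_text: str) -> Optional[str]:
    """Return the 'last updated on' timestamp from raw HTML, or None if absent."""
    # Replace tags with spaces so adjacent text nodes do not fuse together.
    without_tags = re.sub(r"<[^>]*>", " ", page_text)
    # Decode entities such as &nbsp;, then collapse every whitespace run
    # (including non-breaking spaces and newlines) to a single space.
    plain_text = " ".join(unescape(without_tags).split())
    match = LAST_UPDATED_RE.search(plain_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Invented sample: an entity and tags split the phrase and the timestamp.
    sample = (
        "<p>Last updated on&nbsp;<span>2026-05-01</span>\n"
        "<span>12:34:56</span> UTC</p>"
    )
    print(extract_last_updated(sample))
```

Without the normalization step, stripping the tags from this sample leaves an undecoded `&nbsp;` and several consecutive whitespace characters inside the timestamp, so the single-space pattern would not match.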


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@hack/update_oss_health.py`:
- Around line 360-361: The regex for matching "last updated on ..." can fail
when HTML tag removal leaves extra/newline whitespace inside the timestamp;
after stripping tags into plain_text (variable plain_text produced by re.sub),
normalize whitespace (e.g., collapse all runs of whitespace to a single space
using re.sub(r"\s+", " ", plain_text)) before calling re.search so the timestamp
pattern in match reliably finds "YYYY-MM-DD HH:MM:SS UTC"; update the code
around plain_text and match to normalize whitespace prior to the re.search call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bc0b7d45-a905-4a29-9f55-5962e1bde124

📥 Commits

Reviewing files that changed from the base of the PR and between 3e58234 and e36d8b1.

📒 Files selected for processing (14)
  • .github/workflows/fetch-telemetry.yml
  • .github/workflows/update-oss-health.yaml
  • Makefile
  • data/oss-health/devstats.json
  • data/oss-health/openssf.json
  • data/oss-health/ossinsight.json
  • data/oss-health/summary.json
  • hack/fetch_telemetry.py
  • hack/update_oss_health.py
  • static/oss-health-data/devstats.json
  • static/oss-health-data/openssf.json
  • static/oss-health-data/ossinsight.json
  • static/oss-health-data/summary.json
  • static/oss-health-data/telemetry.json

Comment thread hack/update_oss_health.py Outdated
Signed-off-by: tym83 <6355522@gmail.com>