github-tools

Note

Experimental repo for gaining insights into open source project health.

Tools for extracting data from GitHub, storing it in a local SQLite database, querying it, and visualizing trends.

Trends

Per-repo trend pages with all charts:

Directory Layout

src/             # Rust source (single binary: github-tools)
scripts/util/    # Python: plot.py (charts), json_to_csv.py (utility)
data/            # Committed snapshots: JSON inputs and PNG charts
  {owner}_{repo}/issues/  # Issues/PRs JSON split by year (2024.json, 2025.json, ...)
  images/        # Committed PNG charts (written directly by plot.py)
out/             # Gitignored — all generated and local-only files
  historical/    # Raw JSON fetched from GitHub API
  db/            # SQLite databases
  summaries/     # Generated CSVs
  purge/         # Purge audit logs (local only)

Build

cargo build --release
# Binary: target/release/github-tools

Configuration

Commands read credentials from the environment. Keep them in a single secrets.env:

GITHUB_TOKEN=...
DOCKER_USERNAME=...   # purge commands only
DOCKER_PASSWORD=...   # purge commands only
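
As a rough illustration of how these variables map to commands, the sketch below reports which credentials are missing before a command runs. It is not part of github-tools: `missing_credentials` is a hypothetical helper, and treating every non-purge command as needing only GITHUB_TOKEN is an assumption (local-only commands may need none).

```python
import os

# Variable names come from secrets.env above; the grouping is an assumption.
ALWAYS = ("GITHUB_TOKEN",)
PURGE_ONLY = ("DOCKER_USERNAME", "DOCKER_PASSWORD")


def missing_credentials(command, environ=None):
    """Return the names of credentials that are unset or empty for a command."""
    environ = os.environ if environ is None else environ
    needed = ALWAYS + (PURGE_ONLY if command.startswith("purge") else ())
    return [name for name in needed if not environ.get(name)]
```

For example, `missing_credentials("purge nightly", {"GITHUB_TOKEN": "x"})` lists the two Docker variables, while the same environment satisfies `fetch-issues`.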

The target repository is always specified explicitly via --repo org/name.

If you use a password manager CLI (e.g. 1Password's op), keep only secret references in secrets.env and let the CLI resolve and inject the real values at runtime, so plaintext secrets never touch disk:

op run --env-file secrets.env -- github-tools fetch-issues --repo vectordotdev/vector

Users without a password manager CLI can still pass a plain-text env file via --env-file.

Commands

github-tools <COMMAND>

Fetch:
  fetch-all          Fetch issues + discussions for all repos (workflow)
  fetch-issues       Fetch all issues/PRs for a repository
  fetch-discussions  Fetch all discussions for a repository
  fetch-labels       Fetch all labels for a repository

Pipeline:
  generate-all       Build DB + summaries for all repos (workflow)
  build-db           Load issues JSON into SQLite database
  generate-summaries Generate CSV summaries from SQLite database

Purge:
  purge-all          Run all purge operations (workflow)
  purge nightly      Purge old nightly images from GitHub and Docker Hub
  purge untagged     Purge untagged GitHub container images
  purge vector-dev   Purge old vector-dev images from Docker Hub

AI review:
  automated-review-stats  Count review bot comments by reaction (liked / disliked / no signal)

Maintenance:
  close-old-prs          Close PRs with 'meta: awaiting author' older than 6 months
  delete-stale-branches  Delete branches with no commits in 4 years
  remove-legacy-label    Remove legacy type labels from issues/PRs

Run github-tools <COMMAND> --help for full argument details.

Workflow

1. (Optional) Fetch fresh data from GitHub

for repo in vectordotdev/vector vectordotdev/vrl quickwit-oss/quickwit quickwit-oss/tantivy; do
  op run --env-file secrets.env -- github-tools fetch-all --repo "$repo"
done

fetch-all writes raw JSON to out/historical/, which is gitignored. To commit a snapshot, split the fetched JSON by year and copy it into data/. Issues/PRs are stored in data/{owner}_{repo}/issues/{year}.json.
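
The split-and-promote step could look roughly like the sketch below. This is not the repo's own tooling: `split_by_year` and `promote` are hypothetical helpers, and grouping by the `created_at` field is an assumption, since the README does not say which timestamp decides the year.

```python
import json
from collections import defaultdict
from pathlib import Path


def split_by_year(issues):
    """Group issue/PR records by the year of their created_at timestamp."""
    by_year = defaultdict(list)
    for issue in issues:
        # GitHub timestamps look like "2024-03-01T12:00:00Z", so the
        # first four characters are the year (assumed grouping key).
        by_year[issue["created_at"][:4]].append(issue)
    return dict(by_year)


def promote(issues, repo, data_dir="data"):
    """Write one {year}.json per year under {data_dir}/{owner}_{repo}/issues/."""
    owner, name = repo.split("/")
    target = Path(data_dir) / f"{owner}_{name}" / "issues"
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for year, items in sorted(split_by_year(issues).items()):
        path = target / f"{year}.json"
        path.write_text(json.dumps(items, indent=2))
        written.append(path)
    return written
```

Load the records from whatever file fetch-all produced under out/historical/ (its exact name is not documented here) and pass them to `promote(records, "vectordotdev/vector")`.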

2. Generate DB, summaries, and charts

for repo in vectordotdev/vector vectordotdev/vrl quickwit-oss/quickwit quickwit-oss/tantivy; do
  github-tools generate-all --repo "$repo"
  python -m scripts.util.plot \
    --repo "$repo" --input-dir out/summaries \
    --window 2y \
    --exclude-labels "no-changelog,meta: awaiting author"
done

Charts are written directly into data/images/. Review the diff before committing.
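
The --exclude-labels value above is comma-separated, and label names may contain spaces and colons ("meta: awaiting author"). The sketch below shows one way such a value could be parsed and applied. It does not reproduce plot.py: both helpers are hypothetical, and the record shape (a `labels` list per item) is an assumption.

```python
def parse_label_list(value):
    """Split a comma-separated flag value into label names, keeping inner spaces."""
    return {label.strip() for label in value.split(",") if label.strip()}


def drop_excluded(items, excluded):
    """Keep only items that carry none of the excluded labels."""
    return [item for item in items if not excluded & set(item["labels"])]
```

Splitting only on commas keeps "meta: awaiting author" intact as a single label.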

3. (Optional) Purge stale container images

op run --env-file secrets.env -- github-tools purge-all --dry-run
op run --env-file secrets.env -- github-tools purge-all  # omit --dry-run to execute

Audit logs are written to out/purge/ (local only).

4. AI-assisted review stats

Measures how contributors react to automated review bot comments (👍 liked / 👎 disliked / no signal).

# Discover the bot's GitHub login (lists all review comment authors by frequency)
op run --env-file secrets.env -- github-tools automated-review-stats \
  --repo vectordotdev/vector --since 3m

# Produce stats + update trends/vector.md
op run --env-file secrets.env -- github-tools automated-review-stats \
  --repo vectordotdev/vector \
  --bot-login "chatgpt-codex-connector" \
  --since 2026-01-01
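
The two invocations above pass --since once as a relative value (3m) and once as an absolute date (2026-01-01). The sketch below shows one way such a value could be turned into a cutoff date. The semantics are assumptions, not the tool's documented behavior: `m` is read as months, `y` as years, and the day is clamped to 28 so the result is always a valid date.

```python
from datetime import date


def parse_since(value, today):
    """Interpret a --since style value as a cutoff date (assumed semantics)."""
    unit = value[-1]
    if unit in "my" and value[:-1].isdigit():
        months = int(value[:-1]) * (12 if unit == "y" else 1)
        # Step back whole calendar months from today's year/month.
        total = today.year * 12 + (today.month - 1) - months
        return date(total // 12, total % 12 + 1, min(today.day, 28))
    # Anything else is treated as an ISO date such as 2026-01-01.
    return date.fromisoformat(value)
```

Passing `today` explicitly keeps the function deterministic and easy to test.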

Outputs:

  • Console summary (like rate, dislike rate)
  • out/automated-review-stats/{owner}_{repo}.csv — per-comment table with URL and reaction (gitignored)
  • trends/{repo}.md — two summary tables updated in place via AUTO: markers
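
Updating a table in place between markers can be sketched as follows. The marker syntax is an assumption: the README only says the markers are prefixed with AUTO:, so the `<!-- AUTO:name:start -->` and `<!-- AUTO:name:end -->` comments and the `replace_between_markers` helper are hypothetical.

```python
import re


def replace_between_markers(text, name, new_body):
    """Replace everything between a start/end marker pair, keeping the markers."""
    start = f"<!-- AUTO:{name}:start -->"  # assumed marker format
    end = f"<!-- AUTO:{name}:end -->"      # assumed marker format
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    replacement = f"{start}\n{new_body}\n{end}"
    # A function replacement avoids treating backslashes in new_body as escapes.
    updated, count = pattern.subn(lambda _match: replacement, text, count=1)
    if count == 0:
        raise ValueError(f"marker pair for {name!r} not found")
    return updated
```

Because the markers survive each rewrite, rerunning the update with the same table leaves the file unchanged, which keeps diffs small.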
