This tool has two main user interfaces:
- A Python library: given a GitHub org and repository, plus an initial git reference or date, it uses the GitHub GraphQL API to return a DataFrame of all issue and PR activity for that time period.
- A Command Line Interface to render this activity as markdown, suitable for generating changelogs or community updates.
These sections describe how to control the major functionality of this tool.
Before generating a changelog you should [generate and add a GitHub Access Token](use:token).
The easiest way to use `github-activity` to generate activity markdown is to use
the command-line interface. It takes the following form:

    github-activity [<org>/<repo>] --since <date or ref> --until <date or ref>

The `[<org>/<repo>]` argument is optional.
If you do not give it, then `github-activity` will attempt to infer this value by running `git remote -v` and using either `upstream` or `origin` (preferring `upstream` if both are available).
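The inference step described above can be sketched in Python. This is a minimal illustration, not the tool's actual implementation: the function names `infer_target` and `infer_target_from_cwd` and the URL pattern are assumptions made for the example.

```python
import re
import subprocess
from typing import Optional

# Assumed pattern: matches SSH (git@github.com:org/repo.git) and
# HTTPS (https://github.com/org/repo) GitHub remote URLs.
_GITHUB_URL = re.compile(
    r"github\.com[:/](?P<org>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?$"
)


def infer_target(remote_output: str) -> Optional[str]:
    """Return "org/repo" from `git remote -v` output, preferring upstream."""
    remotes = {}
    for line in remote_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        name, url = parts[0], parts[1]
        match = _GITHUB_URL.search(url)
        if match:
            # Keep the first URL seen for each remote name.
            remotes.setdefault(name, f"{match['org']}/{match['repo']}")
    # Prefer upstream, then fall back to origin.
    return remotes.get("upstream") or remotes.get("origin")


def infer_target_from_cwd() -> Optional[str]:
    """Run `git remote -v` in the current directory and parse its output."""
    result = subprocess.run(
        ["git", "remote", "-v"], capture_output=True, text=True, check=True
    )
    return infer_target(result.stdout)
```

Keeping the parsing separate from the `git` call makes the preference logic easy to check against sample output without needing a repository.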
The optional `--since` (or `-s`) and `--until` (or `-u`) arguments can each be
a date or a ref (such as a commit hash or tag). `github-activity` will pull the activity between the dates corresponding to these values.
There are many other options in the `github-activity` CLI. Run `github-activity -h`
for more information.
Here's an example on the Jupyter Notebook repository, grabbing all activity between the 6.0.0 and 6.0.1 releases and writing it to a markdown file.

    github-activity jupyter/notebook -s 6.0.0 -u 6.0.1 -o sample_notebook_activity.md

You can find the resulting markdown here.
For repositories that use multiple branches, it may be necessary to filter PRs by a branch name. This can be done using the `--branch` parameter in the CLI. Other git references can be used as well in place of a branch name.
By default, github-activity will pull the activity after the latest GitHub release or git tag. You can choose to manually control the date ranges as well.
To specify a start date, use the `-s` (or `--since`) parameter. To specify an end date, use the `-u` (or `--until`) parameter.
Each of these accepts either:
- A date string. This can be anything that `dateutil.parser.parse` accepts.
- A git ref. For example, a commit hash or a tag.

If no `-u` parameter is given, then all activity until today will be included.
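To illustrate the date-or-ref distinction, here is a rough sketch of how a value might be classified. It is not how the tool is known to do it: the function names are hypothetical, and the standard library's `datetime.fromisoformat` is used as a stand-in for `dateutil.parser.parse`, which accepts many more formats.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple, Union


def classify_bound(value: str) -> Tuple[str, Union[datetime, str]]:
    """Classify a --since/--until value as ("date", datetime) or ("ref", str)."""
    try:
        # Stand-in for dateutil.parser.parse (assumption for this sketch).
        return "date", datetime.fromisoformat(value)
    except ValueError:
        # Not a recognisable date, so assume it names a commit hash or tag.
        return "ref", value


def resolve_until(value: Optional[str]) -> Tuple[str, Union[datetime, str]]:
    """Apply the documented default: no --until means "until now"."""
    if value is None:
        return "date", datetime.now(timezone.utc)
    return classify_bound(value)
```

A ref would still need to be translated into a date (for example, the commit or tag date) before activity can be filtered; that lookup is left out here.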
(prefixes-and-tags)=
Often you wish to split your PRs into multiple categories so that they are easier to scan and parse. You may also only want to keep some PRs (e.g. features, or API changes) while excluding others from your changelog.
github-activity uses the GitHub tags as well as PR prefixes to automatically
categorize each PR and display it in a section in your markdown. It roughly
follows the keepachangelog taxonomy of changes.
Below is a list of the supported PR types, as well as the tags / title prefixes that will be used to identify the right category.
You can choose to *remove* some types of PRs from your changelog by passing the
`--tags` parameter in the CLI. This is a list of a subset of names taken from the
left-most column above.
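The categorize-then-filter idea can be sketched as follows. The category names, labels, and prefixes in `CATEGORIES` are placeholders invented for this example, not the tool's real table, and both function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

# Placeholder taxonomy for illustration only (assumption): the real
# categories, labels, and prefixes are defined by the tool itself.
CATEGORIES: Dict[str, Dict[str, List[str]]] = {
    "enhancement": {"labels": ["enhancement"], "prefixes": ["ENH"]},
    "bug": {"labels": ["bug"], "prefixes": ["FIX"]},
    "documentation": {"labels": ["documentation"], "prefixes": ["DOC"]},
}


def categorize(title: str, labels: Iterable[str]) -> Optional[str]:
    """Return the first category whose label or title prefix matches."""
    label_set = {label.lower() for label in labels}
    upper_title = title.upper()
    for name, rule in CATEGORIES.items():
        if label_set & set(rule["labels"]):
            return name
        # Accept either "[PREFIX] title" or "PREFIX: title".
        if any(upper_title.startswith((f"[{p}]", f"{p}:")) for p in rule["prefixes"]):
            return name
    return None


def filter_by_tags(prs: List[dict], tags: Iterable[str]) -> List[dict]:
    """Keep only PRs whose category is in `tags` (the idea behind --tags)."""
    wanted = set(tags)
    return [pr for pr in prs if categorize(pr["title"], pr["labels"]) in wanted]
```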
By default, GitHub Activity will include anybody that reviews or comments in a pull request in the item for that PR. This is included in a list of authors at the end of each item. See the JupyterHub Changelog for examples.
By default, this tool will include a long list of contributors at the end of your changelog. This is the unique set of all contributors that contributed to the release.
(how-does-this-tool-define-contributions-in-the-reports)=
GitHub Activity tries to automatically determine the unique list of contributors within a given window of time. There are many ways to define this, and there isn't necessarily a "correct" method out there.
We try to balance the two extremes of "anybody who shows up is recognized as contributing" and "nobody is recognized as contributing". We've chosen a few rules that try to reflect sustained engagement in issues/PRs, or contributions in the form of help in others' issues or contributing code.
Here are the rules we follow for finding a list of contributors within a time window. A contributor is anyone who has:
- Contributed to a PR merged in that window (includes opening, merging, committing, or commenting)
- Commented on >= 2 issues that weren't theirs
- Commented >= 6 times on any one issue

Known bot accounts are generally not considered contributors.
We'd love feedback on whether this is a good set of rules to use.
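As a rough sketch, the rules above could be expressed like this. The input shapes, the function name `find_contributors`, and the contents of `BOTS` are assumptions made for illustration; the tool works on data from the GitHub API rather than on these simplified tuples.

```python
from collections import Counter, defaultdict
from typing import Iterable, Set, Tuple

# Hypothetical list of known bot accounts (assumption for this sketch).
BOTS = {"dependabot[bot]", "pre-commit-ci[bot]"}


def find_contributors(
    merged_prs: Iterable[Set[str]],
    issue_comments: Iterable[Tuple[str, int, str]],
) -> Set[str]:
    """Apply the contributor rules to simplified activity data.

    merged_prs: one set of usernames per PR merged in the window
        (openers, mergers, committers, and commenters).
    issue_comments: (commenter, issue_id, issue_author) per comment.
    """
    contributors: Set[str] = set()
    # Rule 1: anyone involved in a PR merged in the window.
    for participants in merged_prs:
        contributors |= set(participants)

    issues_helped = defaultdict(set)  # commenter -> issues opened by others
    per_issue = Counter()  # (commenter, issue_id) -> number of comments
    for commenter, issue_id, issue_author in issue_comments:
        per_issue[(commenter, issue_id)] += 1
        if commenter != issue_author:
            issues_helped[commenter].add(issue_id)

    # Rule 2: commented on >= 2 issues that were not their own.
    contributors |= {u for u, issues in issues_helped.items() if len(issues) >= 2}
    # Rule 3: commented >= 6 times on any one issue.
    contributors |= {u for (u, _), n in per_issue.items() if n >= 6}
    # Known bot accounts are excluded.
    return contributors - BOTS
```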
If you follow the title prefix convention used to split PRs, you can remove these prefixes when you generate your changelog, so that they don't clutter the output.
To strip title prefix metadata, use the `--strip-brackets` flag.
For example, `[DOC] Add some documentation` will be rendered as `Add some documentation`.
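The effect of stripping a bracketed prefix can be approximated with a short regular expression. This is a sketch that assumes only leading `[...]` groups are removed; the helper name `strip_brackets` and the pattern are illustrative, not taken from the tool's source.

```python
import re

# One or more leading "[...]" groups, with optional surrounding whitespace.
_BRACKET_PREFIX = re.compile(r"^\s*(?:\[[^\]]*\]\s*)+")


def strip_brackets(title: str) -> str:
    """Remove leading bracketed prefixes such as "[DOC]" from a PR title."""
    return _BRACKET_PREFIX.sub("", title)
```

Brackets that appear later in a title are left alone, so a title like `Add [x] option` is unchanged.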
To change the starting heading level for changelog items, use the `--heading-level N` flag, where `N` is the starting heading level (e.g., `2` corresponds to `##`).
This is useful if you want to embed your changelog into a larger one (e.g., `CHANGELOG.md`).
To include closed issues in your changelog, use the `--include-issues` flag.
To include issues and pull requests that were opened in a time period, use the `--include-opened` flag.
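If you drive the CLI from a script, a small helper can assemble the flags described in this section. The helper below is hypothetical (the name `build_command` and its keyword arguments are made up for this sketch); it only uses flags that appear in this document.

```python
from typing import List, Optional


def build_command(
    target: Optional[str] = None,
    since: Optional[str] = None,
    until: Optional[str] = None,
    output: Optional[str] = None,
    heading_level: Optional[int] = None,
    strip_brackets: bool = False,
    include_issues: bool = False,
    include_opened: bool = False,
) -> List[str]:
    """Assemble a github-activity argument list from keyword options."""
    cmd = ["github-activity"]
    if target:
        cmd.append(target)
    # Options that take a value.
    for flag, value in (("--since", since), ("--until", until), ("-o", output)):
        if value is not None:
            cmd += [flag, value]
    if heading_level is not None:
        cmd += ["--heading-level", str(heading_level)]
    # Boolean switches.
    for flag, enabled in (
        ("--strip-brackets", strip_brackets),
        ("--include-issues", include_issues),
        ("--include-opened", include_opened),
    ):
        if enabled:
            cmd.append(flag)
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once `github-activity` is installed.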
(use:token)=
github-activity uses the GitHub API to pull information about a repository's activity.
Without authentication you will quickly hit the API rate limit, so you must use a personal access token or API token.
There are two ways to generate an access token for use with `github-activity`. Each is described below.
You can use the GitHub command line interface to authenticate your account and store an access token in your local environment. To do so, download the GitHub CLI, and run the following command:

    # Authenticate with GitHub via the web interface
    gh auth login --web

This will open a web interface for you to authenticate. When finished, it will store an access token locally, which you can print with:

    # Print an access token if it is stored
    gh auth status -t

This token will automatically be used by `github-activity` if it exists.
Alternatively, you can create your own GitHub access token and store it yourself. To do so, follow these steps:
- Create your own access token. Go to the new GitHub access token page and follow the instructions. Note that while working with a public repository, you don't need to set any scopes on the token you create.
- Assign the token to an environment variable called `GITHUB_ACCESS_TOKEN`. If you run `github-activity` and this variable is defined, it will be used. You may also pass a token via the `--auth` parameter (though this is not the best security practice).
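The lookup described above might look roughly like the following in Python. The precedence shown (explicit argument first, then the environment variable) is an assumption, the function name `resolve_token` is hypothetical, and the fallback to a token stored by the GitHub CLI is left out.

```python
import os
from typing import Mapping, Optional


def resolve_token(
    auth: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Pick a GitHub token from an explicit argument or the environment."""
    env = os.environ if environ is None else environ
    if auth:
        # An explicitly passed token (as with --auth) wins in this sketch.
        return auth
    # Otherwise use GITHUB_ACCESS_TOKEN; treat an empty value as unset.
    return env.get("GITHUB_ACCESS_TOKEN") or None
```

Reading the token from the environment keeps it out of shell history and process listings, which is why passing it on the command line is discouraged.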
You can do most of the above from Python as well. This is not as well-documented as the CLI, but should have most functionality available.
For generating markdown changelogs from Python, here's an example:

    from github_activity import generate_activity_md

    # Render the activity for a date range as a markdown string
    markdown = generate_activity_md(
        target="executablebooks/github-activity",
        since="2023-01-01",
        until="2023-12-31",
        kind=None,
        auth="your-github-token",
        tags=None,
        include_issues=True,
        include_opened=True,
        strip_brackets=True,
        heading_level=1,
        branch=None,
    )

    # Print or save the markdown
    print(markdown)

For scraping GitHub and returning the data as a DataFrame, here's an example:

    from github_activity import get_activity

    # Get activity data as a DataFrame
    df = get_activity(
        target="executablebooks/github-activity",
        since="2023-01-01",
        until="2023-12-31",
        auth="your-github-token",
        kind=None,
        cache=None,
    )

In some cases, metadata will be nested inside the resulting DataFrame. There are some helper functions for this. For example, to extract nested comments inside the activity DataFrame:

    from github_activity import get_activity, extract_comments

    df = get_activity(...)
    comments_df = extract_comments(df['comments'])