feat(migration): Add options inventory script by kenzoengineer · Pull Request #123 · getsentry/sentry-options

kenzoengineer · 2026-05-01T22:09:58Z

Searches the getsentry and sentry library for option usages.

cursor · 2026-05-14T21:42:30Z

+
+if args.db_name:
+    from django.conf import settings as _s
+    _s.DATABASES['default']['NAME'] = args.db_name


Module-level side effects prevent safe import

Low Severity

Argument parsing (parse_args()), os.chdir(), sys.path manipulation, and Django configure() all execute at module level rather than inside a function guarded by if __name__ == '__main__':. Importing this module triggers directory changes, environment mutations, and full Django initialization as side effects. The sibling script option_inventory.py correctly places all setup inside main().

Additional Locations (1)

scripts/migration/enumerate_active_options.py#L146-L148

^{Reviewed by Cursor Bugbot for commit 18da02c. Configure here.}

sentry · 2026-05-14T21:42:51Z

+    n_runtime = len(runtime_keys) + len(active_unknown_db)
+    total     = len(registrations) + n_runtime


Bug: The 'runtime-only' filter in the HTML report shows a combined count but only filters for one category of options, leading to a mismatch between the displayed count and the results.
_{Severity: LOW}

Suggested Fix

Modify the filter logic to correctly display all items that contribute to the 'runtime-only' count. One approach is to have the 'runtime-only' button set a status that the JavaScript filter can use to show both RUNTIME_ONLY and DB_ONLY items. For example, the filter could check for both statuses when the 'runtime-only' filter is active.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: scripts/migration/option_inventory.py#L959-L960 Potential issue: In the generated HTML report, the 'runtime-only' statistic combines the count of options found only at runtime (`runtime_keys`) and options found only in the database (`active_unknown_db`). The variable `n_runtime` stores this sum. However, clicking the corresponding UI pill or filter button executes a JavaScript function `setStatus('RUNTIME_ONLY')`. The filtering logic in `updateVisible` performs an exact match for `data-status="RUNTIME_ONLY"`, which excludes items with `data-status="DB_ONLY"`. This results in the number of displayed items being less than the count shown on the filter button, creating a confusing user experience.

Also affects:

scripts/migration/option_inventory.py:1044~1044

scripts/migration/option_inventory.py:1063~1063

sentry · 2026-05-21T20:16:41Z

+        first_line = _git_log_first(repo_root, search, None)
+        if first_line:
+            break
+


Bug: The script can fetch the wrong commit when an option key is mentioned in an older commit, leading to inaccurate historical data in reports.
_{Severity: MEDIUM}

Suggested Fix

To prevent matching unrelated historical mentions, the whole-repository fallback search should be made more specific. Instead of just searching for "key", the search should be scoped to find commits that actually modify a registration file or contain the register("key") pattern. This could involve post-processing the git log results to verify that the found commit actually contains a valid registration change.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: scripts/migration/option_inventory.py#L1209-L1212 Potential issue: The `fetch_first_commit` function uses a broad, whole-repository fallback search for an option's key (e.g., `"key"`). This search uses `git log --reverse -S`, which finds the oldest commit in the repository's history where the key string appears. If the key was mentioned in an unrelated file (like documentation or a comment) before it was formally registered, the script will identify this older, incorrect commit as the origin. This results in displaying inaccurate metadata (author, date, SHA, commit message) in the generated HTML and plan reports, potentially misleading engineers about the option's history.

sentry · 2026-05-21T20:23:13Z

+        local_vals: dict[str, ast.expr] = {}
+        for stmt in ast.walk(func_node):
+            if isinstance(stmt, ast.Assign) and len(stmt.targets) == 1:
+                t = stmt.targets[0]
+                if isinstance(t, ast.Name):
+                    local_vals[t.id] = stmt.value


Bug: The use of ast.walk() to find variable assignments doesn't respect scope, allowing nested function assignments to shadow outer ones, leading to incorrect dynamic option analysis.
_{Severity: LOW}

Suggested Fix

Modify the AST traversal to be scope-aware. Instead of ast.walk(), iterate only through the direct children of the function's body to avoid descending into nested function definitions.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: scripts/migration/option_inventory.py#L306-L311 Potential issue: The script uses `ast.walk(func_node)` to populate a `local_vals` dictionary with variable assignments. This traversal includes nested functions, causing assignments in inner scopes to overwrite those from outer scopes. If a variable used in `options.get()` is shadowed in a nested function, the script may fail to extract the correct f-string prefix. This could lead to an option being incorrectly classified as `SAFE_CANDIDATE` when it should be `NEVER_DELETE`, potentially marking an active option as safe for deletion.

sentry · 2026-05-21T20:23:13Z

+            f'jsonPayload.event="option.seen" AND timestamp>="{start_ts}" AND timestamp<"{end_ts}"',
+            f"--project={PROJECT}",
+            "--format=json",
+            "--limit=10000",


Bug: The script uses a hardcoded --limit=10000 for GCP log queries, which likely exceeds the API's page size limit, and lacks pagination, risking silent data truncation.
_{Severity: HIGH}

Suggested Fix

Implement pagination for the gcloud logging read command. Reduce the --limit to the documented maximum (e.g., 1000) and use the next_page_token from the API response to fetch all results for each time window. Add a warning if the number of results in a page matches the limit, indicating potential truncation.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: scripts/migration/collect-seen-options.py#L62 Potential issue: The script `collect-seen-options.py` queries Google Cloud logs with a hardcoded `--limit=10000`. The Google Cloud Logging API's documented maximum page size is 1000. The script does not implement pagination or check if the number of returned entries equals the limit, which would indicate truncation. If a time window contains more log entries than the effective limit, any options seen in the truncated logs will be missed. This could cause an active option to be incorrectly classified as safe to delete, leading to its removal and potential production failures.

cursor · 2026-05-27T04:39:56Z

+window 719/720  2026-05-26T23:27:28Z → 2026-05-26T23:37:28Z  (+0 new, 783 total)
+window 720/720  2026-05-26T23:37:28Z → 2026-05-26T23:47:28Z  (+0 new, 783 total)
+
+=== 783 unique options written to /Users/josh/dev/sentry-options/scripts/migration/seen.txt ===


Committed gcloud run log artifact

Low Severity

The PR adds a 722-line collect-seen-options-out.txt that is stderr captured from a local make collect-seen-options run (720 window progress lines plus a machine-specific path). Workflow docs only treat seen.txt as the committed output; this log is ephemeral tee output and should not live in the repo.

^{Reviewed by Cursor Bugbot for commit 013021a. Configure here.}

sentry · 2026-05-27T21:26:07Z

+        'author': author, 'date': date, 'message': message,
+        'diff_snippet': _extract_relevant_hunk(diff.stdout, key),
+        'github_url': f'{base}/commit/{sha}' if base else None,
+    }
+
+
+def fetch_investigate_commits(
+    registrations: list[dict],


Bug: The subprocess.run call in fetch_first_commit lacks exception handling, which can crash the script if the git show command times out or fails.
_{Severity: MEDIUM}

Suggested Fix

Wrap the subprocess.run call in fetch_first_commit within a try/except block to catch potential exceptions like subprocess.TimeoutExpired and FileNotFoundError. Return None in the except block, similar to the pattern used for _git_log_first.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: scripts/migration/option_inventory.py#L1228-L1235 Potential issue: The `subprocess.run` call on line 1221 within the `fetch_first_commit` function is not wrapped in a `try/except` block. This can lead to unhandled `subprocess.TimeoutExpired` or `FileNotFoundError` exceptions. Since this function is executed in a `ThreadPoolExecutor`, any such exception will be propagated by `pool.map` and crash the entire script. This behavior is inconsistent with other subprocess calls in the file, which are properly guarded. The script will crash when run with `--html` or `--plan` if a `git show` command takes longer than 30 seconds or if `git` is not available.

Also affects:

scripts/migration/option_inventory.py:1251~1252

sentry · 2026-05-28T22:09:16Z

+  <span class="option-key">{esc(key)}</span>
+  {db_html}{hints_html}<span class="found-tag {found_cls}">{esc(found)}</span>
+</summary><div class="option-body">
+  <div class="option-location">{loc}</div>
+  <pre class="decl">{esc(reg["declaration"])}</pre>
+  {usages_html}
+  {commit_html}
+</div></details></div>''')


Bug: The 'runtime-only' stat pill's count includes active_unknown_db items, but the corresponding filter incorrectly hides them, creating a mismatch between the displayed count and the filtered results.
_{Severity: LOW}

Suggested Fix

To fix the discrepancy, update the JavaScript filter logic. When the currentStatus is 'RUNTIME_ONLY', the filter should show elements where el.dataset.status is either 'RUNTIME_ONLY' or 'DB_ONLY'. This will ensure the filtered view matches the count displayed in the stat pill.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: scripts/migration/option_inventory.py#L959-L966 Potential issue: In the generated HTML report, the 'runtime-only' stat pill displays a count, `n_runtime`, which is calculated as the sum of `runtime_keys` and `active_unknown_db`. However, clicking this pill triggers a JavaScript filter `setStatus('RUNTIME_ONLY')`. This filter only shows elements with `data-status="RUNTIME_ONLY"`. The cards generated from `active_unknown_db` are assigned `data-status="DB_ONLY"` and are therefore hidden. This creates a misleading user interface where the number in the stat pill does not match the number of items shown after filtering, potentially confusing developers who rely on this tool for option cleanup.

this removes 93 unused options identified via tracking read options via logging in #115610 (which this PR also reverts as I'm done collecting data) cross referenced with static analysis in getsentry/sentry-options#123 the details are in getsentry/sentry-options#123 - `scripts/migration/safe.txt` is essentially the set intersection of statically unreferenced options and unseen/unread options over 120 hours of GCP production logs. the unseen list was determined by subtracting options.seen unique events from all known options (options.all()) and I observed no newly seen options beyond the ~100 hour mark (`scripts/migration/collect-seen-options-out.txt`). getsentry options that were unused are removed here too: getsentry/getsentry#20449

cursor · 2026-06-02T21:08:49Z

+            if key in seen:
+                skipped_seen.append(key)
+            else:
+                safe.append(key)


Deletion list ignores DB values

High Severity

safe-to-delete-options.py only excludes keys in seen.txt, not options with has_db_value in inventory.csv. Docs say DB-backed options are NEVER_DELETE, but classification never uses active_db_keys, so delete-options.py can still target keys with live DB values.

Additional Locations (1)

scripts/migration/option_inventory.py#L1523-L1546

^{Reviewed by Cursor Bugbot for commit 8ea61a4. Configure here.}

cursor · 2026-06-02T21:08:49Z

+
+                # Bounded by an enumerated static dict (e.g. killswitch registry).
+                if attr.bounded_keys:
+                    continue  # assume this name draws from the registry


Killswitch registry skips unresolved

Medium Severity

For options.get(variable), if the file defines any ALL_*_OPTIONS registry, every unresolved variable is treated as registry-backed and the file may be omitted from truly_uncertain, pushing unrelated options toward SAFE_CANDIDATE.

^{Reviewed by Cursor Bugbot for commit 8ea61a4. Configure here.}

cursor · 2026-06-02T21:08:50Z

+
+PROJECT = "internal-sentry"
+INTERVAL = 600  # seconds per window
+LOOKBACK = 432000  # 120 hours


Lookback window mismatch

Medium Severity

collect-seen-options.py uses a 120-hour LOOKBACK, while UNUSED-OPTIONS.md describes a 24-hour GCP log walk. Operators following the doc get different coverage than the script implements.

Additional Locations (1)

scripts/migration/UNUSED-OPTIONS.md#L26-L27

^{Reviewed by Cursor Bugbot for commit 8ea61a4. Configure here.}

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 6 total unresolved issues (including 5 from previous reviews).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 5848d8c. Configure here.}

cursor · 2026-06-02T21:44:57Z

+    for entry in entries:
+        key = entry.get("jsonPayload", {}).get("option_key")
+        if key and key not in seen:
+            seen[key] = window_label


Log query cap drops option keys

Medium Severity

Each GCP window uses --limit=10000 with no pagination. Busy 10-minute intervals can return more than 10,000 option.seen entries, so some keys never enter seen.txt and may be wrongly treated as unused in the deletion pipeline.

^{Reviewed by Cursor Bugbot for commit 5848d8c. Configure here.}

sentry · 2026-06-02T21:47:41Z

+        return None
+
+    diff = subprocess.run(
+        ['git', '-C', repo_root, 'show', '--unified=4', '--no-color', sha, '--', source_short],
+        capture_output=True, text=True, timeout=30, env=_GIT_ENV,
+    )
+    base = github_urls.get(repo_root)


Bug: The subprocess.run call for git show in fetch_first_commit is not wrapped in a try/except block, which can crash the script on timeout or other errors.
_{Severity: MEDIUM}

Suggested Fix

Wrap the subprocess.run call in fetch_first_commit within a try...except Exception block. On exception, log the error and return None to prevent the entire script from crashing, similar to the error handling in the _git_log_first helper function.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: scripts/migration/option_inventory.py#L1279-L1285 Potential issue: The `subprocess.run` call executing `git show` within the `fetch_first_commit` function lacks exception handling. This function is executed in a worker thread via `ThreadPoolExecutor.map`. A failure in the subprocess, such as a `subprocess.TimeoutExpired` on a large repository, will raise an exception in the worker thread. This exception is then re-raised in the main thread, causing the entire script to crash and lose all previously completed work. This is likely to occur when using the `--plan` or `--html` flags.

sentry · 2026-06-02T21:47:41Z

+        result = subprocess.run(
+            ['uv', 'run', str(here / 'safe-to-delete-options.py')],
+            capture_output=True, text=True, cwd=here,
+        )
+        keys = [line.split('\t')[0] for line in result.stdout.splitlines() if line]


Bug: The script does not check the return code of the safe-to-delete-options.py subprocess, leading to silent failures being misinterpreted as a successful run with no results.
_{Severity: MEDIUM}

Suggested Fix

After the subprocess.run call, add a check for the result.returncode. If the return code is non-zero, raise an exception or print the contents of result.stderr and exit with an error code to make the failure visible.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: scripts/migration/delete-options.py#L120-L124 Potential issue: When `delete-options.py` is run interactively (without piped stdin), it executes `safe-to-delete-options.py` as a subprocess to get a list of keys. However, the script does not check the `returncode` of the subprocess. If the subprocess fails (e.g., because the required `inventory.csv` file is missing), it will exit with a non-zero code, but the parent script will not detect the failure. It will interpret the empty `stdout` as "no keys to delete" and exit successfully, masking the underlying error.

include options inventory script

56ef20b

kenzoengineer marked this pull request as ready for review May 1, 2026 22:10

sentry Bot reviewed May 1, 2026

View reviewed changes

Comment thread scripts/migration/option_inventory.py Outdated

Comment thread scripts/migration/option_inventory.py Outdated

cursor Bot reviewed May 1, 2026

View reviewed changes

Comment thread scripts/migration/inventory.csv Outdated

Comment thread scripts/migration/option_inventory.py Outdated

joshuarli added 2 commits May 12, 2026 14:50

enhance!

3fd2ccd

iterate

b3ee733

sentry Bot reviewed May 13, 2026

View reviewed changes

Comment thread scripts/migration/enumerate_active_options.py

up

18da02c

cursor Bot reviewed May 14, 2026

View reviewed changes

sentry Bot reviewed May 14, 2026

View reviewed changes

joshuarli mentioned this pull request May 14, 2026

feat: track read options via seen logline getsentry/sentry#115610

Merged

[skip ci] update

6708dde

sentry Bot reviewed May 21, 2026

View reviewed changes

collect seen options

fc43094

sentry Bot reviewed May 21, 2026

View reviewed changes

run

013021a

cursor Bot reviewed May 27, 2026

View reviewed changes

update

77f5e0c

sentry Bot reviewed May 27, 2026

View reviewed changes

93 safe candidates

d3e23bc

joshuarli mentioned this pull request May 28, 2026

ref: delete unused options getsentry/sentry#116409

Merged

[skip ci] womp womp

2c9dc81

sentry Bot reviewed May 28, 2026

View reviewed changes

kenzoengineer added 2 commits June 1, 2026 16:55

fixed bug and regenerated

30219f7

fix regex again and include final readmes

8ea61a4

cursor Bot reviewed Jun 2, 2026

View reviewed changes

update some naming

5848d8c

cursor Bot reviewed Jun 2, 2026

View reviewed changes

sentry Bot reviewed Jun 2, 2026

View reviewed changes

		n_runtime = len(runtime_keys) + len(active_unknown_db)
		total = len(registrations) + n_runtime

Uh oh!

Conversation

kenzoengineer commented May 1, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot May 14, 2026

Choose a reason for hiding this comment

Module-level side effects prevent safe import

Uh oh!

sentry Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

sentry Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

sentry Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

sentry Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 27, 2026

Choose a reason for hiding this comment

Committed gcloud run log artifact

Uh oh!

sentry Bot May 27, 2026

Choose a reason for hiding this comment

Uh oh!

sentry Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 2, 2026

Choose a reason for hiding this comment

Deletion list ignores DB values

Uh oh!

cursor Bot Jun 2, 2026

Choose a reason for hiding this comment

Killswitch registry skips unresolved

Uh oh!

cursor Bot Jun 2, 2026

Choose a reason for hiding this comment

Lookback window mismatch

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Jun 2, 2026

Choose a reason for hiding this comment

Log query cap drops option keys

Uh oh!

sentry Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

sentry Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants