Skip to content

Conversation

@wende
Copy link
Owner

@wende wende commented Jan 7, 2026

Note

Introduces a comprehensive environment health check via a new codebook utils status command and supporting utilities.

  • Adds utils CLI group with status command (flags: --check-backend, --check-cicada) and shared get_client_from_context helper; updates existing commands to use it
  • New src/codebook/utils.py with CodeBookStatusChecker, StatusReport, and LinkValidationResult to validate markdown file links/anchors, EXEC (Python) syntax, and CICADA blocks; exported in __init__.py
  • Extensive tests: new tests/test_utils.py and CLI helper tests in tests/test_cli.py
  • Docs: new codebook/UTILS.md, README link to Utils, version stamp updates
  • Makefile: install target added for editable dev install

Written by Cursor Bugbot for commit 744bebd. This will update automatically on new commits. Configure here.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • The new utils status command uses CodeBookClient but never imports it in cli.py, which will cause a runtime error when --check-backend is used; add the appropriate import near the other client imports.
  • The _is_binary_file helper shells out to git diff for each file, which may be slow and brittle; consider either batching the detection or using a simpler heuristic (e.g., checking for null bytes or using git check-attr --all once) to avoid repeated subprocess calls per file.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `utils status` command uses `CodeBookClient` but never imports it in `cli.py`, which will cause a runtime error when `--check-backend` is used; add the appropriate import near the other client imports.
- The `_is_binary_file` helper shells out to `git diff` for each file, which may be slow and brittle; consider either batching the detection or using a simpler heuristic (e.g., checking for null bytes or using `git check-attr --all` once) to avoid repeated subprocess calls per file.

## Individual Comments

### Comment 1
<location> `src/codebook/cli.py:2797` </location>
<code_context>
+        click.echo("-" * 60)
+
+        # Get client from context or create from config
+        client = ctx.obj.get("client")
+        if not client:
+            client = CodeBookClient(
</code_context>

<issue_to_address>
**issue (bug_risk):** Guard against `ctx.obj` being `None` when accessing the client.

Click doesn’t guarantee `ctx.obj` is set, so if it’s `None`, `ctx.obj.get("client")` will raise `AttributeError`. You could normalize it early (e.g., `if ctx.obj is None: ctx.obj = {}`) or use something like `getattr(ctx, "obj", {})` to avoid crashing when the command runs without a populated context.
</issue_to_address>

### Comment 2
<location> `src/codebook/utils.py:109-118` </location>
<code_context>
+        "search-module": [],  # module_name or file_path (at least one)
</code_context>

<issue_to_address>
**issue (bug_risk):** The `search-module` endpoint never actually enforces that `module_name` or `file_path` is provided.

`CICADA_REQUIRED_PARAMS["search-module"]` is empty, so the special-case check in `validate_cicada_block` (inside `if param not in params`) never runs. That means a `search-module` block with no params is considered valid. Please either (a) add an explicit check for `search-module` that enforces at least one of `module_name` or `file_path`, or (b) represent this constraint explicitly instead of relying on the `required` dict for this endpoint.
</issue_to_address>

### Comment 3
<location> `tests/test_cli.py:1397` </location>
<code_context>
+        # Should analyze text file
+        assert "text_code.py" in result.output
+        # Should NOT analyze binary file
+        assert "image.png" not in result.output or "Could not analyze" not in result.output
+
     def test_task_mark_reviewed_help(self, runner: CliRunner):
</code_context>

<issue_to_address>
**issue (testing):** Assertion for skipping binary files is too weak due to use of `or`

The condition `"image.png" not in result.output or "Could not analyze" not in result.output` passes if either substring is missing, so it doesn’t reliably assert that binary files are neither analyzed nor reported as errors. For example, if the CLI outputs `image.png` without `"Could not analyze"`, this test still passes. Prefer either:

```python
assert "image.png" not in result.output
```

or, if you also want to assert no failure message for the binary file:

```python
assert "image.png" not in result.output
assert "Could not analyze" not in result.output
```

(or a single assertion with `and`).
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

# Should analyze text file
assert "text_code.py" in result.output
# Should NOT analyze binary file
assert "image.png" not in result.output or "Could not analyze" not in result.output
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (testing): Assertion for skipping binary files is too weak due to use of or

The condition "image.png" not in result.output or "Could not analyze" not in result.output passes if either substring is missing, so it doesn’t reliably assert that binary files are neither analyzed nor reported as errors. For example, if the CLI outputs image.png without "Could not analyze", this test still passes. Prefer either:

assert "image.png" not in result.output

or, if you also want to assert no failure message for the binary file:

assert "image.png" not in result.output
assert "Could not analyze" not in result.output

(or a single assertion with and).

@wende wende force-pushed the feature/utils-status branch from 852131d to 8afa31a Compare January 7, 2026 11:00
@github-actions
Copy link

github-actions bot commented Jan 7, 2026

📚 CodeBook Coverage

Overall: 84.6% (📈 +0.4% from main)

Changed Files

File Main PR Change
codebook/UTILS.md 0.0% 100.0% 📈 +100.0%
tests/test_utils.py 0.0% 98.7% 📈 +98.7%
src/codebook/utils.py 0.0% 96.0% 📈 +96.0%
Makefile 21.8% 24.8% 📈 +3.0%
tests/test_cli.py 73.8% 72.9% 📉 -0.9%
codebook/README.md 52.3% 53.0% 📈 +0.7%
src/codebook/init.py 95.2% 95.7% 📈 +0.5%
src/codebook/cli.py 82.8% 83.2% 📈 +0.4%

- Guard against ctx.obj being None in utils_status command
- Fix search-module validation to require module_name OR file_path
- Strengthen test assertion for binary file handling
- Add test coverage for search-module validation
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the final PR Bugbot will review for you during this billing cycle

Your free Bugbot reviews will reset on February 1

Details

You are on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

from .cicada import CicadaClient

# Get Cicada URL from context or config
cicada_url = ctx.obj.get("cicada_url")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent context access may crash on None

Medium Severity

The access pattern for ctx.obj is inconsistent within the same function. At line 2797, getattr(ctx, "obj", {}).get("client") defensively handles the case where ctx.obj might be None. However, at line 2831, ctx.obj.get("cicada_url") directly accesses ctx.obj without this protection. If ctx.obj is None (which can happen in certain testing or invocation scenarios), this will raise an AttributeError before the fallback logic can execute.

Fix in Cursor Fix in Web

Extract get_client_from_context helper:
- Centralizes client initialization logic
- Documents when defensive ctx.obj access is needed
- Makes code more DRY and easier to maintain

Improve search-module validation:
- Use early return pattern for special case
- Prevents future bugs if someone adds to CICADA_REQUIRED_PARAMS
- Makes validation flow clearer with explicit documentation
- Special case now handled before generic validation

try:
start_time = time.time()
client.health_check()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backend health check return value ignored, false positives

High Severity

The client.health_check() return value is not checked. This method returns False when the backend is unhealthy (non-200 status) without raising an exception - it catches RequestException internally and returns False. Since the return value is ignored, the code always sets report.backend_healthy = True and displays "✅ Backend healthy" as long as no network exception occurs. Other usages in the codebase (lines 326, 460, 583) correctly check the return value with if client.health_check() or if not client.health_check().

Fix in Cursor Fix in Web

file_part, section = target, None

# Resolve relative path from source file
target_path = (source_file.parent / file_part).resolve()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anchor-only links resolve to parent directory instead of current file

Low Severity

When validate_file_link receives an anchor-only link like #section, the split("#", 1) produces an empty file_part. The expression source_file.parent / "" resolves to the parent directory rather than the current file. This causes section anchor validation to attempt reading a directory as text, which fails silently (returns False from _validate_section_anchor due to the caught OSError), incorrectly reporting the section as not found. For anchor-only links, target_path should be source_file itself when file_part is empty.

Fix in Cursor Fix in Web

Redesigned get_client_from_context:
- Removed cfg parameter - not needed, all commands run under main() group
- Changed from defensive fallback to fail-fast with clear error messages
- Documented special case (run command bypasses this helper)

Applied helper to ALL 7 command functions:
- render, watch, diff, show, health, clear_cache, utils_status
- Eliminated 6 instances of direct ctx.obj["client"] access
- Now truly centralizes client retrieval logic (100% adoption vs 12.5%)

Added comprehensive test coverage:
- test_get_client_from_context_success: validates successful client retrieval
- test_get_client_from_context_no_obj: verifies error when ctx.obj is None
- test_get_client_from_context_no_client_key: verifies error when client key missing

Benefits:
- Consistent pattern across entire CLI codebase
- Better error messages for misconfigured contexts
- Easier to maintain and extend
- Self-documenting code with clear helper purpose
# Try a simple query to check if Cicada is responsive
cicada.query(keywords=["test"])

report.cicada_healthy = True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cicada health check ignores query result success status

Medium Severity

The cicada.query() method returns a CicadaResult object with a success field indicating whether the operation succeeded, but the return value is discarded. Since CicadaClient._post() catches RequestException and returns a CicadaResult with success=False instead of raising, the code unconditionally sets report.cicada_healthy = True even when the Cicada server is down or returns an error.

Fix in Cursor Fix in Web

click.echo(f"\n❌ Broken file links: {len(broken_file_links)}")
for result in broken_file_links:
click.echo(
f" {result.file_path.relative_to(base_dir)}:{result.line_number} - {result.error_message}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unhandled ValueError when main_dir is absolute path

Low Severity

When cfg.main_dir is configured as an absolute path outside the current directory, result.file_path.relative_to(base_dir) raises an unhandled ValueError. Python's Path("/cwd") / "/absolute/path" yields /absolute/path, so files scanned won't be relative to base_dir. The existing codebase wraps relative_to calls in try/except blocks (e.g., lines 696-699), but this new code doesn't follow that pattern, causing a crash when displaying validation results.

Additional Locations (2)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant