2 changes: 2 additions & 0 deletions README.md
@@ -81,8 +81,10 @@ The key is stored once and shared across all agents on the same machine.
Start your agent and ask naturally:

- *"How is authentication implemented?"*
- *"Find the exact regex or string match for this token parser"*
- *"Show me error handling patterns across services"*
- *"Find similar features to guide my implementation"*
- *"Show me who calls this handler and what it depends on"*

No special commands needed — the agent picks up the skill automatically.

47 changes: 34 additions & 13 deletions skills/codealive-context-engine/SKILL.md
@@ -1,6 +1,6 @@
---
name: codealive-context-engine
description: Semantic code search and AI-powered codebase Q&A across indexed repositories. Use when understanding code beyond local files, exploring dependencies, discovering cross-project patterns, planning features, debugging, or onboarding. Queries like "How does X work?", "Show me Y patterns", "How is library Z used?". Provides search (fast, returns file locations and descriptions) and chat-with-codebase (slower, costs more, but returns synthesized answers).
description: Semantic code search and AI-powered codebase Q&A across indexed repositories. Use when understanding code beyond local files, exploring dependencies, discovering cross-project patterns, planning features, debugging, or onboarding. Queries like "How does X work?", "Show me Y patterns", "How is library Z used?". The default path is semantic search plus grep search; chat-with-codebase is slower, more expensive, and usually secondary.
---

# CodeAlive Context Engine
@@ -38,12 +38,15 @@ Do NOT retry the failed script until setup completes successfully.
| Tool | Script | Speed | Cost | Best For |
|------|--------|-------|------|----------|
| **List Data Sources** | `datasources.py` | Instant | Free | Discovering indexed repos and workspaces |
| **Search** | `search.py` | Fast | Low | Finding code locations, descriptions, identifiers |
| **Semantic Search** | `search.py` | Fast | Low | Finding relevant artifacts by meaning |
| **Grep Search** | `grep.py` | Fast | Low | Exact text and regex matches with line previews |
| **Fetch Artifacts** | `fetch.py` | Fast | Low | Retrieving full content for search results |
| **Artifact Relationships** | `relationships.py` | Fast | Low | Drilling into call graph, inheritance, references for one artifact |
| **Chat with Codebase** | `chat.py` | Slow | High | Synthesized answers, architectural explanations |

**Cost guidance:** Search is lightweight and should be the default starting point. Chat with Codebase invokes an LLM on the server side, making it significantly more expensive per call — use it when you need a synthesized, ready-to-use answer rather than raw search results.
**Cost guidance:** `semantic_search` and `grep_search` are the default starting point. Chat with Codebase invokes an LLM on the server side, can take up to 30 seconds, and is significantly more expensive per call — use it only when you need a synthesized, ready-to-use answer rather than raw search results.

**Highest-confidence guidance:** If your agent supports subagents and the task needs maximum reliability or depth, prefer a subagent-driven workflow that combines `search.py`, `grep.py`, `fetch.py`, `relationships.py`, and local file reads. `chat.py` is optional synthesis, not the default path.

**Three-step workflow (search → triage → load real content):**
1. **Search** — find relevant code locations with descriptions and identifiers
@@ -85,8 +88,9 @@ python scripts/datasources.py

```bash
python scripts/search.py "JWT token validation" my-backend
python scripts/search.py "error handling patterns" workspace:platform-team --mode deep
python scripts/search.py "authentication flow" my-repo --description-detail full
python scripts/search.py "authentication flow" my-repo --path src/auth --ext .py
python scripts/grep.py "AuthService" my-repo
python scripts/grep.py "auth\\(" my-repo --regex
```

### 3. Fetch full content (for external repos)
@@ -108,7 +112,7 @@

```bash
python scripts/relationships.py "my-org/backend::src/models.py::User" --profile
python scripts/relationships.py "my-org/backend::src/svc.py::Service" --profile allRelevant --max-count 200
```

### 5. Chat with codebase (slower, richer answers)
### 5. Chat with codebase (slower, optional synthesis)

```bash
python scripts/chat.py "Explain the authentication flow" my-backend
```

@@ -135,11 +139,9 @@ python scripts/search.py <query> <data_sources...> [options]

| Option | Description |
|--------|-------------|
| `--mode auto` | Default. Intelligent semantic search — use 80% of the time |
| `--mode fast` | Quick lexical search for known terms |
| `--mode deep` | Exhaustive search for complex cross-cutting queries. Resource-intensive |
| `--description-detail short` | Default. Brief description of each result |
| `--description-detail full` | More detailed description of each result |
| `--max-results N` | Optional cap for the number of returned artifacts |
| `--path PATH` | Repo-relative path or directory scope (repeatable) |
| `--ext EXT` | File extension scope such as `.py` or `.ts` (repeatable) |

**`description` is a triage pointer ONLY** — it tells you which artifacts are
worth a closer look. It is NOT the source of truth and you must NOT draw
@@ -148,6 +150,25 @@ source: use `fetch.py <identifier>` for external repos, or your editor's
file-read tool on the path for repos in the current working directory. Treat
only that real `content` as ground truth.
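To make the search → triage → load loop concrete, here is a minimal sketch in Python. The `StubClient` below stands in for whatever wraps `search.py` and `fetch.py` — only the `identifier` and `description` fields mirror the real search output; the class, method names, and data are invented for illustration:

```python
# Sketch of the search -> triage -> fetch loop. StubClient stands in for the
# real CodeAlive scripts; only the identifier/description fields are real.
class StubClient:
    def search(self, query):
        # A search result is a pointer: identifier + short description.
        return [
            {"identifier": "my-org/backend::src/auth.py::login", "description": "JWT login handler"},
            {"identifier": "my-org/backend::docs/notes.md", "description": "meeting notes"},
        ]

    def fetch(self, identifier):
        # Real code would call fetch.py / the API; this just fakes content.
        return f"def login(): ...  # full source for {identifier}"

client = StubClient()

# 1. Search: get pointers, not truth.
hits = client.search("JWT token validation")

# 2. Triage: use descriptions only to decide what is worth fetching.
relevant = [h for h in hits if "login" in h["description"].lower() or "jwt" in h["description"].lower()]

# 3. Load real content: only fetched source is ground truth.
for hit in relevant:
    content = client.fetch(hit["identifier"])
    print(hit["identifier"], "->", content)
```

The point of the stub is the shape of the loop: descriptions narrow the candidate set, and conclusions are drawn only from fetched content.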

### `grep.py` — Exact / Regex Search

Returns artifact-level matches with line previews. Use this when the pattern
itself matters more than semantic similarity.

```bash
python scripts/grep.py <query> <data_sources...> [--regex] [--max-results N] [--path PATH] [--ext EXT]
```

| Option | Description |
|--------|-------------|
| `--regex` | Interpret the query as a regex pattern |
| `--max-results N` | Optional cap for the number of returned artifacts |
| `--path PATH` | Repo-relative path or directory scope (repeatable) |
| `--ext EXT` | File extension scope such as `.py` or `.ts` (repeatable) |

Line previews are still search evidence, not source of truth. Use `fetch.py`
or your local file-read tool before drawing conclusions about behavior.

### `fetch.py` — Fetch Artifact Content

Retrieves the full source code content for artifacts found via search. Use this for external repositories you cannot access locally.
@@ -192,7 +213,7 @@ python scripts/relationships.py <identifier> [--profile PROFILE] [--max-count N]

Sends your question to an AI consultant that has full context of the indexed codebase. Returns synthesized, ready-to-use answers. Supports conversation continuity for follow-ups.

**This is more expensive than search** because it runs an LLM inference on the server side. Prefer search when you just need to locate code. Use chat when you need explanations, comparisons, or architectural analysis.
**This is more expensive than search** because it runs an LLM inference on the server side. Prefer search when you just need to locate code. Use chat when you need explanations, comparisons, or architectural analysis after search. It can take up to 30 seconds.

```bash
python scripts/chat.py <question> <data_sources...> [options]
```

@@ -270,7 +291,7 @@ This skill works standalone, but delivers the best experience when combined with
| Component | What it provides |
|-----------|-----------------|
| **This skill** | Query patterns, workflow guidance, cost-aware tool selection |
| **MCP server** | Direct `codebase_search`, `fetch_artifacts`, `get_artifact_relationships`, `codebase_consultant`, `get_data_sources` tools |
| **MCP server** | Direct `semantic_search`, `grep_search`, `fetch_artifacts`, `get_artifact_relationships`, `chat`, `get_data_sources` tools plus deprecated aliases |

When both are installed, prefer the MCP server's tools for direct operations and this skill's scripts for guided workflows.

@@ -298,7 +298,7 @@ Use when:
1. **Use natural language** - CodeAlive understands intent, not just keywords
2. **Be specific about context** - Include domain/layer info (API, database, frontend)
3. **Leverage workspaces** - Search across multiple repos for patterns
4. **Start with chat** - Ask "How does X work?" before searching
4. **Start with search** - Use semantic search first, then grep when the literal pattern matters; use chat only after you have evidence and still need synthesis
5. **Iterate** - Use follow-up questions to drill deeper
6. **Combine with local tools** - CodeAlive for discovery, Read for details
7. **Think like a librarian** - Focus on "what" and "why", not "where"
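The search-first ordering in points 4–6 can be sketched as a tiny decision helper. This is purely illustrative — the return values mirror the tool names in this skill, but the function itself is not part of it:

```python
# Illustrative tool-selection sketch mirroring the best practices above.
def choose_tool(have_search_evidence: bool, need_exact_pattern: bool, need_synthesis: bool) -> str:
    if need_exact_pattern:
        # The literal pattern matters more than meaning.
        return "grep_search"
    if need_synthesis and have_search_evidence:
        # Chat is a last step, after search evidence still leaves gaps.
        return "chat"
    # Default starting point: cheap, fast semantic search.
    return "semantic_search"

print(choose_tool(False, False, True))   # no evidence yet -> search first
```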
27 changes: 14 additions & 13 deletions skills/codealive-context-engine/references/workflows.md
@@ -28,24 +28,24 @@ Review output to understand:
- What workspaces group related repos
- Which data sources to use for exploration

### Step 2: Get Architectural Overview
### Step 2: Understand Entry Points
```bash
python chat.py "Provide an architectural overview of this codebase. What are the main components, how do they interact, and what's the tech stack?" my-backend-repo
python search.py "main application entry point, startup initialization" my-backend-repo
```
```

### Step 3: Understand Entry Points
### Step 3: Explore Key Features
```bash
python search.py "main application entry point, startup initialization" my-backend-repo
python search.py "main features, core capabilities, major services" my-backend-repo
```

### Step 4: Explore Key Features
### Step 4: Get Architectural Overview Only If Needed
```bash
python chat.py "What are the main features/capabilities of this system?" my-backend-repo
python chat.py "Provide an architectural overview of this codebase. What are the main components, how do they interact, and what's the tech stack?" my-backend-repo
```

### Step 5: Understand Data Models
```bash
python search.py "database models, schemas, entity definitions" my-backend-repo --mode auto
python search.py "database models, schemas, entity definitions" my-backend-repo
```

**Progressive Discovery:**
@@ -61,18 +61,19 @@ python search.py "database models, schemas, entity definitions" my-backend-repo

### Example: Understanding User Authentication

#### Step 1: Start with High-Level Question
#### Step 1: Start with Search
```bash
python chat.py "How is user authentication implemented? Describe the flow from login to session management" my-backend
python search.py "user authentication, login flow, session management" my-backend
python grep.py "refresh token" my-backend
```


#### Step 2: Find Entry Points
#### Step 2: Use Chat Only If You Still Need Synthesis
```bash
python search.py "user login endpoint, authentication API" my-backend
python chat.py "How is user authentication implemented? Describe the flow from login to session management" my-backend
```

Save conversation_id for follow-up questions.

#### Step 3: Trace Through Layers
```bash
# API Layer
```
2 changes: 1 addition & 1 deletion skills/codealive-context-engine/scripts/fetch.py
@@ -15,7 +15,7 @@
# Fetch multiple artifacts
python fetch.py "my-org/backend::src/auth.py::login" "my-org/backend::src/utils.py::helper"

Identifiers come from codebase_search results (the `identifier` field).
Identifiers come from semantic/grep search results (the `identifier` field).
The format is: {owner/repo}::{path}::{symbol} (for symbols/chunks)
{owner/repo}::{path} (for files)

115 changes: 115 additions & 0 deletions skills/codealive-context-engine/scripts/grep.py
@@ -0,0 +1,115 @@
#!/usr/bin/env python3
"""
CodeAlive Grep Search - exact text or regex search across indexed repositories.

Usage:
python grep.py "AuthService" my-repo
python grep.py "auth\\(" my-repo --regex --max-results 25
python grep.py "TODO" workspace:backend-team --path src --ext .py
"""

import sys
from pathlib import Path

sys.path.insert(0, str(Path(__file__).parent / "lib"))

from api_client import CodeAliveClient


def format_grep_results(results: dict) -> str:
    items = results.get("results", []) if isinstance(results, dict) else []
    if not items:
        return "No results found."

    output = []
    for idx, result in enumerate(items, 1):
        location = result.get("location", {})
        file_path = location.get("path") or result.get("path")
        matches = result.get("matches", [])

        output.append(f"\n--- Result #{idx} [{result.get('kind', 'Artifact')}] ---")
        if file_path:
            output.append(f"  File: {file_path}")
        if result.get("identifier"):
            output.append(f"  Identifier: {result['identifier']}")
**Review comment on lines +26 to +34 (medium):** The result formatting in grep.py is missing logic to check for `filePath` and to extract the file path from the identifier if explicit path fields are missing. This logic is present in search.py and should be included here for consistency and to ensure the file path is displayed whenever possible.

        location = result.get("location", {})
        file_path = location.get("path") or result.get("filePath") or result.get("path")
        identifier = result.get("identifier", "")
        matches = result.get("matches", [])

        if not file_path and identifier and "::" in identifier:
            parts = identifier.split("::")
            if len(parts) >= 2:
                file_path = parts[1]

        output.append(f"\n--- Result #{idx} [{result.get('kind', 'Artifact')}] ---")
        if file_path:
            output.append(f"  File: {file_path}")
        if identifier:
            output.append(f"  Identifier: {identifier}")

        if result.get("matchCount") is not None:
            output.append(f"  Match count: {result['matchCount']}")

        for match in matches:
            output.append(
                "    "
                f"{match.get('lineNumber', '?')}:{match.get('startColumn', '?')}-"
                f"{match.get('endColumn', '?')} {match.get('lineText', '')}"
            )

    output.append(
        "\nHint: match previews are search evidence only. Fetch the full source "
        "with `python fetch.py <identifier>` or read the local file before reasoning about behavior."
    )
    return "\n".join(output)


def main():
    if len(sys.argv) < 3:
        print("Error: Missing required arguments.", file=sys.stderr)
        print(
            "Usage: python grep.py <query> <data_source> [data_source2...] "
            "[--regex] [--max-results N] [--path PATH] [--ext EXT]",
            file=sys.stderr,
        )
        sys.exit(1)

    query = sys.argv[1]
    data_sources = []
    paths = []
    extensions = []
    max_results = None
    regex = False

    i = 2
    while i < len(sys.argv):
        arg = sys.argv[i]
        if arg == "--regex":
            regex = True
            i += 1
        elif arg == "--max-results" and i + 1 < len(sys.argv):
            max_results = int(sys.argv[i + 1])
            i += 2
**Review comment on lines +75 to +77 (medium):** The `int()` conversion for `--max-results` will raise a `ValueError` and cause the script to crash with a stack trace if a non-integer value is provided. It is better to handle this gracefully with a user-friendly error message.

Suggested change:

        elif arg == "--max-results" and i + 1 < len(sys.argv):
            try:
                max_results = int(sys.argv[i + 1])
            except ValueError:
                print(f"Error: --max-results must be an integer, got '{sys.argv[i + 1]}'", file=sys.stderr)
                sys.exit(1)
            i += 2

        elif arg == "--path" and i + 1 < len(sys.argv):
            paths.append(sys.argv[i + 1])
            i += 2
        elif arg == "--ext" and i + 1 < len(sys.argv):
            extensions.append(sys.argv[i + 1])
            i += 2
        elif arg == "--help":
            print(__doc__)
            sys.exit(0)
        else:
            data_sources.append(arg)
            i += 1
**Review comment on lines +87 to +89 (medium):** The current argument parsing logic treats any unknown argument as a data source. This can lead to confusing behavior if a user makes a typo in a flag (e.g. `--max-result` instead of `--max-results`), as the typo will be added to the list of data sources and likely cause an API error later. It is safer to validate that unknown arguments do not start with `--`:

        elif arg.startswith("--"):
            print(f"Error: Unknown option '{arg}'", file=sys.stderr)
            sys.exit(1)
        else:
            data_sources.append(arg)
            i += 1


    if not data_sources:
        print(
            "Error: At least one data source is required. Run datasources.py to see available sources.",
            file=sys.stderr,
        )
        sys.exit(1)

    try:
        client = CodeAliveClient()
        results = client.grep_search(
            query=query,
            data_sources=data_sources,
            paths=paths or None,
            extensions=extensions or None,
            max_results=max_results,
            regex=regex,
        )
        print(format_grep_results(results))
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
main()
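To illustrate the output shape the formatter produces, here is a trimmed standalone replica of `format_grep_results` run against a fabricated payload. The field names (`results`, `location`, `matches`, `lineNumber`, `lineText`) mirror the script above; the payload data itself is invented:

```python
# Trimmed replica of format_grep_results, exercised on a fabricated payload.
def format_preview(results: dict) -> str:
    items = results.get("results", []) if isinstance(results, dict) else []
    if not items:
        return "No results found."
    output = []
    for idx, result in enumerate(items, 1):
        path = result.get("location", {}).get("path") or result.get("path")
        output.append(f"--- Result #{idx} [{result.get('kind', 'Artifact')}] ---")
        if path:
            output.append(f"  File: {path}")
        for match in result.get("matches", []):
            output.append(f"    {match.get('lineNumber', '?')}: {match.get('lineText', '')}")
    return "\n".join(output)

payload = {
    "results": [{
        "kind": "File",
        "location": {"path": "src/auth.py"},
        "matches": [{"lineNumber": 42, "lineText": "class AuthService:"}],
    }]
}
print(format_preview(payload))
```

Each result renders as a header line, an optional file path, and one indented preview line per match — enough for triage, but still only pointers into the real source.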