diff --git a/docs.json b/docs.json
index dc0e6e64..9f4442e8 100644
--- a/docs.json
+++ b/docs.json
@@ -185,7 +185,14 @@
"group": "Use Cases",
"pages": [
"openhands/usage/use-cases/vulnerability-remediation",
- "openhands/usage/use-cases/code-review",
+ {
+ "group": "Verification Stack",
+ "pages": [
+ "openhands/usage/use-cases/verification-stack",
+ "openhands/usage/use-cases/code-review",
+ "openhands/usage/use-cases/qa-changes"
+ ]
+ },
"openhands/usage/use-cases/incident-triage",
"openhands/usage/use-cases/cobol-modernization",
"openhands/usage/use-cases/dependency-upgrades",
diff --git a/openhands/usage/automations/event-automations.mdx b/openhands/usage/automations/event-automations.mdx
index c79171db..5425bd04 100644
--- a/openhands/usage/automations/event-automations.mdx
+++ b/openhands/usage/automations/event-automations.mdx
@@ -13,7 +13,7 @@ Event-based automations run when something happens—a PR is opened, an issue is
| Type | Setup | Best For |
|------|-------|----------|
-| **Built-in (GitHub)** | None—just create the automation | PR reviews, issue triage, push-triggered tasks |
+| **Built-in (GitHub)** | [One-time org setup](#prerequisites-for-github-event-automations), then create automations | PR reviews, issue triage, push-triggered tasks |
| **Custom Webhooks** | Register webhook first, then create automation | Linear, Stripe, Slack, and other services |
## GitHub Events (Built-In)
@@ -83,6 +83,85 @@ glob(repository.full_name, 'myorg/*') && contains(pull_request.labels[].name, 'b
---
+## Prerequisites for GitHub Event Automations
+
+GitHub event automations require one-time setup before events will flow. If any step is missing, automations still appear to work (manual triggers succeed), but GitHub events never arrive and no error is shown.
+
+### 1. Install the OpenHands GitHub App
+
+The OpenHands GitHub App must be installed on the GitHub organization that owns the repositories you want to monitor. Install it from your [GitHub integration settings](/openhands/usage/cloud/github-installation). The app needs access to the repositories that will generate events.
+
+### 2. Create an OpenHands Team Organization
+
+If you're working with repositories owned by a GitHub organization (e.g., `myorg/my-repo`), you need an OpenHands **team organization** — not just a personal account. GitHub events for org repos are routed to team orgs, not personal orgs.
+
+If you don't already have one, create a team organization from the [OpenHands Cloud settings](https://app.all-hands.dev/settings).
+
+### 3. Claim Your GitHub Organization
+
+
+**This is the most commonly missed step.** Without it, GitHub events have nowhere to be routed and will be silently dropped.
+
+
+Your OpenHands team org must **claim** the GitHub organization to establish the link between GitHub webhooks and your OpenHands org. Claiming tells the event router: _"Events for repos in this GitHub org should go to this OpenHands team org."_
+
+To claim a GitHub org:
+
+1. Switch to your team org using the org switcher in the sidebar
+2. Go to **Organization Settings**
+3. In the **Git Conversation Routing** section, find your GitHub org
+4. Click **Claim**
+
+You must have admin access to the GitHub org to complete the claim. See [Claiming Git Organizations](/openhands/usage/cloud/organizations/settings#claiming-git-organizations) for full details.
+
+### 4. Create the Automation Under the Team Org
+
+Make sure you are switched to the **team org** (not your personal org) when creating the automation. The automation must live in the same org that claimed the GitHub organization — otherwise events won't match.
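
The routing rule described in steps 3 and 4 can be sketched in Python. This is a minimal illustration, not the actual OpenHands event router: the `Automation` class, the `claims` mapping, the `route_event` function, and the org names `acme-team` and `alice-personal` are all hypothetical. It shows why an automation created under a personal org never sees events for a claimed GitHub org.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Automation:
    name: str
    owner_org: str   # OpenHands org the automation was created under
    repository: str  # "github-org/repo" the automation watches


def route_event(
    repo_full_name: str,
    claims: dict[str, str],
    automations: list[Automation],
) -> list[str]:
    """Return the names of automations that would receive an event for this repo."""
    github_org = repo_full_name.split("/", 1)[0]
    target_org: Optional[str] = claims.get(github_org)
    if target_org is None:
        # Unclaimed GitHub org: the event has nowhere to go and is dropped.
        return []
    return [
        a.name
        for a in automations
        if a.owner_org == target_org and a.repository == repo_full_name
    ]


# Hypothetical setup: the team org "acme-team" has claimed the GitHub org "myorg".
claims = {"myorg": "acme-team"}
automations = [
    Automation("PR Review Bot", "acme-team", "myorg/my-repo"),
    Automation("Misplaced Bot", "alice-personal", "myorg/my-repo"),
]

print(route_event("myorg/my-repo", claims, automations))  # ['PR Review Bot']
print(route_event("otherorg/repo", claims, automations))  # []
```

In this model, "Misplaced Bot" is configured correctly in every other respect, yet it is never selected because its owning org is not the one that claimed `myorg`.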
+
+### 5. (Optional) Add Service Accounts to the Team Org
+
+If you're using a service account (such as a bot account) to create or own automations, that account must be a **member of the team org**. Invite it from the [Organization Members](/openhands/usage/cloud/organizations/managing-members) page.
+
+### Example: Setting Up a PR Review Bot
+
+Here's a complete walkthrough for setting up an event-driven PR review automation:
+
+1. **Install the GitHub App** on your GitHub org with access to the target repo
+2. **Switch to your team org** in OpenHands Cloud
+3. **Claim the GitHub org** in Organization Settings → Git Conversation Routing
+4. **Create the automation**:
+
+```
+Create an event-based automation called "PR Review Bot" that triggers
+when a pull request is labeled with "review" in the myorg/my-repo repository.
+
+It should review the PR for code quality, potential bugs, and best practices,
+then post the review as a PR comment.
+```
+
+5. **Test it**: Open a PR in the target repo and add the trigger label. Check your automation runs to verify it was triggered.
+
+### Troubleshooting
+
+If your automation doesn't trigger on GitHub events:
+
+
+
+ The most common cause. Go to **Organization Settings → Git Conversation Routing** and check if your GitHub org shows as claimed. If not, click **Claim**. See [Claiming Git Organizations](/openhands/usage/cloud/organizations/settings#claiming-git-organizations).
+
+
+ GitHub events for org repos are routed to the **team org** that claimed the GitHub org. If you created the automation under your personal org, events will never reach it. Switch to the team org and recreate the automation.
+
+
+ Double-check that the event type (e.g., `pull_request.labeled`) and filter expression match the action you're testing. Use wildcards like `pull_request.*` to match all actions during debugging.
+
+
+ Verify the automation is enabled. You can check via the automations list or by asking OpenHands to list your automations.
+
+
+
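To see why a wildcard such as `pull_request.*` is useful while debugging, the sketch below models event-type matching with Python's `fnmatch` module. This is an assumption made for illustration: the real OpenHands matcher may follow different rules, and `event_matches` is a hypothetical helper, not part of any OpenHands API.

```python
from fnmatch import fnmatchcase


def event_matches(pattern: str, event_type: str) -> bool:
    """Glob-style, case-sensitive match of an event type against a trigger pattern."""
    return fnmatchcase(event_type, pattern)


# An exact trigger only fires for one action.
print(event_matches("pull_request.labeled", "pull_request.opened"))  # False

# A wildcard trigger fires for every pull_request action.
print(event_matches("pull_request.*", "pull_request.opened"))   # True
print(event_matches("pull_request.*", "pull_request.labeled"))  # True

# Events from other families still do not match.
print(event_matches("pull_request.*", "issues.opened"))  # False
```

Once the wildcard confirms that events arrive at all, narrowing the pattern back to the specific action isolates whether the event type or the filter expression was the mismatch.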
+---
+
## Custom Webhooks
For services beyond GitHub—like Linear, Stripe, or Slack—register a custom webhook first, then create automations that use it.
diff --git a/openhands/usage/use-cases/code-review.mdx b/openhands/usage/use-cases/code-review.mdx
index 5251a977..7c8b6e8d 100644
--- a/openhands/usage/use-cases/code-review.mdx
+++ b/openhands/usage/use-cases/code-review.mdx
@@ -365,25 +365,33 @@ See real automated reviews in action on the OpenHands Software Agent SDK reposit
-## Automate This
+## OpenHands Automations (Beta)
-You can schedule daily code reviews using [OpenHands Automations](/openhands/usage/automations/overview).
-Copy this prompt into a new conversation to set one up:
+
+OpenHands Automations is currently an experimental beta feature. The API and configuration format may change.
+
-```
-Create an automation called "Daily Code Review" that runs every weekday at 9 AM.
+The alternative to GitHub Actions is [OpenHands Automations](/openhands/usage/automations/overview), our event-triggered automation system. With Automations, you define the trigger once and it covers all repositories the bot account has access to — no per-repo workflow files needed.
-It should:
-1. Find all open PRs that have no reviews yet
-2. For each PR, review the diff for bugs, style issues, and security concerns
-3. Post a summary of findings as a comment on each PR
+**Trade-offs vs GitHub Actions:** Automations are simpler to set up and maintain across repos, and they can use the full OpenHands runtime (browser, tools, sandbox), which a GitHub Actions workflow cannot. GitHub Actions gives you more control over the execution environment and integrates directly with your CI pipeline.
-Learn more at https://docs.openhands.dev/openhands/usage/use-cases/code-review
-```
+
+
+ Log in to [OpenHands Cloud](https://app.all-hands.dev) with your organization's bot GitHub account.
+
-For inline review comments on every push, use the
-[pr-review plugin](https://github.com/OpenHands/extensions/tree/main/plugins/pr-review)
-as a GitHub Action instead.
+
+ Connect the bot account's GitHub account to OpenHands Cloud via the [GitHub installation](/openhands/usage/cloud/github-installation) flow. Make sure to complete the [prerequisites](/openhands/usage/automations/event-automations#prerequisites-for-github-event-automations), including installing the GitHub App and claiming your org.
+
+
+
+ Log in as the bot account and instruct the agent to set up the automation:
+
+ ```
+ Create an OpenHands Cloud automation using the Plugin Preset that automatically reviews pull requests. Use the pr-review plugin from github:OpenHands/extensions (repo_path: plugins/pr-review). Trigger it on pull_request.ready_for_review and pull_request.review_requested events for the repository YOUR_ORG/YOUR_REPO. Filter to trusted contributors only (exclude first-time contributors and unknown users) and only fire on review_requested when the requested reviewer is YOUR_BOT_LOGIN. Set timeout to 3600 seconds. Use this exact prompt for the automation: "Review the pull request from the GitHub event payload using the pr-review plugin. Post a comprehensive code review on GitHub with inline comments on specific changed lines where appropriate, and a concise overall summary. Avoid duplicating existing unresolved review comments. Include a brief note that the review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation."
+ ```
+
+
## Related Resources
diff --git a/openhands/usage/use-cases/overview.mdx b/openhands/usage/use-cases/overview.mdx
index 8229b463..6fcb41d5 100644
--- a/openhands/usage/use-cases/overview.mdx
+++ b/openhands/usage/use-cases/overview.mdx
@@ -8,6 +8,13 @@ OpenHands supports a wide variety of software development tasks. Here are some o
Each use case can be implemented in different ways—as a one-off conversation, a scheduled [automation](/openhands/usage/automations/overview), a [plugin](https://github.com/OpenHands/extensions), or through the [SDK](/sdk/index). Pick the approach that fits your workflow.
+
+ A layered system of automated verifiers — trajectory scoring, code review, and QA — that helps coding agents fail fast and produce changes you can trust.
+
Set up automated PR reviews to maintain code quality and catch bugs early.
+
+ Validate PR changes by actually running the software as a real user would.
+
-
+ Functionally test PR changes by exercising the software as a real user would.
+---
+
+
+ Check out the complete QA changes plugin with ready-to-use code and configuration.
+
+
+Automated QA testing goes beyond code review and CI: instead of reading diffs or running the test suite, the QA agent actually **runs the software** and verifies that changes work as claimed. It sets up the environment, exercises changed behavior as a real user would (browser, CLI, API requests), and posts a structured report with evidence.
+
+This is Layer 2 of the [Verification Stack](/openhands/usage/use-cases/verification-stack), complementing the [code review agent](/openhands/usage/use-cases/code-review).
+
+## Overview
+
+The QA agent follows a four-phase methodology:
+
+1. **Understand** — Reads the PR diff, title, and description. Classifies changes (new feature, bug fix, refactor, config) and identifies entry points (CLI commands, API endpoints, UI pages).
+2. **Setup** — Bootstraps the repository: installs dependencies, builds the project, notes CI status.
+3. **Exercise** — The core phase: spins up servers, opens browsers, runs CLI commands, makes HTTP requests — testing the changed behavior as a real user would. For bug fixes, it reproduces the bug on the base branch and verifies the fix on the PR branch.
+4. **Report** — Posts a structured QA report as a PR comment, with evidence (commands run, outputs, screenshots) and a verdict (PASS / FAIL / PARTIAL).
+
+The QA agent knows when to give up: if an approach fails after three materially different attempts, it switches strategy. If two fundamentally different strategies fail, it reports what it tried and stops — rather than spinning endlessly.
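
The give-up policy above can be sketched as a bounded retry loop. This is a simplified model under stated assumptions, not the skill's actual implementation: the names `run_with_give_up`, `MAX_ATTEMPTS_PER_STRATEGY`, and `MAX_FAILED_STRATEGIES` are hypothetical, and each attempt is reduced to a callable that returns `True` on success.

```python
from typing import Callable, Sequence

MAX_ATTEMPTS_PER_STRATEGY = 3  # "three materially different attempts"
MAX_FAILED_STRATEGIES = 2      # "two fundamentally different strategies"

Attempt = Callable[[], bool]   # returns True when the check succeeds


def run_with_give_up(
    strategies: Sequence[Sequence[Attempt]],
) -> tuple[bool, list[str]]:
    """Try each strategy a bounded number of times, then stop and report."""
    log: list[str] = []
    for s_idx, attempts in enumerate(strategies[:MAX_FAILED_STRATEGIES], start=1):
        for a_idx, attempt in enumerate(attempts[:MAX_ATTEMPTS_PER_STRATEGY], start=1):
            ok = attempt()
            log.append(f"strategy {s_idx}, attempt {a_idx}: {'ok' if ok else 'failed'}")
            if ok:
                return True, log
        log.append(f"strategy {s_idx} abandoned")
    # Two strategies failed: report what was tried instead of continuing.
    return False, log


fail = lambda: False
succeed = lambda: True

# The first strategy fails three times; the second succeeds on its second attempt.
ok, log = run_with_give_up([[fail, fail, fail], [fail, succeed]])
print(ok, len(log))  # True 6

# Both allowed strategies fail, so a third strategy is never tried.
stuck, stuck_log = run_with_give_up([[fail] * 3, [fail] * 3, [succeed]])
print(stuck, stuck_log[-1])  # False strategy 2 abandoned
```

The returned log plays the role of the "what it tried" section of the report: it is produced whether or not the run succeeds.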
+
+## What It Does (and Doesn't)
+
+
+
+ - Run the actual application and interact with it
+ - Make real HTTP requests, run real CLI commands
+ - Open browsers and verify UI changes
+ - Reproduce bugs and verify fixes end-to-end
+ - Report with evidence (commands, outputs, screenshots)
+
+
+ - Run the test suite (that's CI's job)
+ - Analyze code for style or structure (that's code review's job)
+ - Run linters, formatters, or type checkers
+ - Substitute `--help` or `--dry-run` for real execution
+
+
+
+## Quick Start
+
+### GitHub Actions
+
+Create `.github/workflows/qa-changes.yml` in your repository:
+
+```yaml
+name: QA Changes
+
+on:
+  pull_request:
+    types: [opened, ready_for_review, labeled]
+
+permissions:
+  contents: read
+  pull-requests: write
+  issues: write
+
+jobs:
+  qa:
+    if: |
+      (github.event.action == 'opened' && github.event.pull_request.draft == false) ||
+      github.event.action == 'ready_for_review' ||
+      github.event.label.name == 'qa-this'
+    runs-on: ubuntu-latest
+    steps:
+      - name: Run QA Changes
+        uses: OpenHands/extensions/plugins/qa-changes@main
+        with:
+          llm-model: anthropic/claude-sonnet-4-5-20250929
+          llm-api-key: ${{ secrets.LLM_API_KEY }}
+          github-token: ${{ secrets.GITHUB_TOKEN }}
+```
+
+Add your `LLM_API_KEY` to your repository's **Settings → Secrets and variables → Actions**.
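
The job-level `if` condition in the workflow above decides which pull request events actually start a QA run. The Python sketch below restates that condition so its behavior is easier to reason about. `should_run_qa` is a hypothetical helper written for illustration; GitHub evaluates the real expression itself.

```python
from typing import Optional


def should_run_qa(action: str, is_draft: bool, label: Optional[str] = None) -> bool:
    """Mirror of the workflow's `if` expression.

    action   -> github.event.action
    is_draft -> github.event.pull_request.draft
    label    -> github.event.label.name (only set on `labeled` events)
    """
    return (
        (action == "opened" and not is_draft)
        or action == "ready_for_review"
        or label == "qa-this"
    )


print(should_run_qa("opened", is_draft=False))                   # True
print(should_run_qa("opened", is_draft=True))                    # False
print(should_run_qa("ready_for_review", is_draft=False))         # True
print(should_run_qa("labeled", is_draft=True, label="qa-this"))  # True
print(should_run_qa("labeled", is_draft=False, label="bug"))     # False
```

Note that the `qa-this` label runs QA even on a draft PR, because the label check does not look at the draft flag.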
+
+### In a Conversation
+
+You can also trigger QA manually in any OpenHands conversation by invoking the skill:
+
+```
+/qa-changes
+```
+
+The agent will ask for the PR to test, or you can provide context directly:
+
+```
+/qa-changes — Please QA PR #42 on the my-org/my-repo repository.
+Focus on the new dashboard page and verify it renders correctly.
+```
+
+## QA Report Format
+
+The QA agent posts a structured report as a PR comment:
+
+```
+## QA Report
+
+**Status: PASS** ✅
+
+### Changes Tested
+- New `/api/health` endpoint returns 200 with version info
+- Dashboard page renders at `/dashboard` with correct data
+
+### Evidence
+1. Started server with `npm run dev`
+2. `curl http://localhost:3000/api/health` → 200 OK, body: {"status":"ok","version":"1.2.0"}
+3. Navigated to http://localhost:3000/dashboard — page renders correctly
+ [screenshot attached]
+
+### Edge Cases
+- Empty database state: dashboard shows "No data" placeholder ✅
+- Invalid auth token: returns 401 as expected ✅
+```
+
+## Customization
+
+### Change Types
+
+The QA agent adapts its approach based on the type of change:
+
+| Change Type | QA Approach |
+|-------------|-------------|
+| **Frontend / UI** | Starts dev server, opens browser, verifies visual changes, tests interactions |
+| **CLI** | Runs commands with realistic arguments, verifies output, tests edge cases |
+| **API / Backend** | Starts server, makes HTTP requests, verifies responses and side effects |
+| **Bug fix** | Reproduces bug on base branch, verifies fix on PR branch (before/after) |
+| **Library / SDK** | Writes and runs a short script that imports and calls changed functions |
+
+### Repository-Specific QA Guidelines
+
+Add repo-specific QA instructions by creating `.agents/skills/custom-qa-guide.md`:
+
+```markdown
+---
+name: custom-qa-guide
+description: Custom QA guidelines for this repository
+triggers:
+- /qa-changes
+---
+
+# QA Guidelines for [Your Project]
+
+## Environment Setup
+- Run `make setup` to initialize the development environment
+- The dev server runs on port 8080
+
+## Key Test Scenarios
+- Always verify the admin dashboard at /admin after backend changes
+- For API changes, test with both authenticated and unauthenticated requests
+
+## Known Limitations
+- The payment module requires a Stripe test key — skip payment flow testing
+```
+
+## Integration with the Verification Stack
+
+The QA agent is most powerful when used alongside the [code review agent](/openhands/usage/use-cases/code-review) and the [iterate skill](/openhands/usage/use-cases/verification-stack#closing-the-loop-the-iterate-skill) as part of the full [Verification Stack](/openhands/usage/use-cases/verification-stack):
+
+1. **Code review** catches issues by reading the diff (style, security, data structures)
+2. **QA** catches issues by running the software (behavioral regressions, UI bugs)
+3. **Iterate** orchestrates the loop — fixing issues flagged by either verifier and re-polling until the PR is clean
+
+## Troubleshooting
+
+
+
+ Ensure your repository's setup instructions are documented in `README.md` or `AGENTS.md`. The agent follows these to bootstrap the environment. If setup requires special steps, add them to a custom QA guide.
+
+
+
+ PARTIAL means some scenarios passed and others failed or couldn't be tested. Read the report details — it will explain what worked and what didn't. Common causes: missing environment variables, external service dependencies, or insufficient permissions.
+
+
+
+ For large PRs with many changed entry points, the agent may need more time. Consider splitting large PRs into smaller, focused changes. You can also add a custom QA guide that prioritizes the most important scenarios.
+
+
+
+## Related Resources
+
+- [QA Changes Plugin](https://github.com/OpenHands/extensions/tree/main/plugins/qa-changes) — GitHub Actions plugin
+- [QA Changes Skill](https://github.com/OpenHands/extensions/tree/main/skills/qa-changes) — Detailed skill methodology
+- [Verification Stack](/openhands/usage/use-cases/verification-stack) — How QA fits into the full verification pipeline
+- [Automated Code Review](/openhands/usage/use-cases/code-review) — The complementary code review agent
diff --git a/openhands/usage/use-cases/verification-stack.mdx b/openhands/usage/use-cases/verification-stack.mdx
new file mode 100644
index 00000000..83f79d00
--- /dev/null
+++ b/openhands/usage/use-cases/verification-stack.mdx
@@ -0,0 +1,77 @@
+---
+title: The Verification Stack
+description: A layered system of automated verifiers that helps coding agents fail fast and produce changes you can trust
+---
+
+LLMs made generating code cheap. The real bottleneck is **verification**: checking that a change is correct, follows repo conventions, and is something you'd actually merge. The verification stack is a layered set of automated verifiers designed to catch different kinds of mistakes at different stages — so problems are caught early, cheaply, and without human intervention.
+
+## Architecture
+
+The verification stack consists of two layers today, with an architecture designed to support more:
+
+
+
+**Layer 1 — Trajectory-Level Verifier (Critic Model):** A small, fast critic model that scores the agent's trajectory *before* code is pushed. If the score falls below a confidence threshold, the agent's work is gated — preventing obviously broken or off-track changes from ever reaching a pull request. See [Enabling Layer 1](#enabling-layer-1-trajectory-level-verifier) below.
+
+**Layer 2 — Repo-Level Verifier (ReviewBot):** An automated code reviewer and QA agent that triggers on every pull request. It reviews the diff for correctness, security, and style, then optionally runs the software to verify behavior. See [Enabling Layer 2](#enabling-layer-2-repo-level-verifier) below.
+
+Together, these layers form a pipeline: the critic prevents obviously broken work from being pushed, and the ReviewBot catches the subtler issues that require repository context.
+
+## How Effective Is It?
+
+We've been running the verification stack on the [OpenHands/software-agent-sdk](https://github.com/OpenHands/software-agent-sdk) repository for several months. Key findings:
+
+- **Faster approvals** — As ReviewBot adoption increased, time to first approval dropped significantly, with the largest gains on medium-to-large PRs.
+- **Improving accuracy** — Bot review precision and recall have improved consistently over time. Human reviewers are generally more precise, but the bot catches issues humans miss — the two are complementary.
+- **Code quality maintained** — Static analysis (radon, bandit, ruff) shows no degradation in cyclomatic complexity, security violations, or code smells for bot-reviewed PRs compared to human-only PRs.
+- **Test coverage improving** — Since the ReviewBot was introduced, test coverage across the repository has trended upward.
+- **Review rounds decreasing** — PRs with ReviewBot initially required more review rounds, but that gap has been closing as the skill improves.
+
+For detailed metrics and methodology, see our blog post: [The Verification Stack](https://www.openhands.dev/blog/verification-stack).
+
+## Enabling Layer 1: Trajectory-Level Verifier
+
+
+The trajectory-level verifier integration is under active development. Configuration instructions will be added here once finalized.
+
+
+The trajectory-level verifier uses a critic model to evaluate agent work *before* code is pushed. It is currently available through:
+
+- **OpenHands CLI** — The critic is automatically enabled when using the [OpenHands LLM Provider](/openhands/usage/llms/openhands-llms). See the [Critic documentation](/openhands/usage/cli/critic) for configuration options including iterative refinement thresholds.
+- **Software Agent SDK** — Programmatic access via the [SDK Critic Guide](/sdk/guides/critic).
+
+The critic provides quality scores between 0.0 and 1.0, real-time feedback during agent execution, and automatic iterative refinement when it predicts incomplete work.
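
As a rough illustration of score-gated iterative refinement, the sketch below re-runs a task until a critic score clears a threshold. It does not use the real critic or SDK API: `refine_until_confident`, `run_agent`, `score`, the 0.5 threshold, and the round limit are all hypothetical stand-ins. Only the 0.0 to 1.0 score range comes from the description above.

```python
from typing import Callable


def refine_until_confident(
    run_agent: Callable[[int], str],
    score: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical confidence threshold
    max_rounds: int = 3,     # hypothetical refinement budget
) -> tuple[str, float, bool]:
    """Re-run the agent until the critic score reaches the threshold."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    result, value = "", 0.0
    for round_index in range(max_rounds):
        result = run_agent(round_index)
        value = score(result)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"critic score out of range: {value}")
        if value >= threshold:
            return result, value, True  # confident enough to push
    return result, value, False         # still gated after the last round


# Stand-ins for an agent that improves each round and a critic that scores it.
drafts = ["draft", "better", "good"]
scores = {"draft": 0.2, "better": 0.4, "good": 0.9}

print(refine_until_confident(lambda i: drafts[i], scores.__getitem__))
# ('good', 0.9, True)
print(refine_until_confident(lambda i: drafts[i], scores.__getitem__, max_rounds=2))
# ('better', 0.4, False)
```

The final boolean is the gate: when it is `False`, the work stays local rather than reaching a pull request.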
+
+For technical details on how the critic model works, see our paper: [A Rubric-Supervised Critic from Sparse Real-World Outcomes](https://arxiv.org/abs/2603.03800).
+
+## Enabling Layer 2: Repo-Level Verifier
+
+The repo-level verifier consists of two components — a [code review agent](/openhands/usage/use-cases/code-review) and a [QA agent](/openhands/usage/use-cases/qa-changes) — both available as plugins in the [OpenHands/extensions](https://github.com/OpenHands/extensions) repository.
+
+For full setup instructions, see:
+
+- **[Automated Code Review](/openhands/usage/use-cases/code-review)** — GitHub Actions and OpenHands Automations setup for the ReviewBot
+- **[Automated QA Testing](/openhands/usage/use-cases/qa-changes)** — Setup for the QA agent
+
+## Closing the Loop: The Iterate Skill
+
+Setting up the verification stack is only half the story. The other half is *acting on it* — reading CI results, parsing review comments, fixing code, pushing again, and repeating until everything is green.
+
+The **iterate** skill ([OpenHands/extensions](https://github.com/OpenHands/extensions)) turns the agent into an orchestration loop that drives a pull request from first push to merge-ready:
+
+1. **Push and open a draft PR** — the PR starts as a draft to prevent premature automation triggers.
+2. **Poll each verification layer** — the agent checks CI, the ReviewBot's verdict, and the QA agent's report. It only polls layers that actually exist in the repo.
+3. **Decide and act** — if CI failed, it reads the logs and fixes the code. If the ReviewBot requested changes, it addresses the inline comments. If QA found regressions, it debugs and fixes.
+4. **Push and re-poll** — after every fix, the agent commits, pushes, re-requests review, and loops back. A push is never the end — the loop only exits when all present layers pass on the current SHA.
+5. **Mark ready** — once every verification layer is green, the agent converts the draft PR to ready for review.
+
+Without the iterate skill, the verification stack is a set of independent checks. *With* it, the stack becomes a closed-loop system where the human reviewer only sees the PR after automated layers have converged on a clean state. To invoke it, type `/iterate` in any OpenHands conversation that has the skill loaded.
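
The control flow of the loop described above can be sketched in a few lines of Python. This is a simplified model, not the skill itself: `iterate`, `Poller`, `fix_and_push`, and the commit SHAs are hypothetical, and the `max_rounds` safety cap is an assumption added here so the example always terminates.

```python
from typing import Callable, Optional

# A poller returns True (pass), False (fail), or None (layer not present in the repo).
Poller = Callable[[str], Optional[bool]]


def iterate(
    head_sha: str,
    pollers: dict[str, Poller],
    fix_and_push: Callable[[str, list[str]], str],
    max_rounds: int = 10,  # assumed safety cap, not part of the described skill
) -> tuple[bool, str]:
    """Poll every layer on the current SHA; fix, push, and re-poll until all pass."""
    for _ in range(max_rounds):
        results = {name: poll(head_sha) for name, poll in pollers.items()}
        present = {name: ok for name, ok in results.items() if ok is not None}
        failing = [name for name, ok in present.items() if not ok]
        if not failing:
            return True, head_sha  # every present layer is green on this SHA
        # A push is never the end: the new commit must be verified again.
        head_sha = fix_and_push(head_sha, failing)
    return False, head_sha


# Simulated verdicts per commit: CI fails first, then review, then all green.
verdicts = {
    "a1": {"ci": False, "review": True},
    "b2": {"ci": True, "review": False},
    "c3": {"ci": True, "review": True},
}
next_sha = {"a1": "b2", "b2": "c3"}
pollers = {
    "ci": lambda sha: verdicts[sha]["ci"],
    "review": lambda sha: verdicts[sha]["review"],
    "qa": lambda sha: None,  # QA layer is not configured in this repo
}

ready, sha = iterate("a1", pollers, lambda sha, failing: next_sha[sha])
print(ready, sha)  # True c3
```

Two details mirror the description: layers that return `None` are skipped rather than treated as failures, and a fix that turns one layer green is not enough, because the loop re-polls all layers on the new SHA.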
+
+## Related Resources
+
+- [Automated Code Review](/openhands/usage/use-cases/code-review) — Detailed setup for the code review agent (Layer 2)
+- [QA Changes](/openhands/usage/use-cases/qa-changes) — Detailed setup for the QA agent (Layer 2)
+- [Critic Documentation](/openhands/usage/cli/critic) — Trajectory-level verifier configuration (Layer 1)
+- [OpenHands Automations](/openhands/usage/automations/overview) — Event-triggered automation setup
+- [The Verification Stack (blog post)](https://www.openhands.dev/blog/verification-stack) — Detailed effectiveness metrics and analysis
+- [Learning to Verify AI-Generated Code (blog post)](https://www.openhands.dev/blog/20260305-learning-to-verify-ai-generated-code) — Layer 1 deep dive