@gakonst commented Feb 5, 2026

Summary

Adds an end-to-end eval that uses the Amp SDK to test whether AI agents can build working code from the Tempo docs.

What it does

  1. Prompts an Amp agent to: "Build a TypeScript CLI that transfers 0.01 pathUSD on Tempo testnet"
  2. The agent reads the docs, writes the script, and runs it
  3. The test verifies that the transaction hash printed by the script exists on-chain
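Step 3 could look roughly like the sketch below, assuming the agent's script prints the transaction hash somewhere in its output. The helper names `extractTxHash` and `verifyOnChain` and the `ReceiptLookup` type are hypothetical, not taken from the PR. The receipt lookup is injected so the logic runs without a network; in the real test it might wrap viem's `publicClient.getTransactionReceipt({ hash })`, catching the error viem throws when no receipt exists and returning `null` instead.

```typescript
// Hypothetical helpers for step 3: find the tx hash the agent printed and
// check that it was mined. The receipt lookup is injected, so this sketch
// has no network or library dependencies.

type Hex = `0x${string}`;

// Only the receipt field this check needs (viem receipts expose a
// `status` of "success" or "reverted").
interface ReceiptLike {
  status: "success" | "reverted";
}

// Should resolve to null when the chain has no receipt for the hash.
type ReceiptLookup = (hash: Hex) => Promise<ReceiptLike | null>;

// 0x followed by exactly 64 hex digits. The trailing \b rejects longer hex
// runs, and 40-digit addresses are too short to match.
const TX_HASH_RE = /\b0x[0-9a-fA-F]{64}\b/;

function extractTxHash(output: string): Hex | null {
  const match = output.match(TX_HASH_RE);
  return match ? (match[0] as Hex) : null;
}

async function verifyOnChain(
  output: string,
  getReceipt: ReceiptLookup,
): Promise<{ ok: boolean; reason: string }> {
  const hash = extractTxHash(output);
  if (hash === null) {
    return { ok: false, reason: "no transaction hash in agent output" };
  }
  const receipt = await getReceipt(hash);
  if (receipt === null) {
    return { ok: false, reason: `transaction ${hash} not found on-chain` };
  }
  // Stricter than "exists on-chain": a reverted transfer also fails.
  if (receipt.status !== "success") {
    return { ok: false, reason: `transaction ${hash} reverted` };
  }
  return { ok: true, reason: `transaction ${hash} succeeded` };
}
```

Treating a reverted transaction as a failure is a design choice of this sketch; the PR itself only says the hash must exist on-chain.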

Why

Per the discussion in #product-docs, we're seeing agents (such as Opus 4.5) get confused about:

  • Chain ID (using 12890 instead of 42431)
  • Token (using USDC instead of pathUSD)
  • Missing network details
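When a run fails, it helps to know which of these mistakes the agent made. A crude sketch, assuming the test can read the source file the agent wrote: `diagnoseScript` (a hypothetical helper, not part of this PR) flags the wrong chain ID and the wrong token listed above using plain text matching, so it points at likely problems rather than proving the script correct. It does not try to detect missing network details, which are too open-ended for a pattern check.

```typescript
// Hypothetical post-mortem check for a failed eval run: scan the script the
// agent wrote for the mistakes reported in #product-docs. Plain regex
// matching, so it flags likely problems rather than proving correctness.

const EXPECTED_CHAIN_ID = 42431; // Tempo testnet chain ID, per the PR description
const WRONG_CHAIN_ID = 12890; // value agents have been observed using

interface Finding {
  issue: "chain-id" | "token";
  detail: string;
}

function diagnoseScript(source: string): Finding[] {
  const findings: Finding[] = [];

  // Wrong chain ID: 12890 appears as a standalone number (not inside 128901).
  if (new RegExp(`\\b${WRONG_CHAIN_ID}\\b`).test(source)) {
    findings.push({
      issue: "chain-id",
      detail: `uses ${WRONG_CHAIN_ID} instead of ${EXPECTED_CHAIN_ID}`,
    });
  }

  // Wrong token: mentions USDC but never pathUSD.
  const mentionsUsdc = /\bUSDC\b/i.test(source);
  const mentionsPathUsd = /pathUSD/i.test(source);
  if (mentionsUsdc && !mentionsPathUsd) {
    findings.push({ issue: "token", detail: "references USDC but not pathUSD" });
  }

  return findings;
}
```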

This eval will help us iterate on docs until agents succeed consistently.

Files changed

  • e2e/agent-transfer-funds.test.ts - The eval test
  • package.json - Added @sourcegraph/amp-sdk dependency

Manual step needed

After merging, add the following to .github/workflows/verify.yml to run the eval daily at 09:00 UTC:

# Add to the "on:" section:
  schedule:
    - cron: '0 9 * * *'

# Add this job:
  agent-eval:
    name: Agent Docs Eval
    runs-on: ubuntu-latest
    timeout-minutes: 15
    if: github.event_name == 'workflow_dispatch' || github.event_name == 'schedule'
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v6
      - uses: actions/setup-node@v6
      - run: corepack enable pnpm
      - run: pnpm install
      - run: pnpm exec playwright install chromium --with-deps
      - run: pnpm exec playwright test agent-transfer-funds.test.ts
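        env:
          AMP_API_KEY: ${{ secrets.AMP_API_KEY }}
          # Needed because the Playwright config skips agent-*.test.ts unless AGENT_EVAL is set
          AGENT_EVAL: 'true'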

Also add AMP_API_KEY to the repository secrets.

- Uses the Amp SDK to prompt an agent to build a TypeScript CLI
- The agent must use the tempo.ts SDK to transfer pathUSD on testnet
- Verifies that the output tx hash exists on-chain

Note: Workflow changes need to be added separately (see PR description)
Amp-Thread-ID: https://ampcode.com/threads/T-019c2e9b-8e68-703a-841f-92dc4d4910ef
Co-authored-by: Amp <amp@ampcode.com>

vercel bot commented Feb 5, 2026

The latest updates on your projects.

tempo-docs: deployment Ready, updated Feb 5, 2026 4:32pm UTC

- Import tempoModerato from 'viem/chains' (not tempo.ts/chains)
- Add testIgnore to playwright config to skip agent-*.test.ts unless AGENT_EVAL env is set
- Regular E2E tests now run without the agent eval interfering

Amp-Thread-ID: https://ampcode.com/threads/T-019c2e9b-8e68-703a-841f-92dc4d4910ef
Co-authored-by: Amp <amp@ampcode.com>
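The `testIgnore` gate from this commit might be expressed as a small pure function, sketched here under assumptions: the name `agentEvalIgnore` and the exact glob are hypothetical, and any non-empty `AGENT_EVAL` value is treated as set. Playwright's `testIgnore` option accepts glob patterns (or RegExps), so an empty array means nothing is skipped.

```typescript
// Hypothetical helper for playwright.config.ts: ignore agent-*.test.ts
// files unless AGENT_EVAL is set, so regular E2E runs never invoke the
// Amp agent (or need AMP_API_KEY).

type Env = Record<string, string | undefined>;

const AGENT_TEST_GLOB = "**/agent-*.test.ts";

function agentEvalIgnore(env: Env): string[] {
  // Treat any non-empty value as "set"; unset or empty means skip.
  const agentEvalEnabled = (env.AGENT_EVAL ?? "") !== "";
  return agentEvalEnabled ? [] : [AGENT_TEST_GLOB];
}
```

In `playwright.config.ts` it would be used as `testIgnore: agentEvalIgnore(process.env)` inside `defineConfig({ ... })`. Keeping the check in a function makes it testable without mutating the real environment.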