🦀 PinchBench

Real-world benchmarks for AI coding agents

PinchBench measures how well LLMs perform as the brain of an OpenClaw agent. Instead of synthetic tests, we throw real tasks at agents: scheduling meetings, writing code, triaging email, researching topics, and managing files.


Repositories

Repository    Description
skill         Benchmark runner and task definitions (run it yourself)
leaderboard   The pinchbench.com leaderboard frontend
api           The public PinchBench API at api.pinchbench.com
scripts       The official PinchBench run automation, including default_models.yml

Run the Benchmark

git clone https://github.com/pinchbench/skill.git
cd skill
./scripts/run.sh --model anthropic/claude-sonnet-4

Results are uploaded to the public leaderboard at pinchbench.com.


Claw-some AI agent testing. Made with 🦀 by the humans at https://kilo.ai 🦞

Popular repositories

  1. skill (Public)

    PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

    Python 691 53

  2. leaderboard (Public)

    PinchBench is a benchmarking system for evaluating LLMs as OpenClaw coding agents. Made with 🦀 by the humans at https://kilo.ai

    TypeScript 22 9

  3. api (Public)

    Public API that pinchbench.com uses to display results. Also contains an admin interface for managing PinchBench.

    TypeScript 2 4

  4. .github (Public)

    PinchBench organization profile and community health files

  5. scripts (Public)

    Shell 4

Repositories

  • scripts Public
    Shell 0 4 0 0 Updated Mar 20, 2026
  • skill Public
    Python 691 MIT 53 13 4 Updated Mar 19, 2026
  • api Public
    TypeScript 2 MIT 4 2 0 Updated Mar 19, 2026
  • leaderboard Public
    TypeScript 22 9 8 1 Updated Mar 19, 2026
  • .github Public
    0 0 0 0 Updated Mar 17, 2026
