feat(evals): add skia category with 20 react-native-skia evals #377
I'll take a look at the runner changes. @lech-kalinowski can you take a look at those opencode-related changes?
Pull request overview
Adds a new evals/skia category to expand the benchmark suite with 20 focused React Native Skia evaluations, and updates the runner’s OpenCode integration to be more robust (JSON extraction middleware + server reuse/config forwarding).
Changes:
- Add `evals/skia` category README plus 20 evals (each with prompt, requirements, and reference implementation).
- Update solver and judge LLM clients to use `extractJsonMiddleware` via `wrapLanguageModel` to handle fenced/embedded JSON.
- Improve OpenCode server startup to reuse an existing server and forward `ANTHROPIC_API_KEY` when spawning.
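To make the "fenced/embedded JSON" case concrete, here is a minimal sketch of the kind of extraction such a middleware has to perform. The helpers `extractJsonText` and `parseModelJson` are hypothetical names introduced for illustration; this is not the implementation of `extractJsonMiddleware`, only a standalone approximation of the idea.

```typescript
// Hypothetical helpers for illustration only; not the middleware's real code.
// The fence marker is written with escapes so this block stays well-formed.
const FENCE = "\u0060\u0060\u0060"; // three backticks

// Returns the JSON text inside a model reply that may wrap it in a markdown
// code fence or surround it with prose.
function extractJsonText(reply: string): string {
  // Case 1: a fenced block, optionally tagged "json".
  const fenced = reply.match(
    new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i"),
  );
  if (fenced && fenced[1] !== undefined) return fenced[1].trim();

  // Case 2: JSON embedded in prose; take the outermost {...} or [...] span.
  const start = reply.search(/[{[]/);
  if (start === -1) return reply.trim();
  const close = reply.charAt(start) === "{" ? "}" : "]";
  const end = reply.lastIndexOf(close);
  return end > start ? reply.slice(start, end + 1) : reply.trim();
}

// Parses the extracted text; JSON.parse still throws on malformed input.
function parseModelJson(reply: string): unknown {
  return JSON.parse(extractJsonText(reply));
}
```

Doing this in one place, ahead of schema validation, means the solver and the judge would not each need their own fence-stripping logic.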
Reviewed changes
Copilot reviewed 64 out of 64 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| runner/utils/opencode.ts | Reuse existing OpenCode server on a port; add default port and provider config forwarding. |
| runner/solver/index.ts | Apply JSON-extraction middleware to solver model calls. |
| runner/evaluators/llm/judge-client.ts | Apply JSON-extraction middleware to judge model calls. |
| evals/skia/README.md | Category documentation: best-practice inventory, traceability matrix, and issue clusters. |
| evals/skia/01-rn-skia-canvas-fill-background/prompt.md | Eval 01 prompt. |
| evals/skia/01-rn-skia-canvas-fill-background/requirements.yaml | Eval 01 requirements. |
| evals/skia/01-rn-skia-canvas-fill-background/reference/App.tsx | Eval 01 reference implementation. |
| evals/skia/02-rn-skia-shape-primitives/prompt.md | Eval 02 prompt. |
| evals/skia/02-rn-skia-shape-primitives/requirements.yaml | Eval 02 requirements. |
| evals/skia/02-rn-skia-shape-primitives/reference/App.tsx | Eval 02 reference implementation. |
| evals/skia/03-rn-skia-path-drawing/prompt.md | Eval 03 prompt. |
| evals/skia/03-rn-skia-path-drawing/requirements.yaml | Eval 03 requirements. |
| evals/skia/03-rn-skia-path-drawing/reference/App.tsx | Eval 03 reference implementation. |
| evals/skia/04-rn-skia-paint-stroke-fill/prompt.md | Eval 04 prompt. |
| evals/skia/04-rn-skia-paint-stroke-fill/requirements.yaml | Eval 04 requirements. |
| evals/skia/04-rn-skia-paint-stroke-fill/reference/App.tsx | Eval 04 reference implementation. |
| evals/skia/05-rn-skia-linear-gradient/prompt.md | Eval 05 prompt. |
| evals/skia/05-rn-skia-linear-gradient/requirements.yaml | Eval 05 requirements. |
| evals/skia/05-rn-skia-linear-gradient/reference/App.tsx | Eval 05 reference implementation. |
| evals/skia/06-rn-skia-radial-gradient/prompt.md | Eval 06 prompt. |
| evals/skia/06-rn-skia-radial-gradient/requirements.yaml | Eval 06 requirements. |
| evals/skia/06-rn-skia-radial-gradient/reference/App.tsx | Eval 06 reference implementation. |
| evals/skia/07-rn-skia-image-display/prompt.md | Eval 07 prompt. |
| evals/skia/07-rn-skia-image-display/requirements.yaml | Eval 07 requirements. |
| evals/skia/07-rn-skia-image-display/reference/App.tsx | Eval 07 reference implementation. |
| evals/skia/08-rn-skia-text-rendering/prompt.md | Eval 08 prompt. |
| evals/skia/08-rn-skia-text-rendering/requirements.yaml | Eval 08 requirements. |
| evals/skia/08-rn-skia-text-rendering/reference/App.tsx | Eval 08 reference implementation. |
| evals/skia/09-rn-skia-blur-filter/prompt.md | Eval 09 prompt. |
| evals/skia/09-rn-skia-blur-filter/requirements.yaml | Eval 09 requirements. |
| evals/skia/09-rn-skia-blur-filter/reference/App.tsx | Eval 09 reference implementation. |
| evals/skia/10-rn-skia-color-matrix-filter/prompt.md | Eval 10 prompt. |
| evals/skia/10-rn-skia-color-matrix-filter/requirements.yaml | Eval 10 requirements. |
| evals/skia/10-rn-skia-color-matrix-filter/reference/App.tsx | Eval 10 reference implementation. |
| evals/skia/11-rn-skia-reanimated-basic-animation/prompt.md | Eval 11 prompt. |
| evals/skia/11-rn-skia-reanimated-basic-animation/requirements.yaml | Eval 11 requirements. |
| evals/skia/11-rn-skia-reanimated-basic-animation/reference/App.tsx | Eval 11 reference implementation. |
| evals/skia/12-rn-skia-derived-value-animation/prompt.md | Eval 12 prompt. |
| evals/skia/12-rn-skia-derived-value-animation/requirements.yaml | Eval 12 requirements. |
| evals/skia/12-rn-skia-derived-value-animation/reference/App.tsx | Eval 12 reference implementation. |
| evals/skia/13-rn-skia-animated-color-interpolation/prompt.md | Eval 13 prompt. |
| evals/skia/13-rn-skia-animated-color-interpolation/requirements.yaml | Eval 13 requirements. |
| evals/skia/13-rn-skia-animated-color-interpolation/reference/App.tsx | Eval 13 reference implementation. |
| evals/skia/14-rn-skia-gesture-pan/prompt.md | Eval 14 prompt. |
| evals/skia/14-rn-skia-gesture-pan/requirements.yaml | Eval 14 requirements. |
| evals/skia/14-rn-skia-gesture-pan/reference/App.tsx | Eval 14 reference implementation. |
| evals/skia/15-rn-skia-transforms/prompt.md | Eval 15 prompt. |
| evals/skia/15-rn-skia-transforms/requirements.yaml | Eval 15 requirements. |
| evals/skia/15-rn-skia-transforms/reference/App.tsx | Eval 15 reference implementation. |
| evals/skia/16-rn-skia-clip-rect-and-path/prompt.md | Eval 16 prompt. |
| evals/skia/16-rn-skia-clip-rect-and-path/requirements.yaml | Eval 16 requirements. |
| evals/skia/16-rn-skia-clip-rect-and-path/reference/App.tsx | Eval 16 reference implementation. |
| evals/skia/17-rn-skia-blend-mode/prompt.md | Eval 17 prompt. |
| evals/skia/17-rn-skia-blend-mode/requirements.yaml | Eval 17 requirements. |
| evals/skia/17-rn-skia-blend-mode/reference/App.tsx | Eval 17 reference implementation. |
| evals/skia/18-rn-skia-svg-path-rendering/prompt.md | Eval 18 prompt. |
| evals/skia/18-rn-skia-svg-path-rendering/requirements.yaml | Eval 18 requirements. |
| evals/skia/18-rn-skia-svg-path-rendering/reference/App.tsx | Eval 18 reference implementation. |
| evals/skia/19-rn-skia-runtime-effect-shader/prompt.md | Eval 19 prompt. |
| evals/skia/19-rn-skia-runtime-effect-shader/requirements.yaml | Eval 19 requirements. |
| evals/skia/19-rn-skia-runtime-effect-shader/reference/App.tsx | Eval 19 reference implementation. |
| evals/skia/20-rn-skia-canvas-snapshot/prompt.md | Eval 20 prompt. |
| evals/skia/20-rn-skia-canvas-snapshot/requirements.yaml | Eval 20 requirements. |
| evals/skia/20-rn-skia-canvas-snapshot/reference/App.tsx | Eval 20 reference implementation. |
…ady up, deintegrate provider-specific code
32b7003 to
c44d24d
CC @wcandillon 👀
Maintainer of RN Skia here 👋 I picked 3 random prompts and reviewed them, and they looked good! I might use the prompts and add them as e2e tests in RN Skia. That could also help identify gaps, if there are any.
Thanks for taking the time, @wcandillon! Absolutely, feel free to use them. We'd be more than happy if you could tweet afterwards about whether and how the prompts worked; you could tag callstackio and artus9033.
Summary
Adds a new `skia` eval category with 20 evaluations covering the core `@shopify/react-native-skia` API surface.
Each eval includes a focused prompt, atomic requirements, and a reference implementation. A category `README.md` with a best-practice inventory and eval traceability matrix is also included.
Evals added
| Eval | Key APIs |
|---|---|
| canvas-fill-background | `Canvas`, `Fill`, `useCanvasSize` |
| shape-primitives | `Rect`, `Circle`, `RoundedRect`, `Line` |
| path-drawing | `Path`, `Skia.Path.Make()` |
| paint-stroke-fill | `Paint`, stroke vs fill |
| linear-gradient | `LinearGradient`, `vec` |
| radial-gradient | `RadialGradient` |
| image-display | `Image`, `useImage` |
| text-rendering | `Text`, `matchFont` |
| blur-filter | `Blur` |
| color-matrix-filter | `ColorMatrix` |
| reanimated-basic-animation | `useSharedValue`, `withRepeat`, `withTiming` |
| derived-value-animation | `useDerivedValue` |
| animated-color-interpolation | `interpolateColors` |
| gesture-pan | `GestureDetector`, `Gesture.Pan` |
| transforms | `transform`, `Group` |
| clip-rect-and-path | `ClipRect`, `ClipPath` |
| blend-mode | `blendMode` |
| svg-path-rendering | `Skia.Path.MakeFromSVGString` |
| runtime-effect-shader | `Skia.RuntimeEffect.Make`, GLSL |
| canvas-snapshot | `useCanvasRef`, `makeImageSnapshot` |
Runner fixes
- Add `extractJsonMiddleware` in both the solver and judge client to handle Claude wrapping JSON responses in markdown code fences (`AI_NoObjectGeneratedError`).
- Update `ensureOpencodeServerStarted` to reuse an already-running OpenCode server instead of attempting a duplicate startup, and forward `ANTHROPIC_API_KEY` to newly spawned servers.
Baseline scores
Solver and judge: `anthropic/claude-sonnet-4-5`
Average: ~88%
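The server-reuse fix listed under Runner fixes comes down to one decision: reuse a server that is already listening, or spawn a new one with the API key forwarded. The sketch below shows that decision as a pure function. The `StartupPlan` type, the `planServerStartup` name, and its parameters are hypothetical; the actual `ensureOpencodeServerStarted` code is not shown in this thread and may be structured quite differently.

```typescript
// Hypothetical types and names for illustration; not the PR's actual code.
type StartupPlan =
  | { action: "reuse"; port: number }
  | { action: "spawn"; port: number; env: Record<string, string> };

function planServerStartup(
  port: number,
  portAlreadyServing: boolean, // assumed result of a prior probe of the port
  parentEnv: Record<string, string | undefined>,
): StartupPlan {
  // A server is already listening: reuse it rather than start a duplicate.
  if (portAlreadyServing) return { action: "reuse", port };

  // Otherwise plan a spawn, forwarding the API key only when it is set.
  const env: Record<string, string> = {};
  const key = parentEnv["ANTHROPIC_API_KEY"];
  if (key !== undefined) env["ANTHROPIC_API_KEY"] = key;
  return { action: "spawn", port, env };
}
```

Keeping the decision separate from the probing and spawning side effects would make it possible to unit-test it without a running OpenCode server.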