Trace Observability Batch PR by mmabrouk · Pull Request #4443 · Agenta-AI/agenta

mmabrouk · 2026-05-26T15:32:10Z

Context in merged (approved) PRs.

Render ag.data.parameters as a collapsed, beautified panel in the trace overview, alongside inputs and outputs, shown only when parameters exist. Adds a defaultCollapsed prop to TraceSpanDrillInView and AccordionTreePanel.

Collapse the trace tree to the spans that carry signal (LLM, tool, agent, retrieval, output parsers, and errors) and hide structural wrappers such as RunnableSequence and RunnableLambda. Key spans is the default view, with an "All spans" option in the tree settings popover and a notice showing how many spans were hidden. The filter rules live in spanVisibility.ts as editable span-type and name-pattern definitions, so the definition of a key span can change without touching the traversal logic. Also fixes a pre-existing layout bug where the trace drawer splitter dividers stopped short of the bottom on tall screens.

The playground collapsed each user turn to a single assistant message and picked the last one. A turn that calls a tool, gets a result, then replies has two assistants. The tool-calling assistant was dropped along with its tool_calls, while the tool message kept its tool_call_id. The replayed request then sent a tool message with no preceding assistant carrying tool_calls, which OpenAI rejects. The loader now keeps every assistant and tool message in the turn, in source order, and carries tool_calls, tool_call_id, and name through unchanged. The request history is built from this flat ordered list, so replay now produces a valid assistant-then-tool pairing. The old hack that stuffed tool-call args into assistant content is removed since tool_calls are now preserved structurally. UI rendering still shows one assistant per turn (the derived-turn model is single-assistant); that is a separate display follow-up and does not affect replay.

…ervability table Before this change, the Inputs and Outputs cells in the observability table rendered chat conversations as chat and everything else as raw JSON. A LangChain LLMResult, an agent's returnValues.output, or any custom tool call all collapsed into one unreadable blob. You had to open the drawer to see anything useful. After this change, the cells dispatch three ways: chat, beautified extraction, or raw JSON. Beautified extraction is a new layer in the middle. A small registry of shape-based rules pulls a useful subset out of recognized payloads and renders just that subset. Three rules ship: - input-key: {input: ...} -> {input: ...} - output-key: {returnValues: {output: ...}} -> {output: ...} - generations-output: {generations: [[{text}, ...], ...]} -> {output: text} for one text, {outputs: [...]} for many The chat detector keeps its prefer hint for payloads that carry both input and output keys in one bag (see edge case 2 in the original chat-extraction-algorithm doc). Without the hint, the output cell can surface input chat. All three rules and the dispatcher live in @agenta/ui/cell-renderers next to extractChatMessages. SmartCellContent and LastInputMessageCell delegate the renderer decision to a new extractPreview dispatcher that returns a discriminated union of {renderer, data, source}. The annotation session list view inherits the new behavior automatically because it already uses SmartCellContent. Design notes in docs/designs/observability-cell-preview/rfc.md. Deferred for follow-ups: user-configurable rules, span-aware heuristics (rules that read span_type or attributes), and a parallel test file for the new rules.

- Re-wire SmartCellContent.beautifyJson for the JSON fallback path so callers that explicitly opt into beautified rendering keep that behavior (e.g. annotation scenario list). - Add empty-array guard in LastInputMessageCell. The chat rule does not return empty arrays today, but the Preview type allows it; the guard defends against future rule changes. - Refactor extractPreview rules to return Preview | null directly. The dispatcher becomes a simple walk with no per-kind branching. - Update RFC to match the shipped implementation: status is Implemented, the three actual rules (input-key, output-key, generations-output) replace the speculative ones, and the side-hint decision is documented.

…a on ambiguous requests Symptom: opening an n8n trace span in the playground navigated to the Exact Match evaluator's playground instead of the n8n app. The request sent was `{workflow_ref: {slug: "n8n"}, workflow_revision_ref: {version: "1"}}`, and the endpoint returned an unrelated revision with `slug: "exact-match"` and `version: "0"`. No error, no warning. Root cause: two layers conspire. Service: `fetch_workflow_revision` only resolves `workflow_ref` to a variant when no `workflow_revision_ref` is supplied. With our request shape the resolution branch was skipped and the DAO was called with `variant_ref=None`. DAO: `fetch_revision` has no `artifact_ref` parameter. When given a `revision_ref` with only `version` and no `variant_ref`, every filter branch is skipped. The query reduces to `WHERE project_id = ... LIMIT 1` and returns whichever row Postgres happens to surface first. Result: a `version`-only request would silently return arbitrary data. A `(workflow_ref, version)` request would do the same because the workflow_ref never reached the DAO. This PR adds guardrails. It does not change what the endpoint can answer — ambiguous shapes that used to silently return wrong data now return HTTP 400 with a clear message. Changes: - DAO `fetch_revision`: track whether any identifying filter was applied and return `None` if not. Defense in depth against the unfiltered `LIMIT 1` trap. - Service `fetch_workflow_revision`: reject requests where `workflow_revision_ref` carries only `version` (no `id`, no `slug`) and no `workflow_variant_ref` is provided. Versions are per-variant sequence numbers and cannot identify a revision on their own. `revision_ref.id` and `revision_ref.slug` are both project-unique and remain valid alone. - New domain exception `WorkflowRefInvalid` in `core/workflows/dtos.py`, raised by the service. The three router call sites catch it and convert to HTTPException 400 with the exception message. Matches the existing pattern used for `VariantForkError`. - Updated the `WorkflowRevisionRetrieveRequest` field docstrings on `workflow_revision_ref` (and the others) to match actual behavior. The previous text said "by `id`, or by `slug` + `version`" which misrepresented both: `slug` alone is sufficient, and `version` alone is not. Existing callers audited: Internal: `core/embeds/utils.py`, `core/applications/service.py`, `core/evaluators/service.py`, `core/evaluations/tasks/legacy.py`, `core/workflows/service.py` (internal), and the three router call sites all send either: (a) a safe shape (id-based, or `(workflow_ref + variant_ref + revision_ref.version)`), or (b) a pass-through of caller-supplied refs from public endpoints. The public endpoints' documented usage in `docs/docs/prompt-engineering/integrating-prompts/02-fetch-prompt-programatically.mdx` shows the full three-ref shape, so existing user code that follows the docs is unaffected. User code that was sending the ambiguous shape was already getting wrong data — it now gets a 400 it can notice. Frontend: only one caller of `/workflows/revisions/retrieve` exists in `web/`, in `playgroundController.ts`, and a separate PR will normalize that caller to never send the ambiguous shape. Audit document: `docs/design/playground-open-from-trace/retrieve-endpoint-audit.md` walks the full data model, the input matrix, all six bugs/silent behaviors, what is technically possible at the data layer, and the multi-PR plan this is the first PR of. Not in this PR (separate tickets): - Multi-variant disambiguation when `workflow_ref` is provided alone and the artifact has more than one variant (deterministic ordering). - Silent drop of `workflow_ref` when `revision_ref.id` is provided (validate they match, or honor the ref as an additional filter). - DAO refactor to accept `artifact_ref` directly so the service no longer translates workflow refs into variant refs by hand. - Frontend PR 2: normalize the playground's resolver to never send the now-rejected shape.

Symptom: a trace emitted with `ag.refs.application_variant.slug = "n8n.default"` (the canonical `<app_slug>.<variant_name>` form Agenta itself produces) was stored with the slug silently nulled. The variant reference collapsed to `{}`, the trace was not linked to the variant, and "open in playground" did not work. Cause: `initialize_ag_attributes` in the OTLP ingest cleaner validated reference slugs against `^[a-zA-Z0-9_-]+$`, stricter than the SDK's `Slug` validator (`^[a-zA-Z0-9_.\-]+$`) and stricter than the DB column (plain `character varying`, no constraint). Anything containing a dot failed the regex and was silently set to None. The entry was kept but its slug was wiped. Fix: - Lift the SDK's regex into a named constant `URL_SAFE_SLUG` in `sdks/python/agenta/sdk/models/shared.py` so SDK and API share one source of truth and cannot drift again. Tightened it to `^[a-zA-Z0-9_\-][a-zA-Z0-9_.\-]*$` so dotted slugs like `n8n.default` are allowed, but path-traversal-shaped slugs (exactly `.`, exactly `..`, or any leading-dot form like `.hidden`) are rejected. - Remove the local `URL_SAFE` constant in the tracing cleaner and import the shared `URL_SAFE_SLUG` from the SDK. - Add two regression tests in `api/oss/tests/pytest/unit/tracing/utils/test_attributes.py`: - dotted slugs (`n8n.default`) are preserved on the cleaned reference; - unsafe slugs (`a/b`, `has space`, `with?query`) and leading-dot slugs (`.`, `..`, `.hidden`) are still nulled. The existing "bad slug" case (`"bad slug"` with a space) continues to assert the entry collapses to `{}`.

…ils-extension-1 [fix] Extend guardrails to all git-based entities

Symptom: invoking a chat or completion app from the playground produced traces with ag.meta.configuration = {} on the root span, even though the invoke payload carried a full parameters object (prompt, llm_config, etc). The handler still received the parameters and the call ran correctly. Only the trace was empty. Cause: the tracing decorator at decorators/tracing.py:393 reads TracingContext.parameters when setting the root span's configuration attribute. That field exists on TracingContext but no code in the SDK ever assigned to it. The grep was empty across the repo. So the root span always recorded the fallback {}. Fix: have ResolverMiddleware mirror the resolved parameters onto the tracing context after it merges request and revision values. The resolver is the single funnel where both paths converge (parameters supplied in the invoke payload, or parameters fetched from a revision via references) and it runs after @ag.embed expansion. Whatever the handler is about to receive is what the trace will record. Adds three regression tests in TestResolverMiddlewareTracingParameters covering each path. Also wraps the existing TestResolverMiddlewareEmbedGate tests in tracing_context_manager to match how workflow.invoke runs the middleware in production. Without the wrapper the mutation lands on a throwaway default context and silently passes.

Add id DESC as secondary ordering in fetch_revision's latest-pick branch so the chosen revision is deterministic when two revisions on the same variant share created_at. Also fix the pre-commit ruff hook: invoke the `ruff` binary directly instead of `python3 -m ruff`, which fails on systems where the system Python doesn't ship the ruff module (e.g. Homebrew Python 3.14 on macOS). Refs: docs/design/playground-open-from-trace/followups.md C6 / D7

Add the missing slug branch in GitDAO.fetch_variant: when artifact_ref carries only slug (no id), join the artifact table and filter by it. Mirrors how fetch_revision handles variant_ref.slug. Also add the applied_identifying_filter bailout (parallel to fetch_revision's existing one) so an unidentified ref falls cleanly through to None instead of running an unfiltered LIMIT 1. Refs: docs/design/playground-open-from-trace/followups.md C1.1 / D2.1

When fetch_variant is called with artifact_ref (no variant_ref), the DAO previously did LIMIT 1 with no ORDER BY, so the picked variant was whatever Postgres surfaced first — non-deterministic for multi-variant artifacts. Add ORDER BY created_at ASC, id ASC so the pick is stable. First-created variant wins; ties broken by id. Interim of C1b (is_default flag) which will replace this with an explicit-flag lookup. Refs: docs/design/playground-open-from-trace/followups.md C1a / D2

Acceptance tests pinning rules 2.a (unique/minimal/sufficient), 2.b (redundant/consistent), and 2.d (latest-revision and default-variant fallbacks) across applications, evaluators, queries, testsets, environments, and workflows. Each entity gets six tests covering: - revision id-based lookup - revision slug-based lookup - variant + version lookup - variant-only lookup (latest revision fallback) - artifact-only lookup (default-variant + latest fallback) - redundant-consistent triple lookup Catches regressions in C1a's deterministic default-variant pick and C1.1's slug-filter fix, plus any future shift in resolution behavior. Refs: docs/design/playground-open-from-trace/followups.md C8 (first pass)

…ariant Relax validate_revision_ref_unambiguous so {revision_ref:{version}} is accepted when either variant_ref OR artifact_ref is identifying. Either is sufficient scope: variant directly, artifact via the deterministic default-variant fallback (post-C1a). Add needs_default_variant_resolution helper so each entity service can detect the {artifact_ref + version} shape (and the existing {artifact_ref} alone shape) with one expression. Apply across workflows, testsets, queries, and environments services to broaden their existing resolution branches. Applications and evaluators inherit via delegation to workflows_service. Rename _has_id_or_slug to _is_identifying — captures the semantic concept instead of the implementation. Tests updated: - test_revision_retrieve_ambiguous_400.py: drop the 6 stale negative tests asserting {artifact + version} returns 400 (it no longer does). - test_revision_retrieve_positive.py: add 6 positive tests asserting {artifact + version} returns the right revision for each entity. - test_revision_ref_validation.py: drop the "still raises" unit test, add 2 new "artifact scopes version" passing cases. Followups.md updated for the _has_id_or_slug rename. Refs: docs/design/playground-open-from-trace/followups.md C2 / D3

Rename RevisionRefInvalid -> RetrieveRefsInsufficient; add new sibling RetrieveRefsInconsistent for redundant refs that disagree with the resolved revision. The new validate_retrieve_refs_consistent helper compares every identifying field (id, slug, version) on caller-supplied artifact/variant/revision refs against the resolved Revision's corresponding fields (artifact_id, artifact_slug, variant_id, variant_slug, id, slug, version). On mismatch raise RetrieveRefsInconsistent; the routers translate it to HTTP 400. Wired into fetch_*_revision in workflows, testsets, queries, and environments. Applications and evaluators inherit via delegation to workflows_service. Each service snapshots the caller-supplied refs before default-variant resolution mutates them locally so the consistency check sees what the caller actually sent. Tests: - 22 new unit tests for the consistency helper - 6 acceptance tests (one per entity) asserting inconsistent {artifact_ref.slug + revision_ref.id} returns 400 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…round Opening a trace span in the playground only resolved when app.slug was set without app.id, sent a hand-picked subset of refs, and overwrote existing ids with the resolver's result. Spans that carried only variant.slug, only rev.id, only rev.slug, or app.id without a revision either silently opened the wrong app, dropped to a useless ephemeral playground, or navigated to /apps/{id}/playground with no ?revisions= (the page does not auto-pick a default revision). This change broadens the resolver to fire whenever the trace gives us any identifying ref the backend can scope a lookup from (any of app/variant/revision x id/slug, or version when variant is also present). The resolver sends every identifier the trace carries and only backfills the ids that are missing - refs the trace supplied are authoritative and never overwritten. The appDefault branch is removed; under the new rule a successful resolve always returns both app.id and revision.id together, so the URL we navigate to always carries ?revisions=. Specifics: - New resolveTraceRefs(refs, projectId) in playgroundController.ts. Picks the smallest identifying ref per level (id ?? slug ?? version), caches successful results with a 5-minute TTL, falls back to ephemeral on null/error. - openFromTraceAtom skips the round-trip when app.id and revision.id are both already on the refs. - retrieveWorkflowRevision accepts an optional workflowRef so callers can resolve from variant-only or revision-only refs. - hasAppReference in TraceTypeHeader now requires id or slug on the ref (matching the resolver's gate) so the button enable predicate and the resolver predicate agree on bare-version refs. - Drops OpenFromTraceResult.appDefault from the union and the navigation branch that depended on it. Verified against the live /workflows/revisions/retrieve endpoint with a probe that exercises 19 input combinations (matrix in docs/design/playground-open-from-trace/plan.md). Every combination with at least one id or slug resolves to the expected revision; bare-version requests are correctly rejected by the backend (PR #4418 guard) and fall through to ephemeral.

`validate_retrieve_refs_consistent` previously took seven flattened kwargs (resolved_artifact_id, _slug, resolved_variant_id, _slug, resolved_revision_id, _slug, _version). Collapse them into three grouped `Reference` kwargs (`resolved_artifact_ref`, `resolved_variant_ref`, `resolved_revision_ref`) so the resolved side mirrors the caller side of the signature. All four service call sites and the unit tests are updated accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Variants have no `version` field — version is a per-variant counter on revisions. A `variant_ref` populated with only `version` is nonsense the DAO would silently drop; reject at the boundary. - Rename validate_revision_ref_unambiguous -> validate_revision_refs_sufficient so the validator name lines up with the RetrieveRefsInsufficient exception it raises. - Add sibling validate_variant_refs_sufficient with matching contract. - Wire variant-side validation into fetch_*_revision AND fetch_*_variant across workflows/applications/evaluators/testsets/queries/environments. - Rename test_revision_ref_validation -> test_retrieve_refs_sufficiency to match new naming; extend it with a TestVariantVersionOnlyRejection class and a TestVariantSufficientShapesPass class. - Add an acceptance test per entity asserting the 400. Env-revision-version is already covered by validate_revision_refs_sufficient because the env service treats environment_ref as artifact_ref — no separate env mirror needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three routers (applications, workflows, evaluators) host the same env-path retrieve policy with subtly different rules. Bring them in sync: - Drop the workflows/evaluators path-mixing rejection. Mixing entity refs and env refs is now allowed at the policy boundary; C3's consistency check rejects contradictions downstream (D10). - Adopt applications' key-derivation pattern in workflows and evaluators: when env refs are supplied without `key`, derive `key = f"{artifact_slug}.revision"` from artifact_ref.slug, or by fetching the artifact when only artifact_ref.id is present (D11). Kept inline per router. The actual helper extraction (this layer AND the six-entity service-layer pipeline AND the C10 exception-translation boilerplate) is bundled as a deferred C5b item in followups.md — they all touch the same per-entity duplicates and want one combined refactor. Tests: - Acceptance: pin the aligned behavior for all three routers (empty request -> 400, env-ref-without-key -> 400, path-mixing passes policy boundary). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…h-mixing Add acceptance tests for the three env-supporting routers (workflows, applications, evaluators) that pin behavior unchanged by C5 but not covered by C8's first pass: * 2.a env-path: {env_ref + key} resolves the deployed revision. * 2.d env-path: {env_ref + artifact_ref} (no key) derives the default key {artifact_slug}.revision. * 2.b path-mixed: {env_ref + key + artifact_ref (matching)} is accepted (redundant-consistent). * 2.c path-mixed: {env_ref + key + revision_ref pointing elsewhere} is rejected with 400 by the post-resolution consistency check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace 12 inline `try/except (RetrieveRefsInsufficient, RetrieveRefsInconsistent)` and `except VariantForkError` blocks across six routers (workflows, applications, evaluators, testsets, queries, environments) with a single `@handle_git_exceptions()` decorator at each route. Introduces `apis/fastapi/git/exceptions.py`: * `VariantForkErrorException`, `RetrieveRefsInsufficientException`, `RetrieveRefsInconsistentException` — typed `HTTPException(400)` subclasses mirroring the folders/ pattern. * `handle_git_exceptions()` — translates each core exception to its typed counterpart. Inserted innermost in the decorator stack so it runs before `@suppress_exceptions(exclude=[HTTPException])` and the typed exception propagates. Any new git-domain exception added to `core/git/types.py` now needs a single registration in this module rather than N call-site edits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bring user-facing prose in line with the post-C2/C3/C4/C5 behavior: * docs/docs/reference/api-guide/04-versioning.mdx — replace "the most specific wins" with the new contract that every supplied reference must agree, and enumerate the 400 cases (revision-version-only, variant-version-only, contradictions). * docs/docs/prompt-engineering/integrating-prompts/02-fetch-prompt- programatically.mdx — add a "How the request resolves" section listing valid identifying shapes and the 400 cases. * api/oss/src/apis/fastapi/{applications,workflows,evaluators} /models.py — replace "arbitrary variant" wording in *_ref descriptions (post-C1a the default variant is deterministic) and add a top-level docstring noting the consistency contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-26T17:55:05Z

Railway Preview Environment


Preview URL	https://gateway-production-2e9a.up.railway.app/w
Project	`agenta-oss-pr-4443`
Image tag	`pr-4443-0c39d83`
Status	Deployed
Railway logs	Open logs
Workflow logs	View workflow run
Updated at 2026-05-27T15:59:44.163Z

… naming - `hasAppReference` now requires `id`/`slug` to be a non-empty string, matching `asString` semantics in the playground resolver. Spans carrying `{id: "", slug: ""}` previously enabled the Playground button only to fall through to ephemeral after the resolver gated them out. - Rename `onlyRevVersion` → `hasNoIdentifyingRef` in both the resolver and `retrieveWorkflowRevision` — the variable was true whenever no identifying ref existed at any level, not specifically when only a revision `version` was supplied. Comments updated to match.

…and-missing-checks-in-guardrails

…github.com:Agenta-AI/agenta into fix/resolve-tests-and-missing-checks-in-guardrails

…-checks-in-guardrails [fix] Resolve tests and missing checks in guardrails

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

api/oss/src/core/environments/service.py (1)

619-636: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Treat empty Reference({}) values as no input before the DAO call.

Reference objects are truthy even when id, slug, and version are all None. With the current guard, {} skips the early return, passes these validators, and can still reach fetch_revision() with no identifying scope. That turns an empty retrieve into an under-scoped DAO read instead of None or a 400. Normalize empty refs to None first, or base this guard on is_identifying(...) plus explicit version presence for revision refs.

api/oss/src/core/queries/service.py (1)

656-668: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Empty Reference({}) requests still fall through to fetch_revision().

The early return uses bool(ref), but Reference(id=None, slug=None, version=None) is still truthy. A body like {"query_ref": {}} will bypass this guard, survive both validations, and hit the DAO with no real identifying scope. Normalize empty refs to None, or switch this guard to is_identifying(...) with an explicit revision_ref.version check.

🧹 Nitpick comments (1)

api/oss/tests/pytest/acceptance/test_revision_retrieve_positive.py (1)

1215-1231: ⚡ Quick win

This assertion still passes if the router ignores environment_ref.

Because the test accepts any non-400/non-5xx response, a regression that bypasses env-backed lookup and resolves only workflow_ref would still pass with 200. Please assert the env-backed failure mode more precisely so this actually pins default-key derivation.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e5e179d9-f6c2-456b-8f07-642f6ec9b0fa

📥 Commits

Reviewing files that changed from the base of the PR and between 341252b and 6b0879c.

📒 Files selected for processing (28)

.gitignore
api/oss/src/apis/fastapi/applications/router.py
api/oss/src/apis/fastapi/evaluators/router.py
api/oss/src/apis/fastapi/testsets/router.py
api/oss/src/apis/fastapi/workflows/models.py
api/oss/src/apis/fastapi/workflows/router.py
api/oss/src/core/applications/service.py
api/oss/src/core/embeds/utils.py
api/oss/src/core/environments/service.py
api/oss/src/core/evaluators/service.py
api/oss/src/core/git/types.py
api/oss/src/core/queries/service.py
api/oss/src/core/workflows/service.py
api/oss/src/dbs/postgres/git/dao.py
api/oss/tests/pytest/acceptance/test_env_backed_retrieve_policy.py
api/oss/tests/pytest/acceptance/test_revision_retrieve_env_path.py
api/oss/tests/pytest/acceptance/test_revision_retrieve_positive.py
api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds.py
api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_cross_entity.py
api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_errors.py
api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_legacy.py
api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_retrieve_resolve.py
api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_security.py
api/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_string.py
docs/design/playground-open-from-trace/followups.md
sdks/python/agenta/sdk/managers/applications.py
sdks/python/agenta/sdk/managers/evaluators.py
sdks/python/oss/tests/pytest/acceptance/workflows/test_embeds_integration.py

✅ Files skipped from review due to trivial changes (2)

docs/design/playground-open-from-trace/followups.md
api/oss/src/apis/fastapi/workflows/models.py

coderabbitai · 2026-05-26T18:48:13Z

+        # Normalization #2:
+        # `{slug, version}` may be `{artifact_slug, version}` (default
+        # variant + Nth revision) rather than a revision slug. Try that
+        # interpretation when the slug-based revision lookup misses.
+        if ref.slug and ref.version:
+            artifact_ref = Reference(slug=ref.slug)
+            version_only_ref = Reference(version=ref.version)
+            entity = await fetch_revision_by_refs(artifact_ref, None, version_only_ref)
+            if entity:


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't replace caller-supplied artifact scope during normalization.

This fallback synthesizes artifact_ref=Reference(slug=ref.slug) even when the embed already provided an explicit artifact ref. If the exact revision lookup misses, that can resolve a different artifact instead of surfacing the caller's inconsistent refs. Only apply this reinterpretation when no artifact scope was provided by the caller, or preserve the original artifact ref and let the service reject the mismatch.

Suggested fix

- if ref.slug and ref.version: - artifact_ref = Reference(slug=ref.slug) - version_only_ref = Reference(version=ref.version) - entity = await fetch_revision_by_refs(artifact_ref, None, version_only_ref) + if ref.slug and ref.version and artifact_ref is None and variant_ref is None: + normalized_artifact_ref = Reference(slug=ref.slug) + version_only_ref = Reference(version=ref.version) + entity = await fetch_revision_by_refs( + normalized_artifact_ref, + None, + version_only_ref, + ) if entity: return entity

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Normalization #2:

# `{slug, version}` may be `{artifact_slug, version}` (default

# variant + Nth revision) rather than a revision slug. Try that

# interpretation when the slug-based revision lookup misses.

if ref.slug and ref.version:

artifact_ref = Reference(slug=ref.slug)

version_only_ref = Reference(version=ref.version)

entity = await fetch_revision_by_refs(artifact_ref, None, version_only_ref)

if entity:

# Normalization `#2`:

# `{slug, version}` may be `{artifact_slug, version}` (default

# variant + Nth revision) rather than a revision slug. Try that

# interpretation when the slug-based revision lookup misses.

if ref.slug and ref.version and artifact_ref is None and variant_ref is None:

normalized_artifact_ref = Reference(slug=ref.slug)

version_only_ref = Reference(version=ref.version)

entity = await fetch_revision_by_refs(

normalized_artifact_ref,

None,

version_only_ref,

)

if entity:

Extracts buildRefForResolver, hasAppReference, resolveTraceRefs, and related types from playgroundController into a dedicated module so the predicate the UI gate uses (in OSS TraceTypeHeader) and the one the resolver uses (inside resolveTraceRefs) live in one file — they structurally cannot drift apart now. Adds vitest infrastructure to @agenta/playground mirroring @agenta/entities (config, scripts, devDep) and unit tests covering: - buildRefForResolver: id > slug > version priority, empty-string rejection, never-mix invariant - hasAppReference: UI gate predicate across ag.references dict format and top-level references array; empty-id-or-slug rejection (matches resolver gate) - resolveTraceRefs: projectId guard, no-identifier guard, request shaping, mismatch verification, cache TTL behaviour, no-cache-on- failure invariant - retrieveWorkflowRevision: request URL/body/params shape, early- return paths, response validation failure handling 35 new tests in @agenta/playground (new vitest setup) and 10 in @agenta/entities (existing vitest setup) — all green.

…e ref resolver Two changes that the original PR description tagged as follow-ups, lifted into this PR now that the resolver is isolated in its own module with tests covering the contract: 1) retrieveWorkflowRevision swaps the hand-rolled axios.post for the Fern-generated `client.workflows.retrieveWorkflowRevision`. The request/response shape is identical (WorkflowRevisionRetrieveRequest maps 1:1) and zod validation stays at the boundary because Fern's compile-time types under-declare backend `extra="allow"` fields (artifact_slug, variant_slug). Aligns this function with the project's "all new API code uses @agentaai/api-client" rule. 2) resolveTraceRefs migrates its in-memory Map + manual TTL check to queryClient.fetchQuery with staleTime = TRACE_REF_RESOLUTION_TTL_MS and gcTime = 2× that. Failures (no-match, slug mismatch, axios error) are removed from the cache in the outer catch so subsequent calls retry rather than serving stale absence. A small TraceRefResolverFailure sentinel preserves the original "no warn on silent fallback, warn on diagnostic paths" behaviour without muddling the queryFn's contract. Tests adapt: entities mocks `@agenta/sdk` (Fern client) instead of axios; playground tests unchanged — sentinel design preserved all semantics so the cache TTL, no-cache-on-failure, and mismatch warn assertions still pass verbatim.

ardaerzin · 2026-05-26T22:22:39Z

-- N/A

inputKeyRule short-circuited before outputKeyRule and never consulted ctx.side, so output cells receiving {input, output} payloads rendered the input field. Gate input/output/generations rules on ctx.side, and widen outputKeyRule to also match bare {output: ...} (previously only LangChain's {returnValues: {output}} shape was recognised).

ardaerzin

fe changes 👍

…aults fix(frontend): seed parameter defaults for custom URL workflows

…e-resolver fix(frontend): resolve every identifying trace ref when opening playground

mmabrouk and others added 30 commits May 24, 2026 15:00

added openinference adapter that parses tools correctly

2c82122

Added tests to adapter

1ad49e4

feat(frontend): show span parameters in trace overview

1f7824c

Render ag.data.parameters as a collapsed, beautified panel in the trace overview, alongside inputs and outputs, shown only when parameters exist. Adds a defaultCollapsed prop to TraceSpanDrillInView and AccordionTreePanel.

fix(frontend): strip tree fields from trace raw data

60bc2fd

[fix] Extend guardrails to all git-based entities

ed3097c

artifact_kind > entity_type

7313440

Add extra tests

970c290

Merge pull request #4422 from Agenta-AI/fix/workflow-retrieve-guardra…

a506022

…ils-extension-1 [fix] Extend guardrails to all git-based entities

[chore] Add extensions

fc35abe

Add exception handling

a4869d7

jp-agenta and others added 9 commits May 26, 2026 20:09

fix ore sdk tests issues

b1e7bd6

Fix sdk issues surfaced by tests

8757856

fix coerabbit comment

c1746b0

Merge branch 'feat/trace-observability-batch' into fix/resolve-tests-…

6756010

…and-missing-checks-in-guardrails

revert tight fix

a08cb81

Merge branch 'fix/resolve-tests-and-missing-checks-in-guardrails' of …

73f2afd

…github.com:Agenta-AI/agenta into fix/resolve-tests-and-missing-checks-in-guardrails

fix scoping_ref

8a78650

Merge pull request #4448 from Agenta-AI/fix/resolve-tests-and-missing…

6b0879c

…-checks-in-guardrails [fix] Resolve tests and missing checks in guardrails

vercel Bot deployed to Preview May 26, 2026 18:40 View deployment

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

ardaerzin added 3 commits May 26, 2026 21:11

Merge branch 'main' into fix/playground-custom-url-defaults

e036a76

Merge branch 'main' into feat/trace-observability-batch

d50c482

vercel Bot deployed to Preview May 26, 2026 22:38 View deployment

vercel Bot deployed to Preview May 26, 2026 22:47 View deployment

ardaerzin self-requested a review May 27, 2026 06:49

ardaerzin approved these changes May 27, 2026

View reviewed changes

Merge pull request #4440 from Agenta-AI/fix/playground-custom-url-def…

584ff4b

…aults fix(frontend): seed parameter defaults for custom URL workflows

vercel Bot deployed to Preview May 27, 2026 08:05 View deployment

Merge pull request #4425 from Agenta-AI/fix/playground-open-from-trac…

9f94286

…e-resolver fix(frontend): resolve every identifying trace ref when opening playground

vercel Bot deployed to Preview May 27, 2026 08:10 View deployment

Merge branch 'main' into feat/trace-observability-batch

7fddc72

vercel Bot deployed to Preview May 27, 2026 08:14 View deployment

hotfix > remove extra consistency check on corrupted data

f4e3da1

vercel Bot deployed to Preview May 27, 2026 15:56 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Trace Observability Batch PR #4443

Trace Observability Batch PR #4443
mmabrouk wants to merge 74 commits into
mainfrom
feat/trace-observability-batch

mmabrouk commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 26, 2026

Uh oh!

ardaerzin commented May 26, 2026 •

edited

Loading

Uh oh!

ardaerzin left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

mmabrouk commented May 26, 2026

Uh oh!

github-actions Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Railway Preview Environment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

ardaerzin commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ardaerzin left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

github-actions Bot commented May 26, 2026 •

edited

Loading

ardaerzin commented May 26, 2026 •

edited

Loading