Trace Observability Batch PR #4443
Conversation
Render ag.data.parameters as a collapsed, beautified panel in the trace overview, alongside inputs and outputs, shown only when parameters exist. Adds a defaultCollapsed prop to TraceSpanDrillInView and AccordionTreePanel.
Collapse the trace tree to the spans that carry signal (LLM, tool, agent, retrieval, output parsers, and errors) and hide structural wrappers such as RunnableSequence and RunnableLambda. Key spans is the default view, with an "All spans" option in the tree settings popover and a notice showing how many spans were hidden. The filter rules live in spanVisibility.ts as editable span-type and name-pattern definitions, so the definition of a key span can change without touching the traversal logic. Also fixes a pre-existing layout bug where the trace drawer splitter dividers stopped short of the bottom on tall screens.
The playground collapsed each user turn to a single assistant message and picked the last one. A turn that calls a tool, gets a result, then replies has two assistants. The tool-calling assistant was dropped along with its tool_calls, while the tool message kept its tool_call_id. The replayed request then sent a tool message with no preceding assistant carrying tool_calls, which OpenAI rejects. The loader now keeps every assistant and tool message in the turn, in source order, and carries tool_calls, tool_call_id, and name through unchanged. The request history is built from this flat ordered list, so replay now produces a valid assistant-then-tool pairing. The old hack that stuffed tool-call args into assistant content is removed since tool_calls are now preserved structurally. UI rendering still shows one assistant per turn (the derived-turn model is single-assistant); that is a separate display follow-up and does not affect replay.
…ervability table
Before this change, the Inputs and Outputs cells in the observability
table rendered chat conversations as chat and everything else as raw
JSON. A LangChain LLMResult, an agent's returnValues.output, or any
custom tool call all collapsed into one unreadable blob. You had to
open the drawer to see anything useful.
After this change, the cells dispatch three ways: chat, beautified
extraction, or raw JSON. Beautified extraction is a new layer in the
middle. A small registry of shape-based rules pulls a useful subset
out of recognized payloads and renders just that subset.
Three rules ship:
- input-key: {input: ...} -> {input: ...}
- output-key: {returnValues: {output: ...}} -> {output: ...}
- generations-output: {generations: [[{text}, ...], ...]} ->
{output: text} for one text, {outputs: [...]} for many
The chat detector keeps its prefer hint for payloads that carry both
input and output keys in one bag (see edge case 2 in the original
chat-extraction-algorithm doc). Without the hint, the output cell can
surface input chat.
All three rules and the dispatcher live in @agenta/ui/cell-renderers
next to extractChatMessages. SmartCellContent and LastInputMessageCell
delegate the renderer decision to a new extractPreview dispatcher that
returns a discriminated union of {renderer, data, source}. The
annotation session list view inherits the new behavior automatically
because it already uses SmartCellContent.
Design notes in docs/designs/observability-cell-preview/rfc.md.
Deferred for follow-ups: user-configurable rules, span-aware
heuristics (rules that read span_type or attributes), and a parallel
test file for the new rules.
- Re-wire SmartCellContent.beautifyJson for the JSON fallback path so callers that explicitly opt into beautified rendering keep that behavior (e.g. annotation scenario list). - Add empty-array guard in LastInputMessageCell. The chat rule does not return empty arrays today, but the Preview type allows it; the guard defends against future rule changes. - Refactor extractPreview rules to return Preview | null directly. The dispatcher becomes a simple walk with no per-kind branching. - Update RFC to match the shipped implementation: status is Implemented, the three actual rules (input-key, output-key, generations-output) replace the speculative ones, and the side-hint decision is documented.
…a on ambiguous requests
Symptom: opening an n8n trace span in the playground navigated to the
Exact Match evaluator's playground instead of the n8n app. The request
sent was `{workflow_ref: {slug: "n8n"}, workflow_revision_ref: {version: "1"}}`,
and the endpoint returned an unrelated revision with `slug: "exact-match"`
and `version: "0"`. No error, no warning.
Root cause: two layers conspire.
Service: `fetch_workflow_revision` only resolves `workflow_ref` to a
variant when no `workflow_revision_ref` is supplied. With our request
shape the resolution branch was skipped and the DAO was called with
`variant_ref=None`.
DAO: `fetch_revision` has no `artifact_ref` parameter. When given a
`revision_ref` with only `version` and no `variant_ref`, every filter
branch is skipped. The query reduces to `WHERE project_id = ... LIMIT 1`
and returns whichever row Postgres happens to surface first.
Result: a `version`-only request would silently return arbitrary data.
A `(workflow_ref, version)` request would do the same because the
workflow_ref never reached the DAO.
This PR adds guardrails. It does not change what the endpoint can
answer — ambiguous shapes that used to silently return wrong data now
return HTTP 400 with a clear message.
Changes:
- DAO `fetch_revision`: track whether any identifying filter was applied
and return `None` if not. Defense in depth against the unfiltered
`LIMIT 1` trap.
- Service `fetch_workflow_revision`: reject requests where
`workflow_revision_ref` carries only `version` (no `id`, no `slug`)
and no `workflow_variant_ref` is provided. Versions are per-variant
sequence numbers and cannot identify a revision on their own.
`revision_ref.id` and `revision_ref.slug` are both project-unique and
remain valid alone.
- New domain exception `WorkflowRefInvalid` in `core/workflows/dtos.py`,
raised by the service. The three router call sites catch it and
convert to HTTPException 400 with the exception message. Matches the
existing pattern used for `VariantForkError`.
- Updated the `WorkflowRevisionRetrieveRequest` field docstrings on
`workflow_revision_ref` (and the others) to match actual behavior.
The previous text said "by `id`, or by `slug` + `version`" which
misrepresented both: `slug` alone is sufficient, and `version` alone
is not.
Existing callers audited:
Internal: `core/embeds/utils.py`, `core/applications/service.py`,
`core/evaluators/service.py`, `core/evaluations/tasks/legacy.py`,
`core/workflows/service.py` (internal), and the three router call
sites all send either: (a) a safe shape (id-based, or
`(workflow_ref + variant_ref + revision_ref.version)`), or (b) a
pass-through of caller-supplied refs from public endpoints. The
public endpoints' documented usage in
`docs/docs/prompt-engineering/integrating-prompts/02-fetch-prompt-programatically.mdx`
shows the full three-ref shape, so existing user code that follows
the docs is unaffected. User code that was sending the ambiguous
shape was already getting wrong data — it now gets a 400 it can
notice.
Frontend: only one caller of `/workflows/revisions/retrieve` exists
in `web/`, in `playgroundController.ts`, and a separate PR will
normalize that caller to never send the ambiguous shape.
Audit document: `docs/design/playground-open-from-trace/retrieve-endpoint-audit.md`
walks the full data model, the input matrix, all six bugs/silent
behaviors, what is technically possible at the data layer, and the
multi-PR plan this is the first PR of.
Not in this PR (separate tickets):
- Multi-variant disambiguation when `workflow_ref` is provided alone
and the artifact has more than one variant (deterministic ordering).
- Silent drop of `workflow_ref` when `revision_ref.id` is provided
(validate they match, or honor the ref as an additional filter).
- DAO refactor to accept `artifact_ref` directly so the service no
longer translates workflow refs into variant refs by hand.
- Frontend PR 2: normalize the playground's resolver to never send
the now-rejected shape.
Symptom: a trace emitted with `ag.refs.application_variant.slug = "n8n.default"`
(the canonical `<app_slug>.<variant_name>` form Agenta itself produces) was
stored with the slug silently nulled. The variant reference collapsed to `{}`,
the trace was not linked to the variant, and "open in playground" did not work.
Cause: `initialize_ag_attributes` in the OTLP ingest cleaner validated reference
slugs against `^[a-zA-Z0-9_-]+$`, stricter than the SDK's `Slug` validator
(`^[a-zA-Z0-9_.\-]+$`) and stricter than the DB column (plain `character varying`,
no constraint). Anything containing a dot failed the regex and was silently set
to None. The entry was kept but its slug was wiped.
Fix:
- Lift the SDK's regex into a named constant `URL_SAFE_SLUG` in
`sdks/python/agenta/sdk/models/shared.py` so SDK and API share one source of
truth and cannot drift again. Tightened it to
`^[a-zA-Z0-9_\-][a-zA-Z0-9_.\-]*$` so dotted slugs like `n8n.default` are
allowed, but path-traversal-shaped slugs (exactly `.`, exactly `..`, or any
leading-dot form like `.hidden`) are rejected.
- Remove the local `URL_SAFE` constant in the tracing cleaner and import the
shared `URL_SAFE_SLUG` from the SDK.
- Add two regression tests in
`api/oss/tests/pytest/unit/tracing/utils/test_attributes.py`:
- dotted slugs (`n8n.default`) are preserved on the cleaned reference;
- unsafe slugs (`a/b`, `has space`, `with?query`) and leading-dot slugs
(`.`, `..`, `.hidden`) are still nulled.
The existing "bad slug" case (`"bad slug"` with a space) continues to assert
the entry collapses to `{}`.
…ils-extension-1 [fix] Extend guardrails to all git-based entities
Symptom: invoking a chat or completion app from the playground produced
traces with ag.meta.configuration = {} on the root span, even though the
invoke payload carried a full parameters object (prompt, llm_config, etc).
The handler still received the parameters and the call ran correctly. Only
the trace was empty.
Cause: the tracing decorator at decorators/tracing.py:393 reads
TracingContext.parameters when setting the root span's configuration
attribute. That field exists on TracingContext but no code in the SDK
ever assigned to it. The grep was empty across the repo. So the root
span always recorded the fallback {}.
Fix: have ResolverMiddleware mirror the resolved parameters onto the
tracing context after it merges request and revision values. The resolver
is the single funnel where both paths converge (parameters supplied in the
invoke payload, or parameters fetched from a revision via references) and
it runs after @ag.embed expansion. Whatever the handler is about to
receive is what the trace will record.
Adds three regression tests in TestResolverMiddlewareTracingParameters
covering each path. Also wraps the existing TestResolverMiddlewareEmbedGate
tests in tracing_context_manager to match how workflow.invoke runs the
middleware in production. Without the wrapper the mutation lands on a
throwaway default context and silently passes.
Add id DESC as secondary ordering in fetch_revision's latest-pick branch so the chosen revision is deterministic when two revisions on the same variant share created_at. Also fix the pre-commit ruff hook: invoke the `ruff` binary directly instead of `python3 -m ruff`, which fails on systems where the system Python doesn't ship the ruff module (e.g. Homebrew Python 3.14 on macOS). Refs: docs/design/playground-open-from-trace/followups.md C6 / D7
Add the missing slug branch in GitDAO.fetch_variant: when artifact_ref carries only slug (no id), join the artifact table and filter by it. Mirrors how fetch_revision handles variant_ref.slug. Also add the applied_identifying_filter bailout (parallel to fetch_revision's existing one) so an unidentified ref falls cleanly through to None instead of running an unfiltered LIMIT 1. Refs: docs/design/playground-open-from-trace/followups.md C1.1 / D2.1
When fetch_variant is called with artifact_ref (no variant_ref), the DAO previously did LIMIT 1 with no ORDER BY, so the picked variant was whatever Postgres surfaced first — non-deterministic for multi-variant artifacts. Add ORDER BY created_at ASC, id ASC so the pick is stable. First-created variant wins; ties broken by id. Interim of C1b (is_default flag) which will replace this with an explicit-flag lookup. Refs: docs/design/playground-open-from-trace/followups.md C1a / D2
Acceptance tests pinning rules 2.a (unique/minimal/sufficient), 2.b (redundant/consistent), and 2.d (latest-revision and default-variant fallbacks) across applications, evaluators, queries, testsets, environments, and workflows. Each entity gets six tests covering: - revision id-based lookup - revision slug-based lookup - variant + version lookup - variant-only lookup (latest revision fallback) - artifact-only lookup (default-variant + latest fallback) - redundant-consistent triple lookup Catches regressions in C1a's deterministic default-variant pick and C1.1's slug-filter fix, plus any future shift in resolution behavior. Refs: docs/design/playground-open-from-trace/followups.md C8 (first pass)
…ariant
Relax validate_revision_ref_unambiguous so {revision_ref:{version}} is
accepted when either variant_ref OR artifact_ref is identifying. Either
is sufficient scope: variant directly, artifact via the deterministic
default-variant fallback (post-C1a).
Add needs_default_variant_resolution helper so each entity service can
detect the {artifact_ref + version} shape (and the existing
{artifact_ref} alone shape) with one expression. Apply across workflows,
testsets, queries, and environments services to broaden their existing
resolution branches. Applications and evaluators inherit via delegation
to workflows_service.
Rename _has_id_or_slug to _is_identifying — captures the semantic concept
instead of the implementation.
Tests updated:
- test_revision_retrieve_ambiguous_400.py: drop the 6 stale negative
tests asserting {artifact + version} returns 400 (it no longer does).
- test_revision_retrieve_positive.py: add 6 positive tests asserting
{artifact + version} returns the right revision for each entity.
- test_revision_ref_validation.py: drop the "still raises" unit test,
add 2 new "artifact scopes version" passing cases.
Followups.md updated for the _has_id_or_slug rename.
Refs: docs/design/playground-open-from-trace/followups.md C2 / D3
Rename RevisionRefInvalid -> RetrieveRefsInsufficient; add new sibling
RetrieveRefsInconsistent for redundant refs that disagree with the
resolved revision.
The new validate_retrieve_refs_consistent helper compares every
identifying field (id, slug, version) on caller-supplied
artifact/variant/revision refs against the resolved Revision's
corresponding fields (artifact_id, artifact_slug, variant_id,
variant_slug, id, slug, version). On mismatch raise
RetrieveRefsInconsistent; the routers translate it to HTTP 400.
Wired into fetch_*_revision in workflows, testsets, queries, and
environments. Applications and evaluators inherit via delegation to
workflows_service. Each service snapshots the caller-supplied refs
before default-variant resolution mutates them locally so the
consistency check sees what the caller actually sent.
Tests:
- 22 new unit tests for the consistency helper
- 6 acceptance tests (one per entity) asserting inconsistent
{artifact_ref.slug + revision_ref.id} returns 400
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…round
Opening a trace span in the playground only resolved when app.slug was
set without app.id, sent a hand-picked subset of refs, and overwrote
existing ids with the resolver's result. Spans that carried only
variant.slug, only rev.id, only rev.slug, or app.id without a revision
either silently opened the wrong app, dropped to a useless ephemeral
playground, or navigated to /apps/{id}/playground with no ?revisions=
(the page does not auto-pick a default revision).
This change broadens the resolver to fire whenever the trace gives us
any identifying ref the backend can scope a lookup from (any of
app/variant/revision x id/slug, or version when variant is also present).
The resolver sends every identifier the trace carries and only backfills
the ids that are missing - refs the trace supplied are authoritative and
never overwritten. The appDefault branch is removed; under the new rule
a successful resolve always returns both app.id and revision.id
together, so the URL we navigate to always carries ?revisions=.
Specifics:
- New resolveTraceRefs(refs, projectId) in playgroundController.ts.
Picks the smallest identifying ref per level (id ?? slug ?? version),
caches successful results with a 5-minute TTL, falls back to ephemeral
on null/error.
- openFromTraceAtom skips the round-trip when app.id and revision.id
are both already on the refs.
- retrieveWorkflowRevision accepts an optional workflowRef so callers
can resolve from variant-only or revision-only refs.
- hasAppReference in TraceTypeHeader now requires id or slug on the ref
(matching the resolver's gate) so the button enable predicate and the
resolver predicate agree on bare-version refs.
- Drops OpenFromTraceResult.appDefault from the union and the navigation
branch that depended on it.
Verified against the live /workflows/revisions/retrieve endpoint with a
probe that exercises 19 input combinations (matrix in
docs/design/playground-open-from-trace/plan.md). Every combination with
at least one id or slug resolves to the expected revision; bare-version
requests are correctly rejected by the backend (PR #4418 guard) and
fall through to ephemeral.
`validate_retrieve_refs_consistent` previously took seven flattened kwargs (resolved_artifact_id, _slug, resolved_variant_id, _slug, resolved_revision_id, _slug, _version). Collapse them into three grouped `Reference` kwargs (`resolved_artifact_ref`, `resolved_variant_ref`, `resolved_revision_ref`) so the resolved side mirrors the caller side of the signature. All four service call sites and the unit tests are updated accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Variants have no `version` field — version is a per-variant counter on revisions. A `variant_ref` populated with only `version` is nonsense the DAO would silently drop; reject at the boundary. - Rename validate_revision_ref_unambiguous -> validate_revision_refs_sufficient so the validator name lines up with the RetrieveRefsInsufficient exception it raises. - Add sibling validate_variant_refs_sufficient with matching contract. - Wire variant-side validation into fetch_*_revision AND fetch_*_variant across workflows/applications/evaluators/testsets/queries/environments. - Rename test_revision_ref_validation -> test_retrieve_refs_sufficiency to match new naming; extend it with a TestVariantVersionOnlyRejection class and a TestVariantSufficientShapesPass class. - Add an acceptance test per entity asserting the 400. Env-revision-version is already covered by validate_revision_refs_sufficient because the env service treats environment_ref as artifact_ref — no separate env mirror needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three routers (applications, workflows, evaluators) host the same env-path
retrieve policy with subtly different rules. Bring them in sync:
- Drop the workflows/evaluators path-mixing rejection. Mixing entity refs
and env refs is now allowed at the policy boundary; C3's consistency
check rejects contradictions downstream (D10).
- Adopt applications' key-derivation pattern in workflows and evaluators:
when env refs are supplied without `key`, derive
`key = f"{artifact_slug}.revision"` from artifact_ref.slug, or by
fetching the artifact when only artifact_ref.id is present (D11).
Kept inline per router. The actual helper extraction (this layer AND the
six-entity service-layer pipeline AND the C10 exception-translation
boilerplate) is bundled as a deferred C5b item in followups.md — they all
touch the same per-entity duplicates and want one combined refactor.
Tests:
- Acceptance: pin the aligned behavior for all three routers
(empty request -> 400, env-ref-without-key -> 400, path-mixing
passes policy boundary).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…h-mixing
Add acceptance tests for the three env-supporting routers (workflows,
applications, evaluators) that pin behavior unchanged by C5 but not
covered by C8's first pass:
* 2.a env-path: {env_ref + key} resolves the deployed revision.
* 2.d env-path: {env_ref + artifact_ref} (no key) derives the
default key {artifact_slug}.revision.
* 2.b path-mixed: {env_ref + key + artifact_ref (matching)} is
accepted (redundant-consistent).
* 2.c path-mixed: {env_ref + key + revision_ref pointing elsewhere}
is rejected with 400 by the post-resolution consistency check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace 12 inline `try/except (RetrieveRefsInsufficient,
RetrieveRefsInconsistent)` and `except VariantForkError` blocks across
six routers (workflows, applications, evaluators, testsets, queries,
environments) with a single `@handle_git_exceptions()` decorator at
each route.
Introduces `apis/fastapi/git/exceptions.py`:
* `VariantForkErrorException`, `RetrieveRefsInsufficientException`,
`RetrieveRefsInconsistentException` — typed `HTTPException(400)`
subclasses mirroring the folders/ pattern.
* `handle_git_exceptions()` — translates each core exception to its
typed counterpart. Inserted innermost in the decorator stack so it
runs before `@suppress_exceptions(exclude=[HTTPException])` and the
typed exception propagates.
Any new git-domain exception added to `core/git/types.py` now needs a
single registration in this module rather than N call-site edits.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bring user-facing prose in line with the post-C2/C3/C4/C5 behavior:
* docs/docs/reference/api-guide/04-versioning.mdx — replace
"the most specific wins" with the new contract that every
supplied reference must agree, and enumerate the 400 cases
(revision-version-only, variant-version-only, contradictions).
* docs/docs/prompt-engineering/integrating-prompts/02-fetch-prompt-
programatically.mdx — add a "How the request resolves" section
listing valid identifying shapes and the 400 cases.
* api/oss/src/apis/fastapi/{applications,workflows,evaluators}
/models.py — replace "arbitrary variant" wording in
*_ref descriptions (post-C1a the default variant is deterministic)
and add a top-level docstring noting the consistency contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Railway Preview Environment
|
… naming
- `hasAppReference` now requires `id`/`slug` to be a non-empty string,
matching `asString` semantics in the playground resolver. Spans
carrying `{id: "", slug: ""}` previously enabled the Playground
button only to fall through to ephemeral after the resolver gated
them out.
- Rename `onlyRevVersion` → `hasNoIdentifyingRef` in both the
resolver and `retrieveWorkflowRevision` — the variable was true
whenever no identifying ref existed at any level, not specifically
when only a revision `version` was supplied. Comments updated to
match.
…and-missing-checks-in-guardrails
…github.com:Agenta-AI/agenta into fix/resolve-tests-and-missing-checks-in-guardrails
…-checks-in-guardrails [fix] Resolve tests and missing checks in guardrails
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
api/oss/src/core/environments/service.py (1)
619-636:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTreat empty
Reference({})values as no input before the DAO call.
Referenceobjects are truthy even whenid,slug, andversionare allNone. With the current guard,{}skips the early return, passes these validators, and can still reachfetch_revision()with no identifying scope. That turns an empty retrieve into an under-scoped DAO read instead ofNoneor a 400. Normalize empty refs toNonefirst, or base this guard onis_identifying(...)plus explicitversionpresence for revision refs.api/oss/src/core/queries/service.py (1)
656-668:⚠️ Potential issue | 🟠 Major | ⚡ Quick winEmpty
Reference({})requests still fall through tofetch_revision().The early return uses
bool(ref), butReference(id=None, slug=None, version=None)is still truthy. A body like{"query_ref": {}}will bypass this guard, survive both validations, and hit the DAO with no real identifying scope. Normalize empty refs toNone, or switch this guard tois_identifying(...)with an explicitrevision_ref.versioncheck.
🧹 Nitpick comments (1)
api/oss/tests/pytest/acceptance/test_revision_retrieve_positive.py (1)
1215-1231: ⚡ Quick winThis assertion still passes if the router ignores
environment_ref.Because the test accepts any non-400/non-5xx response, a regression that bypasses env-backed lookup and resolves only
workflow_refwould still pass with 200. Please assert the env-backed failure mode more precisely so this actually pins default-key derivation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: e5e179d9-f6c2-456b-8f07-642f6ec9b0fa
📒 Files selected for processing (28)
.gitignoreapi/oss/src/apis/fastapi/applications/router.pyapi/oss/src/apis/fastapi/evaluators/router.pyapi/oss/src/apis/fastapi/testsets/router.pyapi/oss/src/apis/fastapi/workflows/models.pyapi/oss/src/apis/fastapi/workflows/router.pyapi/oss/src/core/applications/service.pyapi/oss/src/core/embeds/utils.pyapi/oss/src/core/environments/service.pyapi/oss/src/core/evaluators/service.pyapi/oss/src/core/git/types.pyapi/oss/src/core/queries/service.pyapi/oss/src/core/workflows/service.pyapi/oss/src/dbs/postgres/git/dao.pyapi/oss/tests/pytest/acceptance/test_env_backed_retrieve_policy.pyapi/oss/tests/pytest/acceptance/test_revision_retrieve_env_path.pyapi/oss/tests/pytest/acceptance/test_revision_retrieve_positive.pyapi/oss/tests/pytest/acceptance/workflows/test_workflow_embeds.pyapi/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_cross_entity.pyapi/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_errors.pyapi/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_legacy.pyapi/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_retrieve_resolve.pyapi/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_security.pyapi/oss/tests/pytest/acceptance/workflows/test_workflow_embeds_string.pydocs/design/playground-open-from-trace/followups.mdsdks/python/agenta/sdk/managers/applications.pysdks/python/agenta/sdk/managers/evaluators.pysdks/python/oss/tests/pytest/acceptance/workflows/test_embeds_integration.py
✅ Files skipped from review due to trivial changes (2)
- docs/design/playground-open-from-trace/followups.md
- api/oss/src/apis/fastapi/workflows/models.py
| # Normalization #2: | ||
| # `{slug, version}` may be `{artifact_slug, version}` (default | ||
| # variant + Nth revision) rather than a revision slug. Try that | ||
| # interpretation when the slug-based revision lookup misses. | ||
| if ref.slug and ref.version: | ||
| artifact_ref = Reference(slug=ref.slug) | ||
| version_only_ref = Reference(version=ref.version) | ||
| entity = await fetch_revision_by_refs(artifact_ref, None, version_only_ref) | ||
| if entity: |
There was a problem hiding this comment.
Don't replace caller-supplied artifact scope during normalization.
This fallback synthesizes artifact_ref=Reference(slug=ref.slug) even when the embed already provided an explicit artifact ref. If the exact revision lookup misses, that can resolve a different artifact instead of surfacing the caller's inconsistent refs. Only apply this reinterpretation when no artifact scope was provided by the caller, or preserve the original artifact ref and let the service reject the mismatch.
Suggested fix
- if ref.slug and ref.version:
- artifact_ref = Reference(slug=ref.slug)
- version_only_ref = Reference(version=ref.version)
- entity = await fetch_revision_by_refs(artifact_ref, None, version_only_ref)
+ if ref.slug and ref.version and artifact_ref is None and variant_ref is None:
+ normalized_artifact_ref = Reference(slug=ref.slug)
+ version_only_ref = Reference(version=ref.version)
+ entity = await fetch_revision_by_refs(
+ normalized_artifact_ref,
+ None,
+ version_only_ref,
+ )
if entity:
return entity📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Normalization #2: | |
| # `{slug, version}` may be `{artifact_slug, version}` (default | |
| # variant + Nth revision) rather than a revision slug. Try that | |
| # interpretation when the slug-based revision lookup misses. | |
| if ref.slug and ref.version: | |
| artifact_ref = Reference(slug=ref.slug) | |
| version_only_ref = Reference(version=ref.version) | |
| entity = await fetch_revision_by_refs(artifact_ref, None, version_only_ref) | |
| if entity: | |
| # Normalization `#2`: | |
| # `{slug, version}` may be `{artifact_slug, version}` (default | |
| # variant + Nth revision) rather than a revision slug. Try that | |
| # interpretation when the slug-based revision lookup misses. | |
| if ref.slug and ref.version and artifact_ref is None and variant_ref is None: | |
| normalized_artifact_ref = Reference(slug=ref.slug) | |
| version_only_ref = Reference(version=ref.version) | |
| entity = await fetch_revision_by_refs( | |
| normalized_artifact_ref, | |
| None, | |
| version_only_ref, | |
| ) | |
| if entity: |
Extracts buildRefForResolver, hasAppReference, resolveTraceRefs, and related types from playgroundController into a dedicated module so the predicate the UI gate uses (in OSS TraceTypeHeader) and the one the resolver uses (inside resolveTraceRefs) live in one file — they structurally cannot drift apart now. Adds vitest infrastructure to @agenta/playground mirroring @agenta/entities (config, scripts, devDep) and unit tests covering: - buildRefForResolver: id > slug > version priority, empty-string rejection, never-mix invariant - hasAppReference: UI gate predicate across ag.references dict format and top-level references array; empty-id-or-slug rejection (matches resolver gate) - resolveTraceRefs: projectId guard, no-identifier guard, request shaping, mismatch verification, cache TTL behaviour, no-cache-on- failure invariant - retrieveWorkflowRevision: request URL/body/params shape, early- return paths, response validation failure handling 35 new tests in @agenta/playground (new vitest setup) and 10 in @agenta/entities (existing vitest setup) — all green.
…e ref resolver Two changes that the original PR description tagged as follow-ups, lifted into this PR now that the resolver is isolated in its own module with tests covering the contract: 1) retrieveWorkflowRevision swaps the hand-rolled axios.post for the Fern-generated `client.workflows.retrieveWorkflowRevision`. The request/response shape is identical (WorkflowRevisionRetrieveRequest maps 1:1) and zod validation stays at the boundary because Fern's compile-time types under-declare backend `extra="allow"` fields (artifact_slug, variant_slug). Aligns this function with the project's "all new API code uses @agentaai/api-client" rule. 2) resolveTraceRefs migrates its in-memory Map + manual TTL check to queryClient.fetchQuery with staleTime = TRACE_REF_RESOLUTION_TTL_MS and gcTime = 2× that. Failures (no-match, slug mismatch, axios error) are removed from the cache in the outer catch so subsequent calls retry rather than serving stale absence. A small TraceRefResolverFailure sentinel preserves the original "no warn on silent fallback, warn on diagnostic paths" behaviour without muddling the queryFn's contract. Tests adapt: entities mocks `@agenta/sdk` (Fern client) instead of axios; playground tests unchanged — sentinel design preserved all semantics so the cache TTL, no-cache-on-failure, and mismatch warn assertions still pass verbatim.
|
-- N/A |
inputKeyRule short-circuited before outputKeyRule and never consulted
ctx.side, so output cells receiving {input, output} payloads rendered
the input field. Gate input/output/generations rules on ctx.side, and
widen outputKeyRule to also match bare {output: ...} (previously only
LangChain's {returnValues: {output}} shape was recognised).
…aults fix(frontend): seed parameter defaults for custom URL workflows
…e-resolver fix(frontend): resolve every identifying trace ref when opening playground
Context in merged (approved) PRs.