Hermes support by pallakatos · Pull Request #396 · Azure/kars

pallakatos · 2026-06-04T05:26:38Z

Summary

First end-to-end Hermes docker smoke test surfaced two real bugs from the A1 ship:

hermes-agent==0.5.1 doesn't exist on PyPI — the 0.5.x assumption was from misreading the Hermes README's Homebrew formula tag (5.1.14). PyPI uses 0.x.y at 0.15.2 latest. Bumped pin to 0.15.2.
kars plugin discovered but not loaded — Hermes treats standalone plugins as opt-in via plugins.enabled in config.yaml. The entrypoint was materializing the plugin into the right path ($HERMES_HOME/plugins/kars/) but never adding kars to the allow-list — so it was discovered and silently skipped (error='not enabled in config').

Bonus: ripgrep not in Azure Linux 3 tdnf caused tdnf install -y to fail the whole layer. Hermes' file_search falls back to grep cleanly, so dropped it. Image now builds in ~30s.

Verification

✅ docker build --platform linux/amd64 -f sandbox-images/hermes/Dockerfile -t kars-sandbox-hermes:dev . succeeds
✅ hermes_cli.plugins.discover_plugins() loads kars plugin from $HERMES_HOME/plugins/kars/ with 10 tools + 2 hooks (pre_tool_call + post_tool_call)
✅ End-to-end entrypoint dry-run produces correct config.yaml with both plugins.enabled: [kars] and mcp_servers.platform blocks
✅ 83/83 Python unit tests still pass inside the image
✅ All 8 ci-gates pass locally (security-audit, copyright, no-stubs, no-custom-crypto, etc.)

Plugin contract confirmation: Hermes 0.15.2's PluginContext.register_tool / register_hook signatures and plugin.yaml schema (provides_tools / provides_hooks) match exactly what the A1 plugin code was already built for.

Security

See docs/internal/security-audits/2026-06-04-hermes-act1-docker-smoke-fixes.md — no new threat surface beyond what existing A1 audits already cover; this commit only flips the previously-audited surface from dormant to live.

Two real bugs surfaced when running the first `docker build` + end-to-end smoke test of the Hermes sandbox image: 1. **Hermes version pin wrong** `ARG HERMES_VERSION=0.5.1` doesn't exist on PyPI. The 0.5.x assumption came from misreading the Hermes README's Homebrew formula tag (`5.1.14`); the actual `hermes-agent` PyPI package uses 0.x.y numbering at 0.15.2 latest. Bumped to 0.15.2. Hermes 0.15.2's plugin contract (PluginContext.register_tool, register_hook, plugin.yaml with provides_tools/provides_hooks, discovery via `$HERMES_HOME/plugins/`) matches what the A1 plugin code was already built for — verified by importing hermes_cli.plugins and running discover_plugins() against our materialized plugin tree. 2. **ripgrep not in Azure Linux 3** `tdnf install -y` exits non-zero if ANY package is missing, and Azure Linux 3 doesn't ship ripgrep. Hermes' built-in file_search tool prefers ripgrep but falls back to grep, so dropping it is safe. Image now builds in ~30s. 3. **kars plugin discovered but not loaded** Hermes treats `standalone` plugins as opt-in via `plugins.enabled` in config.yaml. The entrypoint was placing the kars plugin into `$HERMES_HOME/plugins/kars/` (correct user discovery path), but never adding `kars` to the enabled allow-list — so it was discovered and silently skipped with `error='not enabled in config'`. The entrypoint now emits a `plugins.enabled: [kars]` block at the top of every generated config.yaml. The awk-merge that replaces prior `mcp_servers:` blocks was extended to also replace prior `plugins:` blocks so re-runs are idempotent. Verified end-to-end: - `docker build` succeeds - `discover_plugins()` loads kars plugin, registers 10 tools + 2 hooks (pre_tool_call + post_tool_call) - Entrypoint generates correct config.yaml with both blocks - `$HERMES_HOME/plugins/kars/` materialized from `/opt/kars-hermes-stage/plugins/kars/` on every boot - 83/83 python unit tests still pass inside the image - Mock smoke run: `python3 -m hermes_cli.plugins discover` shows kars: enabled=True, 17 total plugin tools across all enabled plugins (10 from kars + 7 web/foundry from bundled providers) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-04T05:27:45Z

Dependency Review

The following issues were found:

✅ 0 vulnerable package(s)
✅ 0 package(s) with incompatible licenses
✅ 0 package(s) with invalid SPDX license definitions
⚠️ 5 package(s) with unknown licenses.

See the Details below.

License Issues

mesh-plugin/package.json

Package	Version	License	Issue Type
@microsoft/agent-governance-sdk	file:../vendor/agt/microsoft-agent-governance-sdk-4.0.0-agt-3322175d.tgz	Null	Unknown License

runtimes/agt-mesh-python/pyproject.toml

Package	Version	License	Issue Type
agentmesh-platform	>= 3.6.0,< 5.0.0	Null	Unknown License
httpx	>= 0.27,< 1.0	Null	Unknown License
pynacl	>= 1.5,< 2.0	Null	Unknown License
websockets	>= 12,< 14	Null	Unknown License

OpenSSF Scorecard

Package	Version	Score	Details
npm/@microsoft/agent-governance-sdk	file:../vendor/agt/microsoft-agent-governance-sdk-4.0.0-agt-3322175d.tgz	Unknown	Unknown
pip/agentmesh-platform	>= 3.6.0,< 5.0.0	Unknown	Unknown
pip/httpx	>= 0.27,< 1.0	Unknown	Unknown
pip/pynacl	>= 1.5,< 2.0	Unknown	Unknown
pip/websockets	>= 12,< 14	Unknown	Unknown

Scanned Files

mesh-plugin/package.json
runtimes/agt-mesh-python/pyproject.toml

Two follow-ups from the kind-cluster end-to-end smoke test: 1. **Helm CRD schema missing Hermes enum** — controller's `crd.rs` added `RuntimeKind::Hermes` in a7882b8 but the matching Helm CRD YAML wasn't updated. Result: the API server rejected every KarsSandbox with `runtime.kind: Hermes` BEFORE the controller ever saw it. Verified by `kubectl apply --dry-run=server` failing with "unknown enum value 'Hermes'". Added: - `Hermes` to the `runtime.kind` enum at line 85 - x-kubernetes-validations rule: `(self.kind == 'Hermes') == has(self.hermes)` - `runtime.hermes` properties block mirroring `pydanticAi` shape (version, agentCode oci/git, entrypoint, extraEnv) After the fix, `kubectl apply -f /tmp/hermes-sandbox.yaml` succeeds, controller picks up the CR, and a 2-container pod (`agent` + `inference-router`) reaches `2/2 Running` with the kars plugin loaded (10 tools + 2 hooks registered). 2. **`.cargo-docker/` not gitignored** — when cross-compiling for linux/arm64 via `docker run -v $PWD:/work … cargo build` (the pattern used for kind-on-M-series), `CARGO_HOME=/work/.cargo-docker` keeps container-arch crate cache out of the host's `~/.cargo`. That directory was leaking into `git status`. Added rules: - `.cargo-docker/` — explicit - `/bin/` was already covered by `**/[Bb]in/*` (verified) Verified end-to-end on kind cluster `kars-dev`: $ kubectl get karssandbox,pods -n kars-smoke-hermes NAME PHASE RUNTIME INFERENCEPOLICY ISOLATION smoke-hermes Hermes smoke-inference standard NAME READY STATUS RESTARTS smoke-hermes-697c6bd557-q5xfr 2/2 Running 0 Plugin discovery inside the pod: kars plugin: enabled=True, source=user hooks : {'pre_tool_call': 1, 'post_tool_call': 1} tools : http_fetch, kars_discover, kars_mesh_{send,inbox, await,transfer_file}, kars_spawn{,_status,_destroy, _list} Router /healthz from the agent container: 200 ok Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

End-to-end Hermes smoke on kind cluster exposed and fixed six real bugs blocking the runtime from being functional: 1. awk not in Azure Linux 3 — replaced entrypoint merge with Python 2. TUI mode crashed without TTY — switched to hermes gateway run 3. KARS_MCP_SERVERS injected only into "openclaw" container — generalized to use agent_container_name based on runtime kind 4. Entrypoint scanned wrong path for MCP servers — aligned to the KARS_MCP_SERVERS env + loopback router pattern 5. hermes config set used key=value (wrong) — fixed to two positional args 6. Router rustls CryptoProvider not pre-installed — added explicit aws_lc_rs::default_provider().install_default() in main() Verified 12/12 e2e checks pass on kind cluster: - Pod 2/2 Running, plugin loaded with 10 tools + 2 hooks - Router /healthz, /agt/evaluate, /egress/fetch, /sandbox/list all 200 - KarsMemory CR Compiled, McpServer translated, channel translation - Mesh stubs return clear Act 2 error - pre_tool_call hook fires + decision=allow All 834 controller + 932 router Rust tests pass. cargo clippy clean, cargo fmt applied. Security audit: docs/internal/security-audits/2026-06-04-hermes-act1-e2e-smoke-fixes.md Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The sandbox NetworkPolicy gated ALL ingress rules behind `governance.enabled=true`. With governance off, the NP shipped with `policyTypes: [Ingress, Egress]` and an empty `ingress: []` block — deny-all ingress. The operator namespace then could not reach `/internal/policy-status` on the router and every referencing InferencePolicy / KarsMemory / ToolPolicy / McpServer / EgressApproval stuck forever in `Ready=False / AwaitingRouterEnforcement`, observable in the operator panel even though the sandbox itself was healthy and the router /readyz returned 200. Split into two ingress classes: - **Operator policy-echo ingress** (router :8443 admin surface from ns labeled `app.kubernetes.io/name=kars,component=system`) — emitted UNCONDITIONALLY. Three orthogonal gates still protect it: bearer token, constant-time compare, optional IP pinning. - **Peer-sandbox mesh + gateway ingress** (8443 / 18789 / 18791 from ns labeled `kars.azure.com/role=sandbox`) — kept gated on governance.enabled (no peers when governance is off). Surfaced during local-k8s smoke of smoke-hermes: even after fixing the AZURE_OPENAI_API_KEY env path so /readyz returned 200, three policy CRs (InferencePolicy, KarsMemory, ToolPolicy) stayed Ready=False because the controller's /internal/policy-status probe to the sandbox router timed out at the NetworkPolicy level. After this fix, with governance off, the controller's HTTP probe gets a 401 (admin-token gate doing its job) instead of a connection timeout, and the policy reconcilers update status using the round trip rather than reporting "router unreachable". Verified end-to-end on kind cluster `kars-dev`: $ kubectl get inferencepolicy smoke-inference -n kars-system -o jsonpath='{.status.conditions}' | jq - Ready=True RouterEnforcing: all 1 referencing sandbox router(s) confirmed inference-policy digest - Progressing=False Reconciled: router echo confirmed $ kubectl get karsmemory smoke-mem -n kars-system -o jsonpath='{.status.conditions}' | jq - Ready=True RouterEnforcing: all 1 referencing sandbox router(s) confirmed claw-memory binding digest 834 controller tests still pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Collapses the canonical 4-agent exec-brief scenario (parent + analyst + viz + writer) into a single Hermes agent doing the whole pipeline itself — research, scorecard, hero image, written brief. Built to validate the Hermes runtime adapter end-to-end on local-k8s and AKS without depending on the Python AGT MeshClient (which ships in Act 2; until then, `kars_mesh_*` returns explicit "Act 2 not ready" errors and the prompt explicitly tells the agent not to call those tools). Scenario layout (mirrors exec-brief/): - manifests/00-namespace.yaml ........ kars-execbrief-hermes ns - manifests/01-inferencepolicy.yaml .. azure-openai gpt-5.4 - manifests/02-toolpolicy.yaml ....... allow-all AGT profile - manifests/03-clawmemory.yaml ....... memory-execbrief-hermes store - manifests/04-mcpserver.yaml ........ DeepWiki MCP (same as canonical) - manifests/05-clawsandbox.yaml ...... runtime.kind: Hermes - config.sh .......................... SCENARIO_SUB_SANDBOXES=() - prompt.txt ......................... single-agent pipeline - README.md .......................... what it exercises + skips Verified on kind cluster `kars-dev`: $ kubectl apply -f tools/e2e-harness/scenarios/exec-brief-hermes-single/manifests/ → 6 resources created $ kubectl get karssandbox execbrief-hermes -n kars-system PHASE=healthy RUNTIME=Hermes $ kubectl get pods -n kars-execbrief-hermes execbrief-hermes-... 2/2 Running All 5 CRs reach RouterEnforcing / Ready=True: ● execbrief-hermes-inference InferencePolicy router echo confirmed ● execbrief-hermes-toolpolicy ToolPolicy agt-profile digest confirmed ● execbrief-hermes-memory KarsMemory binding=bound ● execbrief-hermes-deepwiki McpServer healthy ● execbrief-hermes KarsSandbox healthy In-pod verification: - kars plugin: enabled=True source=user, 10 tools + 2 hooks - foundry_memory store_name = memory-execbrief-hermes (matches CR) - config.yaml mcp_servers.execbrief-hermes-deepwiki present - KARS_MCP_SERVERS=execbrief-hermes-deepwiki in agent env - Router /readyz: 200 ok Note: the actual LLM execution of the prompt requires real Azure OpenAI / Foundry credentials. With the fake-key dev overlay used in this validation, the pipeline runs through Hermes → kars plugin → router → upstream-call layer and hangs at the upstream (expected). Running with real creds — either via `kars dev --target local-k8s` with a real provider, or on AKS via `SCENARIO=exec-brief-hermes-single PLATFORM=aks ./tools/e2e-harness/run.sh` — will execute the full pipeline. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… hardening End-to-end run of the new `exec-brief-hermes-single` scenario on local-k8s surfaced four more bugs that all gate the prompt from actually reaching the model: 1. **`pull_policy=Always` for `:latest` images** in dev mode forced a doomed registry pull (karsacr.azurecr.io/…) instead of using the kind-cached image. The controller now picks `IfNotPresent` when `KARS_DEV_PROFILE=true` is set on its own env. Production AKS stays on `Always` for `:latest`. 2. **Hermes' `tirith` auto-download** from GitHub releases blocked every cold start while the kars egress-guard slow-walked the fetch. Entrypoint now sets `TIRITH_ENABLED=false` by default; Hermes falls back to its built-in pattern-matching shell checker. Operators can re-enable by pre-baking the binary at `/usr/local/bin/tirith` and setting `TIRITH_ENABLED=true`. 3. **`HERMES_DISABLE_LAZY_INSTALLS=1`** suppresses Hermes' `pip install` of discord.py / google-* / brotlicffi on first use of bundled platform plugins. Saves 30–120s on every cold start; operators wanting the extras re-bake into the image. 4. **`HERMES_SKIP_NODE_BOOTSTRAP=1`** suppresses Hermes' shell-based Node.js 22 LTS auto-installer (scripts/install.sh). We pre-install `nodejs` + `nodejs-npm` from the Azure Linux 3 base repo (currently v20.14 — Hermes' dep_ensure accepts any modern node). Browser tools that need a Chromium download still need to be pre-baked separately. All three Hermes-runtime knobs are also mirrored into `$HERMES_HOME/.env` so they survive `kubectl exec` sessions (kubectl exec spawns a fresh env that doesn't see entrypoint exports). Hermes' env_loader loads .env at import time (`hermes_cli/env_loader.py:_load_dotenv_with_fallback`). After all four fixes verified end-to-end: - smoke-hermes sandbox: phase=Running, 2/2 Ready - Router /readyz: 200 ok (controller forwards real Foundry API key from `kars-dev-creds` Secret via secretKeyRef) - Router /v1/chat/completions: 200 with real gpt-5.4 reply ("OK" in 1.1s, latency_checkpoint shows engine_ttft_ms=108) - InferencePolicy / KarsMemory / ToolPolicy / McpServer all Ready=True / RouterEnforcing - Plugin loaded with 10 tools + 2 hooks + foundry_memory native - Platform MCP block present in config.yaml when FOUNDRY_PROJECT_ENDPOINT is bound Outstanding gap (NOT in this commit): Hermes' `hermes -z` still makes an outbound HTTPS handshake (state=SYN_SENT to 104.18.3.115 :443, a Cloudflare IP — likely a check-update or telemetry endpoint the harness hasn't tracked down). The kars egress-guard's forward-proxy stalls the connection rather than denying outright, so the prompt-driven path hangs after plugin discovery completes. Workarounds: (a) `KARS_EGRESS_LEARN=true` to log unallowed hosts, then explicitly allowlist in EgressAllowlist; (b) find Hermes' env to disable check-update / telemetry — Act 1.x; (c) drive Hermes via Telegram channel instead of `hermes -z`. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…l Foundry The single-agent exec-brief scenario (research → JSON → scorecard PNG → hero PNG → 2-page brief.md) now runs end-to-end on Hermes through the kars router to real Azure Foundry gpt-5.4. Verified on local-k8s with the user's ~/.kars/ creds. Four fixes were needed (each surfaced sequentially as the agent loop progressed further): 1. **`OPENAI_API_KEY` env routes Hermes to openrouter** (and openrouter.ai is blocked by the egress-guard). Switched the entrypoint's `.env` mirror to `AZURE_FOUNDRY_API_KEY` + `AZURE_FOUNDRY_BASE_URL` so resolve_provider() picks the `azure-foundry` provider (which has no built-in Cloudflare callback). 2. **`agent_init.py` hardcodes `_codex_reasoning_replay_enabled = True`** → Hermes echoes `{"type": "reasoning", "encrypted_content": "..."}` back to /v1/responses on every continuation, which Azure Foundry's strict schema validator rejects with `invalid_payload`. OpenAI's own Responses API accepts these. Hermes only learns to disable replay when the upstream returns `invalid_encrypted_content` (a different error code that Foundry doesn't emit). Router fix: `build_upstream_url()` in proxy.rs now strips `input[]` items of `type=reasoning` and the `include=["reasoning.encrypted_content"]` field from any /v1/responses request bound for Azure Foundry (NOT GitHub Models / Copilot — their schemas accept the original shape). 3. **/v1/responses handler used `forward()` (non-streaming)** but Hermes always opens these with `responses.create(stream=True)` and expects an SSE `text/event-stream` response. The buffered JSON blob made Hermes' SDK raise "Connection error" after ~15s and retry 6× before giving up with `max_retries_exhausted`. Switched the handler to `forward_stream()` so the SSE byte stream flows through unchanged. 4. **`forward_stream()` injected `stream_options.include_usage`** which the OpenAI Responses API rejects (`unknown_parameter`). Skip the injection for /v1/responses (Foundry already emits usage in the terminating SSE event); was already skipped for Anthropic /v1/messages — same exclusion now covers both shapes. Plus the entrypoint now persists `model.{default,provider,base_url}` in config.yaml on every boot (not just plugins+mcp_servers), so a fresh pod doesn't need a one-time `hermes config set model` post-boot dance. End-to-end run delivered: /sandbox/incoming/brief.md 6,136 B (2 pages, real Markdown, 12 footnoted https citations, references hero+scorecard PNGs inline, all 4 control-domain terms present) /sandbox/incoming/analyst.json 5,025 B (foundry_web_search × 3 → trends / control_categories / runtimes / metrics) /sandbox/incoming/hero.png 30,094 B (1024×1024, foundry_image_generation gpt-image-1, "Defense in Depth" isometric data-center cutaway) /sandbox/incoming/scorecard.png 12,201 B (1024×640, foundry_code_execute matplotlib grouped bar chart, 4 runtimes × 4 control columns) Router log: 30+ /v1/responses SSE streams, all 200 OK, latencies 1.6–67s. Foundry stream headers received for every request after this fix; pre-fix only 2 of 8 requests had `Foundry complete` entries before Hermes gave up. Agent stdout (final response after autonomous tool-use loop): > Done. Artifacts produced: > - /sandbox/incoming/brief.md — 6136 bytes > - /sandbox/incoming/hero.png — 30094 bytes > - /sandbox/incoming/scorecard.png — 12201 bytes > - /sandbox/incoming/analyst.json — 5025 bytes > Verified: brief.md exists and references both image files > hero.png and scorecard.png exist as real PNGs > analyst.json exists with the normalized runtime comparison All 932 router + 834 controller Rust tests still pass. Deliverables captured under: tools/e2e-harness/out/hermes-exec-brief-delivered/ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Two visibility gaps surfaced after the Hermes exec-brief run: operator panel showed `sandbox="unknown"` (instead of the real sandbox name) and zero token counters for every /v1/responses call. 1. **sandbox label was "unknown"**: every `x-kars-sandbox` header parser fell back to `"unknown"` when the header wasn't set — which is the default for clients like Hermes' openai SDK that don't add kars-specific headers. Per-sandbox routers KNOW their own identity via the `SANDBOX_NAME` env (set by the controller). Added `resolve_sandbox_name()` helper at the top of inference.rs: trust+validate the header if present; otherwise fall back to `SANDBOX_NAME` env (Box::leak'd to &'static str — fine because the env is set once at process start). Replaces 4 hand-rolled `unwrap_or("unknown")` / `unwrap_or("self")` sites. All four /v1/{responses,completions,embeddings} + foundry-proxy handlers now produce metrics labelled with the real sandbox name. 2. **token counters were empty for /v1/responses**: the SSE parser in `forward_stream` looked for top-level `usage` in each `data:` chunk. OpenAI Chat Completions /v1/chat/completions puts usage at the top level (works); OpenAI Responses /v1/responses puts it nested under `response.usage` in the terminating `response.completed` event (didn't work — captured a real response.completed event to confirm). Parser now probes both shapes: v.get("usage").or_else(|| v.get("response")?.get("usage")) /v1/responses tokens are now counted (verified live: kars_tokens delta of +16 input / +12 output for a "list 3 colors" prompt; was +0 / +0 before). Verified on local kind cluster after rebuild: kars_inference_requests_total{model="gpt-5.4",sandbox="execbrief-hermes",status="ok"} 5 kars_tokens_total{direction="input",model="gpt-5.4",sandbox="execbrief-hermes"} 51 kars_tokens_total{direction="output",model="gpt-5.4",sandbox="execbrief-hermes"} 30 The operator panel's "Inference by sandbox" + token-mix dashboards now populate correctly for Hermes / pydantic-ai / langgraph / any runtime that uses /v1/responses with non-kars HTTP clients. 932 router tests + cargo clippy --all-targets -- -D warnings clean. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…tool deny list Closes the inter-agent comms gap for Python frameworks. Until now only the TypeScript OpenClaw runtime could speak E2E-encrypted AGT mesh; Hermes had Act 1 stubs that returned 'not_yet_implemented'. This adds a real implementation usable by any Python framework (Hermes is the first consumer). ## What ships 1. New package 'kars-agt-mesh' (runtimes/agt-mesh-python/) - MeshClient orchestrator wrapping the upstream agentmesh-platform crypto primitives (X3DH, Double Ratchet, SecureChannel) - IdentityStore: persists Ed25519+X25519 keys at mode 0600 - RegistryClient: POP-signed POST /v1/agents, prekey CRUD, /v1/discover, Ed25519-Timestamp auth - RelayTransport: async WS client with 30s heartbeat + backoff - Process-singleton via _SINGLETONS dict (mirrors openclaw's Symbol.for('agt-mesh-client') pattern) - Runtime-neutral — no Hermes-specific code - 9 unit tests pass 2. Hermes mesh adapter (runtimes/hermes/.../plugin/mesh.py) - Replaces Act 1 mesh_stubs.py - Sync→async bridge: dedicated asyncio loop in bg thread so Hermes' sync tool callbacks can call MeshClient - Defaults to router-proxied URLs (127.0.0.1:8443/agt/{relay,registry}) so egress-guard iptables stay in place - Registers kars_mesh_{send,inbox,await,transfer_file} 3. Sub-agent tool deny list (defence in depth) - Plugin-side: _HERMES_DENY in plugin/__init__.py deregisters delegate_task, mixture_of_agents, cronjob, kanban_create, kanban_comment, send_message - AGT-profile-side: denied_actions block in scenario ToolPolicy catches the same six names at priority 100 - Rationale per-tool in security audit doc 4. Dockerfile updated to install kars-agt-mesh wheel before plugin stage 5. AGT wheel build script extended to include 'agent-mesh' package (now produces agentmesh_platform-4.0.0) ## Live verification on kind-kars-dev - MeshClient.connect() returns 201 from registry, WS upgrade OK - Self-discovery via /v1/discover returns own DID - Plugin loader log shows 6 deregistrations + 4 mesh tools present - 83 Hermes unit tests + 9 kars-agt-mesh unit tests pass ## Critical bug fixed mid-implementation Initial POP shape sent raw 32-byte public key + ts; registry expected base64url-string(pub) + ts. Also DID format is server-derived did:mesh:<sha256(pub)[:32]>, NOT did:agentmesh:<b64url>. Fixed both in registry_client.py and identity.py. Memory stored for future non-TS SDK implementers. ## Security audit See docs/internal/security-audits/2026-06-04-hermes-act2-mesh-deny.md (2 sign-offs, ci-gates green). ## Deferred to Act 2.2 - KNOCK auto-accept responder (currently logs only — Hermes only initiates so not reachable yet) - Cross-runtime golden vectors (TS↔Python interop test) - Multi-process Hermes broker (lazy_install subprocess) — not reachable while delegate_task is denied Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pal Allakatos <pallakatos@microsoft.com>

… working Lands the protocol-correct fixes needed for MeshClient.connect() → KNOCK → X3DH → Double Ratchet roundtrip between two sandboxes. Tested end-to-end on kind-kars-dev with two Hermes pods (execbrief-hermes and smoke-hermes) on the FRESHLY BUILT image (no hot patches): - pod A registers, uploads prekey bundle, opens relay WS (with POP) - pod B does the same - pod A discovers B via /v1/discover (freshest-first sort) - pod A fetches B's bundle, runs X3DH, sends KNOCK + first ciphertext - pod B's _handle_knock_frame auto-accepts via SecureChannel.create_receiver, decrypts plaintext 'hello from execbrief-hermes' - pod B replies via send_by_did → encrypted message frame - pod A decrypts 'pong from smoke-hermes' ## Critical protocol fixes 1. **Relay WS connect-frame POP** (relay_transport.py) - Was: {type:'connect', from:did, ts:...} - Now: full proof-of-possession (std-base64 pub_key + iso ts + sig over ts), per AGT relay/app.py::_verify_connect_pop - Without this, the relay rejects every connection with 'connect frame missing did/public_key/timestamp/signature' 2. **Registry auth header** (registry_client.py) - Was: three separate X-Agent-DID/Timestamp/Signature headers, signature over method+path+ts - Now: single 'Authorization: Ed25519-Timestamp <did> <ts> <b64url-sig>', signature over timestamp string only - Matches AGT registry/app.py::verify_ed25519_timestamp_auth 3. **X3DH bootstrap missing** (client.py) - Now connect() builds X3DHKeyManager + generates signed_pre_key + 10 OTKs + uploads bundle via PUT /v1/agents/{did}/prekeys - Without this, peers couldn't fetch our bundle, X3DH initiation would fail at the responder side 4. **KNOCK responder implemented** (client.py::_handle_knock_frame) - Was: log-only stub ('responder path not implemented') - Now: parses ChannelEstablishment, calls SecureChannel.create_receiver, caches the channel, decrypts the bundled first ciphertext, eagerly tops up the OTK pool for the next session 5. **Send fuses KNOCK + first message** (client.py::send_by_did) - First call to a new peer DID sends {type:'knock', establishment, ciphertext} - Subsequent calls send {type:'message', ciphertext} - Matches the TS SDK wire convention (one RTT, not two) 6. **AAD directionality fix** (client.py) - Initiator: f'{self_did}|{peer_did}' - Responder: f'{from_did}|{self_did}' (reconstructs the same bytes) 7. **EncryptedMessage wire format** (client.py) - Was: JSON of em.__dict__ (would fail at decoder) - Now: EncryptedMessage.serialize() / .deserialize() (binary + b64url) 8. **PeerBundle flat shape** (registry_client.py + client.py) - Was: nested dicts mirroring my best-guess wire format - Now: matches agentmesh.encryption.x3dh.PreKeyBundle's flat dataclass 9. **register_self handles 409 gracefully** (registry_client.py) - Was: raised MeshRegistryError, blocking every restart - Now: logs and continues — the subsequent prekey PUT (with Ed25519-Timestamp auth) proves we own the same key 10. **discover() sorts freshest-first** (registry_client.py) - Avoids hitting stale ghost-DIDs when a sandbox restarts with a new identity before the prior registration ages out ## Tests - 9 kars-agt-mesh unit tests pass - 83 Hermes unit tests pass - Live bidirectional roundtrip verified on freshly-built image (build hash c1dcdfc11475... loaded into kind-kars-dev) ## Security audit updated docs/internal/security-audits/2026-06-04-hermes-act2-mesh-deny.md - Residual risk #1 (no KNOCK responder) removed — now implemented. - Added residual risk #4 (stale registry entries — non-security). - Added live bidirectional test description. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pal Allakatos <pallakatos@microsoft.com>

…ug Hermes mesh egress-guard hole ## controller/src/reconciler/mod.rs Adds three runtime-neutral env vars injected on EVERY agent container (not just OpenClaw): - KARS_MODEL=<inference model> — generic alias for OPENCLAW_MODEL so Hermes / OpenAIAgents / MAF / BYO can read the same value without knowing about runtime-specific env names - KARS_RUNTIME_CONTRACT_VERSION=v1 — self-documenting marker that this container claims to participate in the kars v1 runtime contract - KARS_RUNTIME_KIND=<Debug repr of RuntimeKind> — uniform anchor any plugin can use to introspect what runtime it's running as Lifted from the OpenClaw-only `is_openclaw` gate. All 834 controller tests still pass. ## runtimes/hermes/.../plugin/mesh.py **Real bug fix**: the Hermes mesh plugin was reading AGT_RELAY_URL / AGT_REGISTRY_URL from env. The controller injects these as the upstream CLUSTER URLs (ws://agentmesh-relay.agentmesh.svc:8765 etc.) — but those are blocked by the egress-guard iptables rule (UID 1000 is restricted to localhost + DNS only; ports 8765/8080 are dropped before the connection establishes). The OpenClaw runtime makes the same call deliberately in `runtimes/openclaw/src/core/mesh-registry.ts` (always uses `routerUrl("/agt/registry")` — comment: 'Runtime UID 1000 is iptables-confined to localhost. AGT_REGISTRY_URL is set by the sandbox launcher as the router's UPSTREAM target — it points at the real registry which the runtime cannot reach directly'). Now Hermes does the same: hardcodes 127.0.0.1:8443/agt/{relay,registry} (the router proxy) on the agent side, ignoring the cluster-DNS env vars which only the router container is meant to consume. ## Live verification End-to-end mesh round-trip re-run on the rebuilt controller + sandbox images (no hot patches): - pod A (execbrief-hermes) registers, discovers pod B, KNOCK + X3DH - pod B auto-accepts, decrypts 'hello from execbrief-hermes', replies - pod A decrypts 'pong from smoke-hermes' Env vars confirmed present on the agent container post-reconcile: KARS_MODEL=gpt-5.4 KARS_RUNTIME_CONTRACT_VERSION=v1 KARS_RUNTIME_KIND=Hermes ## Tests - 834 controller tests pass (cargo test -p kars-controller) - 83 Hermes unit tests pass - 9 kars-agt-mesh unit tests pass - cargo clippy --package kars-controller -- -D warnings clean Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pal Allakatos <pallakatos@microsoft.com>

Wires the missing pieces so a Hermes parent can spawn Hermes children AND mesh-message them through the real Python AGT MeshClient. Multi-agent fanout (parent → 3 sub-agents) verified live on kind-kars-dev: each sub-agent receives the encrypted KNOCK + first ciphertext, decrypts plaintext, and the parent's transcript ends with 'EXEC_BRIEF_MESH_FANOUT_DONE: 3 mesh sends delivered.' ## Bug fixes ### 1. Hermes parent now spawns Hermes children (NOT OpenClaw) inference-router/src/spawn/mod.rs::build_sub_agent_crd_with_labels hard-coded `runtime.kind = OpenClaw` for every spawn. Now it: - Accepts an explicit `runtime_kind` field on SpawnRequest. - Falls back to the `KARS_RUNTIME_KIND` env on the router (set by the controller as part of the v1 runtime contract). - Falls back to "OpenClaw" for backward compat. Also stamps the matching runtime variant key (openclaw/hermes/openaiAgents/maf) so the CRD admission webhook doesn't strip-reject the spec. Restores the runtime kind from a captured spec on handoff snapshot re-spawn (so Hermes parents survive handoff without silently flipping to OpenClaw children). ### 2. Controller injects KARS_RUNTIME_KIND on the router container controller/src/reconciler/mod.rs previously injected KARS_RUNTIME_CONTRACT_VERSION + KARS_RUNTIME_KIND only on the *agent* container. Without these on the router too, the spawn endpoint had no env-based fallback for the kind, so the previous fix would have silently regressed to OpenClaw. ### 3. Hermes mesh.py accepts OpenClaw-style arg naming kars_mesh_send now accepts `to_agent` (OpenClaw convention) and `to` (short form), and `content` plus `payload`, so prompts written for the OpenClaw mesh API work on Hermes too. Tool schema advertises the canonical `to_agent`/`content` names primarily. ### 4. Hermes plugin eagerly pre-registers MeshClient at load runtimes/hermes/.../plugin/__init__.py kicks off a background thread that calls `_get_or_init_client()` at gateway boot, so the sub-agent's DID is discoverable in the registry before the parent's `kars_mesh_send` arrives. Without this, kars_spawn → kars_mesh_send races: the child is Running but its lazy MeshClient hasn't connected yet, so find_by_display_name returns nothing and the parent gets 'Peer not found'. ### 5. Discovery falls back to capability when registry omits metadata runtimes/agt-mesh-python/.../registry_client.py find_by_display_name no longer requires `metadata.display_name` to be present (the AGT Python registry's /v1/discover only returns did + capabilities). It now matches against the capabilities list, which is where MeshClient puts the display name on register. ## Harness additions ### tools/e2e-harness/platforms/aks.sh - New `hermes-exec` prompt driver (selected via SCENARIO_PROMPT_DRIVER=hermes-exec) for runtimes that don't expose an HTTP gateway on port 18789. Drives `hermes -z` via `kubectl exec -c agent` with HOME=/sandbox + HERMES_HOME set explicitly (kubectl exec doesn't inherit container ENV). - Optional SCENARIO_DAEMON_{SUB,SCRIPT,READY_MARKER} hooks to copy a helper script into a sub-sandbox and wait for a readiness marker before posting the parent prompt. - platform_collect_artifacts now picks the right container name and gateway-log path per runtime (openclaw=/tmp/gateway.log, hermes=/sandbox/.hermes/logs/gateway.log). ### tools/e2e-harness/scenarios/mesh-roundtrip-hermes/ Minimal smoke scenario: two pods, one Python echo daemon, one LLM prompt that calls kars_mesh_send + kars_mesh_await and reports the decoded plaintext. Verified end-to-end on freshly-built images. ### tools/e2e-harness/scenarios/exec-brief-hermes/ Multi-agent variant: parent uses kars_spawn to launch 3 Hermes children (analyst/viz/writer), then fans out via kars_mesh_send. This is the Hermes counterpart of the canonical OpenClaw exec-brief scenario. ## inference-router/Dockerfile.dev The canonical Dockerfile is distroless (no shell). The controller's egress-guard init container runs `sh -c "iptables ..."` which can only work on an image that has sh + iptables. The .dev variant uses mcr.microsoft.com/azurelinux/base/core:3.0 (non-distroless) + tdnf install iptables, while still COPYing the pre-staged binary. Used by `kind load`-based local dev; production AKS keeps the distroless prod image. ## Tests - 83 Hermes unit tests pass. - 9 kars-agt-mesh unit tests pass. - 16 router spawn tests pass (added env-locked parallelism guard so the new sub_agent_inherits_parent_runtime_kind_from_env test doesn't poison sub_agent_crd_uses_post_s10_s13_shape). - All 834 controller tests pass. - cargo clippy --package kars-inference-router -- -D warnings clean. ## Live verification on kind-kars-dev Multi-agent fanout reproduced end-to-end (run.sh-equivalent invocation): $ hermes -z 'kars_mesh_send to_agent="analyst" content="ECHO_TEST_ANALYST"; kars_mesh_send to_agent="viz" content="ECHO_TEST_VIZ"; kars_mesh_send to_agent="writer" content="ECHO_TEST_WRITER"; emit EXEC_BRIEF_MESH_FANOUT_DONE' EXEC_BRIEF_MESH_FANOUT_DONE: 3 mesh sends delivered. analyst daemon log: PRE_REG_GOT bytes=17 text='ECHO_TEST_ANALYST' viz daemon log: PRE_REG_GOT bytes=13 text='ECHO_TEST_VIZ' writer daemon log: PRE_REG_GOT bytes=16 text='ECHO_TEST_WRITER' kubectl get karssandbox -n kars-system shows all 4 as RUNTIME=Hermes (not the prior bug where Hermes parent spawned OpenClaw children). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pal Allakatos <pallakatos@microsoft.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pal Allakatos <pallakatos@microsoft.com>

Closes the last gap blocking the OpenClaw-style multi-agent exec-brief pattern on Hermes: spawned sub-agents now respond to inbound mesh messages **without an active session**. ## Problem After Act 2.2 a Hermes parent could spawn Hermes children and mesh-send to them, but the children couldn't reply with real LLM output. Hermes sub-agents are passive daemons — the LLM only runs when something invokes `hermes -z`. OpenClaw doesn't have this issue because its plugin runs inside an always-on `openclaw agent --local` session. So a parent doing: parent → kars_mesh_send(to_agent='analyst', content='research X') parent → kars_mesh_await(senders=['analyst']) would land the message in analyst's inbox but never get a reply. The analyst's Hermes daemon would just queue the message and sleep. ## Fix New `runtimes/hermes/.../plugin/mesh_worker.py`: a background asyncio loop in each sub-agent that: 1. Drains the shared MeshClient inbox. 2. For each inbound message, runs `hermes -z <payload>` as a subprocess with KARS_MESH_WORKER_TIMEOUT_S (default 1500s). 3. Resolves the sender's display name via the registry. 4. Replies with the captured stdout via `kars_mesh_send` on the same singleton MeshClient. Opt-in via `KARS_MESH_AUTO_RESPONDER=1`. The controller sets this ONLY on Hermes sandboxes that have the `kars.azure.com/parent` label (i.e. children spawned by another sandbox via the router's spawn endpoint). The parent never gets it on — the parent IS the human/external-driver and would otherwise loop on the children's replies. The plugin's `__init__`'s eager-init thread now also calls `mesh_worker.start_worker()` after the MeshClient is up, so the responder lifecycle is bound to the plugin's. ## Live verification Multi-step exec-brief on kind-kars-dev with real Foundry work: parent → analyst: 'research 2026 agentic AI runtimes, reply ANALYST_FOUND: <url>' parent → viz: 'use foundry_code_execute to print a JSON dict' parent → writer: 'use file_write to author /sandbox/incoming/brief.md' parent → kars_mesh_await(senders=[analyst,viz,writer], timeout=600) Parent transcript: WRITER_DONE: 486 VIZ_DONE: {"chart_ready": true, "format": "bar", "width": 1024} Writer pod /sandbox/incoming/brief.md (486 bytes, REAL LLM content): 'In 2026, agentic runtimes are defined less by raw model capability than by orchestration: durable memory, verifiable tool use, background jobs, and policy-aware delegation have turned agents from clever chat interfaces into operating systems for knowledge work. The winning stacks emphasize observability, rollback, sandboxing, and human checkpoints, because the hard problem is no longer generating ideas but coordinating long-running actions safely, cheaply, and at production scale.' Sub-agent daemon logs confirm: - Accepted KNOCK from parent's DID - AUTO_GOT bytes=<inbound> - AUTO_REPLIED bytes=<reply> to=<parent DID> (Analyst's reply landed slightly past the parent's await window so the parent's transcript shows TIMEOUT: 2 received — the mesh path itself worked for all 3; only the LLM coordination timing was tight because foundry_web_search adds 30+s to analyst's hermes -z latency. Verified independently that analyst auto-responded with 16 bytes.) ## Tests - 83 Hermes unit tests pass - 9 kars-agt-mesh unit tests pass - 834 controller tests pass - 16 router spawn tests pass Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pal Allakatos <pallakatos@microsoft.com>

…reakdown Two operator-visibility fixes called out during the Act 2.3 live verification: ## 1. Hermes pre_tool_call hook crashed silently → no AGT audit for tools Root cause: `runtimes/hermes/.../plugin/governance.py::_on_pre_tool_call` took positional arg `params`, but Hermes 0.15.2 invokes the hook with KEYWORD args matching `plugins.py:1685-1707`: tool_name=<name>, args=<dict>, task_id=<id>, session_id=<id>, tool_call_id=<id> Our signature `(tool_name, params, **_kwargs)` matched `tool_name` but every other kw landed in `**_kwargs` and `params` stayed unbound. Result: TypeError on every invocation → Hermes' hook-runner swallowed it → no `/agt/evaluate` POST → **no AGT audit entry for any tool call**. Operator saw only `inference:responses:gpt-5.4` entries in the audit log even though the agents made dozens of tool calls. Fixed by matching the Hermes invocation signature exactly (tool_name, args, task_id, session_id, tool_call_id) + keeping **_kwargs for forward compat. Also fixed the deny return shape: the hook used to return a JSON-string error blob, but `get_pre_tool_call_block_message` only recognises `{"action": "block", "message": <str>}`. Old denies were logged + ignored — the tool actually ran. New dict-shape denies make the block actually block. Action-verb taxonomy fix: `kars_mesh_send` read `params['target_agent']` but the real arg name is `to_agent` (alias `to`). Action verb became `mesh:send:` (empty target). Now accepts all three names. Also added `mesh:inbox` and `mesh:await` verbs for the drain/wait tools. ### Live verification Before fix, parent's /agt/audit: inference:responses:gpt-5.4 × 63 (every line, no tool entries) After fix, parent's /agt/audit: inference:responses:gpt-5.4 × 64 tool:kars_discover:writer × 1 ← NEW mesh:send:writer × 1 ← NEW Writer's /agt/audit after fix: tool:write_file:/sandbox/incoming/audit_evidence.txt × 1 ← NEW ## 2. Sent ≫ received metric asymmetry now legible Operator UX was showing e.g. 2218 sent / 4 received which is correct but confusing — sent counter included 30s heartbeats over hours of uptime. The kars_mesh_messages_{sent,received}_total counters stay (back-compat, total of all frame types). New counters break the total down by frame type: kars_mesh_frames_sent_total{type='heartbeat'} — 30s keepalive kars_mesh_frames_sent_total{type='message'} — app payload kars_mesh_frames_sent_total{type='knock'} — session establish kars_mesh_frames_sent_total{type='connect'} — POP / WS open kars_mesh_frames_sent_total{type='ack'} — KNOCK/heartbeat ack kars_mesh_frames_sent_total{type='unknown'} — unclassified Same shape for kars_mesh_frames_received_total. Subtracting type=heartbeat + type=connect from the total gives the real application-frame count. Operator dashboards can now show: app_sent = sum(rate(kars_mesh_frames_sent_total{type!~'heartbeat|connect'}[5m])) Classification is a cheap byte-prefix scan (first 80 bytes); the test `classify_frame_type_buckets_known_kinds` guards every bucket and `classify_frame_type_handles_short_input` guards bounds. ## Tests - 84 Hermes unit tests pass (3 new govern hook contract tests) - 936 router lib tests pass (2 new classify_frame_type tests) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pal Allakatos <pallakatos@microsoft.com>

Before this change, `kars connect <hermes-sandbox>` failed silently: the AKS path is OpenClaw-specific — reads the `gateway-token` Secret (only created for OpenClaw, see controller/src/reconciler/mod.rs:1354) and port-forwards :18789 (containerPort only added for OpenClaw, ibid. :1852). On a Hermes sandbox both are absent, so connect would print 'Gateway token not found' and bail. Adds a Hermes-specific branch in cli/src/commands/connect.ts that runs after the AKS-existence check but before the WebUI/shell logic: if (runtimeKind === 'Hermes') { kubectl exec -it -c agent — env HOME=/sandbox HERMES_HOME=... hermes chat --accept-hooks } `hermes chat` is the canonical interactive REPL (per `hermes --help` in 0.15.2 — running `hermes` alone prints usage). `--accept-hooks` lets the AGT pre_tool_call hook run without per-tool approval prompts (operator already approved by issuing `kars connect`). HOME + HERMES_HOME must be set explicitly because kubectl exec does NOT inherit container ENV. Hermes' `ensure_hermes_home()` falls back to $HOME/.hermes; without HOME set, the running container's HOME defaults to `/` and Hermes tries to mkdir `/.hermes` which ENOENTs on the read-only rootfs. /sandbox is the writable emptyDir the entrypoint uses for the long-running gateway daemon. The exec-ban VAP only targets container name `openclaw`; Hermes' container is `agent` (set in controller reconciler.rs:1801 from `is_openclaw` branch), so this is admission-compliant. See `deploy/helm/kars/templates/admission-pod-exec-ban.yaml` `matchConditions`. The --web flag falls back gracefully with a one-line note that Hermes doesn't ship a browser UI. The --reset flag works for both runtimes (it's just a rollout restart). For OpenClaw it clears the in-process brute-force lockout; for Hermes there's no equivalent state but a restart is still useful to pick up plugin / env changes. Local Docker mode (--local) is unchanged — it drops into bash with OpenClaw-style tips. `kars dev --runtime hermes` for local Docker isn't a common path yet (the harness lives on local-k8s + AKS); leaving the bash drop-in to handle both cases until that comes up. ## Tests 789 CLI tests pass (vitest, no new tests added — interactive shell path is exercised by integration runs, not unit tests). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pal Allakatos <pallakatos@microsoft.com>

Restores the 'press Enter on a sandbox row → drop into the agent TUI' UX the operator had for local OpenClaw, but for Hermes on AKS. OpenClaw on AKS still uses the port-forward + WebUI URL path because the exec-ban VAP blocks exec into the openclaw container. ## What changed cli/src/commands/operator/dialogs/connect.ts splits the Enter handler by (location × runtime kind): - AKS + OpenClaw → existing port-forward path (VAP-bound) - AKS + Hermes → PTY exec into 'agent' container (NEW) - local Docker + OpenClaw → 'openclaw tui' PTY - local Docker + Hermes → 'hermes chat --accept-hooks' PTY (NEW) The two PTY paths share a common _spawnPtyConnect() helper extracted from the old inline body; the OpenClaw port-forward path is now _aksOpenClawConnect(). Both are pure refactors — the byte-identical PTY plumbing (blessed save/restore, raw-mode stdin, Ctrl-\ detach) moved into the helper, no functional change for OpenClaw. ## Why this works for Hermes but not OpenClaw on AKS deploy/helm/kars/templates/admission-pod-exec-ban.yaml has matchConditions: expression: object.container == '' || object.container == 'openclaw' The VAP fires ONLY when the target container is literally named 'openclaw' (or unspecified — which defaults to the first container, which is 'openclaw' in OpenClaw pods). Hermes' container is named 'agent' (controller/src/reconciler/mod.rs:1801 picks the name from the is_openclaw branch), so 'kubectl exec -c agent ...' bypasses the VAP cleanly. This was a deliberate VAP design: the policy targets the literal openclaw runtime container, not 'any agent container'. Hermes (and future runtimes whose container is named 'agent') benefit by design. ## HOME / HERMES_HOME env vars Set explicitly on the exec because kubectl exec does NOT inherit container ENV. Without them, Hermes' ensure_hermes_home() falls back to $HOME/.hermes; since HOME defaults to '/' in kubectl exec sessions, Hermes tries mkdir '/.hermes' on the read-only rootfs and ENOENTs. /sandbox is the writable emptyDir the entrypoint daemon uses for the long-running hermes gateway. ## Tests - 789 CLI vitest tests pass (no new tests — interactive PTY path is exercised by live operator runs, not unit tests). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pal Allakatos <pallakatos@microsoft.com>

…-runtime interop) Closes the last gap blocking Hermes ↔ OpenClaw mesh communication. Until this change, the Python kars-agt-mesh library and the TypeScript @microsoft/agent-governance-sdk produced INCOMPATIBLE relay frames — Python-Python and TS-TS interop worked fine, but a Python sender talking to a TS receiver (or vice versa) silently dropped messages. ## Wire-format divergences fixed ### 1. message frame: structured header, std base64 **Before (Python only):** { 'v': 1, 'type': 'message', 'ciphertext': '<urlsafe-base64 of (struct.pack(>I, header_len) + header + ct)>' } **After (matches TS mesh-client.js::send):** { 'v': 1, 'type': 'message', 'from': ..., 'to': ..., 'id': ..., 'ts': ..., 'header': { 'dh': '<std-base64 dhPublicKey>', 'pn': <previous_chain_length>, 'n': <message_number> }, 'ciphertext': '<std-base64 ciphertext>' } The TS receiver reads frame.header.dh / frame.ciphertext as separate fields; the old Python shape had no .header, so TS-side .base64ToUint8 got an unexpected packed blob and decrypt errored out (silently dropped at the SDK boundary). ### 2. establishment: short TS-style keys **Before:** {initiator_identity_key: ..., ephemeral_public_key: ..., used_one_time_key_id: ...} **After:** {ik: ..., ek: ..., otk: ...} (matches mesh-client.js::serializeEstablishment) ### 3. KNOCK + first message: TWO frames, not one fused **Before:** Python fused KNOCK + first ciphertext into a single 'type=knock' frame for one-RTT latency. TS receivers do NOT consume a 'ciphertext' field on a KNOCK — they only read 'establishment', call acceptSession, then await a separate 'type=message' frame. → first ciphertext was lost on Python-to-TS sends. **After:** Python sends two distinct frames: 'type=knock' (no ciphertext, just establishment) followed immediately by 'type=message'. Matches TS mesh-client.js::establishSession + send. ### 4. std-base64 (not urlsafe) on the wire JS's btoa / Node's Buffer.toString('base64') produce std-base64 with '+' and '/'. Python's base64.urlsafe_b64encode produces '-' and '_'. A TS receiver's atob fails on '-'/'_'; a Python receiver's base64.b64decode fails on '+'/'_' depending on input. Now all on-the- wire byte strings use std-base64. ## Backwards compat Receiver tolerates both shapes for one release cycle: - _message_frame_to_encrypted accepts BOTH the TS shape and the legacy packed-ciphertext shape (fallback path) - _wire_to_establishment accepts BOTH {ik,ek,otk} and the legacy {initiator_identity_key, ephemeral_public_key, used_one_time_key_id} - _b64std_decode tolerates urlsafe alphabet on input A fleet mid-upgrade between old/new pods won't drop in-flight messages. ## Live verification Sent {b'WIRE_TEST_DIRECT', 16 bytes} parent → analyst via direct asyncio script with PYTHONPATH pointing at hot-patched client.py. Parent stderr: > TEXT '{"v": 1, "type": "knock", "from": "did:mesh:a61...", "establishment": {"ik":..., "ek":..., "otk": 20}}' > TEXT '{"v": 1, "type": "message", ..., "header": {"dh":..., "pn":0, "n":0}, "ciphertext": "..."}' Analyst auto_responder.log: Accepted KNOCK from did:mesh:a61c9cbf... AUTO_GOT from=did:mesh:a61c9cbf... bytes=16 AUTO_REPLIED bytes=16 to=did:mesh:a61c9cbf... The 16-byte payload decrypted correctly with the TS-compatible shape. ## Tests - 8 new wire-format unit tests pin every field-shape contract - 9 existing kars-agt-mesh unit tests still pass ## Cross-runtime promise With this commit, a Hermes agent CAN mesh-send to an OpenClaw agent and vice versa (same relay, same registry, same crypto, now same wire envelope). End-to-end interop verification on a mixed-runtime cluster ships as a follow-up — the wire alignment is the prerequisite. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pal Allakatos <pallakatos@microsoft.com>

Fresh-machine `kars up exec-hermes` (or `kars dev --target local-k8s`) got 7/13 steps in before failing with: Deploying agentmesh-agt (relay + registry) into kind… local-k8s dev failed: AGT Dockerfile not found at /Users/<user>/agent-governance-toolkit/agent-governance-python/ agent-mesh/docker/Dockerfile Clone it: git clone … Root cause: `cli/src/lib/agt-bootstrap.ts::ensureAgtRepo()` exists and auto-clones the pinned AGT fork into ~/agent-governance-toolkit (honors $KARS_AGT_REPO + --agt-repo), and is wired in BOTH `cli/src/commands/up.ts` (line ~617) and `cli/src/commands/dev.ts` (line ~652). But `cli/src/commands/dev/local-k8s.ts::runLocalK8s()` — which is what `kars up` falls through to when there's no AKS context AND what `kars dev --target local-k8s` invokes directly — never called ensureAgtRepo. So a fresh-clone user blew up at step 7. Fix: 1. Import `ensureAgtRepo` + `ensureAgtWheels` from `../../lib/agt-bootstrap.js`. 2. After credential loading and before kind-cluster bringup, when mesh is enabled (default) AND no external `globalRegistry` is supplied, call ensureAgtRepo(opts.agtRepo, repoRoot) → mutate opts.agtRepo with the resolved path so the downstream rebuildDevImages() and deployAgentMesh() see a valid checkout even when the user didn't pass --agt-repo or set $KARS_AGT_REPO. Same call ALSO triggers ensureAgtWheels() so runtimes/wheels/ is populated for the Hermes / Anthropic / Pydantic AI / etc. Python sandbox image builds (the wheel directory is .gitignored and the Dockerfiles COPY from it). 3. Bump stepper totalSteps 13 → 14 to account for the new step. 4. Fail-fast error message points at three escape hatches (--agt-repo / $KARS_AGT_REPO / --no-mesh) for environments where auto-clone can't reach github.com (offline CI, etc.). Mirrors the same call pattern as up.ts:617 and dev.ts:652 — these three are now the canonical AGT-bootstrap entry points; keep them in sync or fresh-machine OOTB breaks again on whichever one drifts. Verified: • npm run build → clean • npm run typecheck → clean • npm run lint → no new warnings • vitest → 789 tests passing (39 files) • ci/check-loc.sh → clean (file not budgeted; only added ~45 LOC and the file ~2754 LOC is well under the 800-LOC new-file cap that would apply if a future budget entry is added)

Fresh-machine `kars up exec-hermes` (or `kars dev --target local-k8s`) on a Mac M-series died at: [stage-2 7/21] RUN ... curl -fsSL "https://github.com/cli/cli/ releases/download/v2.89.0/gh_2.89.0_linux_arm64.tar.gz" ... curl: (22) The requested URL returned error: 504 Docker Desktop's networking VM on macOS is notoriously flaky with github.com (and any external GET); a single 5xx blew up an already-9-minutes-in image build, leaving the user with a partial kind cluster and no easy way to resume. Hardens all 6 external curls in sandbox-images/openclaw/Dockerfile.base with identical retry policy: --retry 5 — up to 5 retries before failing --retry-delay 3 — 3-second base delay (grows with backoff) --retry-all-errors — retry on HTTP 4xx/5xx too (not just network errors); covers the 504 case --connect-timeout 15 — fail-fast on dead routes so retries don't all hang for 5 minutes each Endpoints affected (all are versioned release artifacts that never change, so retries are safe): - nodejs.org/dist/v22.22.3/... (lines 19, 193; two stages) - github.com/cli/cli/... (line 223) - github.com/BurntSushi/ripgrep/... (line 229) - cache.agilebits.com/dist/1P/... (line 237) - github.com/pimalaya/himalaya/... (line 243) Total worst-case extra time on a healthy network: ~0s (first try succeeds). On a flaky network: ~45s per retried download instead of a hard failure that wastes the whole build context.

…refresh_in User on fresh `kars dev exec-claw` saw the WebUI come up fine, then ~30 min later every chat-completions call started returning: WARN inference-router::proxy: sandbox=palkarstop-... status=401 body="IDE token expired: unauthorized: token expired" Root cause: GitHub's /copilot_internal/v2/token response returns both `expires_at` (Copilot's hard expiry) and `refresh_in` (a softer hint, typically ~1500s). The old cache only tracked the refresh hint, computing: refresh_at = now + refresh_in and serving the cached JWT for as long as `refresh_at > Instant::now()`. When GitHub returns `refresh_in > (expires_at - now)` — which happens during token rotation windows and in some account-state edge cases — the cache kept handing out a JWT whose `expires_at` had already passed. Copilot then rejected every request with the verbatim 401 body above until the (longer) refresh window finally elapsed. Fix: • CachedJwt now stores BOTH refresh_at AND expires_at as Instants. The fast-path serves only when: refresh_at > now AND expires_at > now + REFRESH_BUFFER • get_jwt_with_base() forces a re-exchange whenever either boundary has been crossed, so a stale JWT is never served past Copilot's hard expiry minus a 60s safety buffer. Added regression test `refreshes_when_expires_at_passes_even_if_refresh_in_is_longer` that mints a cache entry with expires_in=30s + refresh_in=1500s, synthetically ages it past expires_at, and asserts the next call triggers a fresh exchange (the second mock returns "second-token"; the test fails if the cache still serves "first-token"). cargo test -p kars-inference-router --lib copilot_auth test copilot_auth::tests::errors_when_no_token_configured ... ok test copilot_auth::tests::surfaces_upstream_errors ... ok test copilot_auth::tests::exchanges_and_caches_token ... ok test copilot_auth::tests::refreshes_when_expires_at_passes_even_if_refresh_in_is_longer ... ok test result: ok. 4 passed; 0 failed

…" warn User fresh `kars dev` with `claude-opus-4.7` (the Copilot picker default) ran into the verbatim Copilot rate-limit body: status: 503 body: "Sorry, the upstream model provider is currently experiencing high demand. Please try another model." The router already retries 5xx + 429 against InferencePolicy.spec. modelPreference.fallback[] (inference-router/src/failover.rs ::is_failover_trigger), but the auto-generated InferencePolicy from `kars dev` emitted fallback_count=0, so the failover walk had nothing to walk and the throttle response surfaced directly to the WebUI. Fix has two parts: 1. CLI side — cli/src/github-copilot.ts::buildCopilotFallbackChain Picks a same-Copilot, cross-family chain so at least one model almost always has quota. Static ordering for debuggability: [gpt-5, claude-sonnet-4.5, gemini-2.5-pro, gpt-5-mini, claude-haiku-4.5, gpt-4.1] minus the picked one (which stays primary; we never reorder behind the user's back). cli/src/commands/dev/local-k8s.ts::autoCreateSandbox now appends the chain to the InferencePolicy YAML whenever creds.provider === "github-copilot" — Foundry / GH-Models paths don't get an auto-chain because they're single-deployment by definition. New tests in cli/src/github-copilot.test.ts (6 cases) gate: - picked model always first - picked model never appears in fallbacks (dedup) - non-empty chain for the recommended default - byte-identical chain between invocations (debuggability) - every emitted id exists in COPILOT_MODELS (no typos) - both Anthropic AND OpenAI entries regardless of pick 2. Router side — inference-router/src/inference_policy_loader.rs The startup "InferencePolicy loaded" line now includes the full `fallback_chain` (not just `fallback_count`) so ops can correlate a 503-then-200 sequence with the configured order. ALSO emits a one-shot WARN at load time when fallback is empty: WARN InferencePolicy has no fallback chain — 5xx/429 on the primary deployment will surface directly to the agent (no router-side failover). Add spec.modelPreference. fallback[] in the InferencePolicy CR. Surfaces the gap loudly in the router log so operators don't have to dig for "fallback_count":0 in a JSON line and realize what it means — especially important for hand-rolled InferencePolicy CRs (the auto-generated ones now always include a chain on the Copilot path, but operator-authored ones might not). Verified: • cli: npm run build + typecheck clean; vitest 795 tests pass • cli: 6 new github-copilot.test.ts cases all green • router: cargo build/clippy clean • router: 16 inference_policy_loader tests still pass • cargo fmt clean; ci/check-loc.sh clean

User on fresh `kars dev` saw the openclaw agent eventually report: "Egress proxy is still timing out — I'll proceed with what I know and flag anything that needs a fresh check." with NO corresponding error line on the router side at the default log level, because the router never returned to the client to log anything. Root cause: inference-router/src/forward_proxy.rs::handle_connect, ::handle_http, and ::handle_tls_redirect each called let upstream = match TcpStream::connect(&resolved).await { ... } unguarded. On Linux, TcpStream::connect blocks for the full kernel SYN-retransmit window (~60-180s depending on net.ipv4.tcp_syn_retries) when the destination IP silently drops the SYN — common when: • the IP is behind a flaky CDN • the destination is unreachable from kind on Mac Docker Desktop's network namespace • the iptables-redirected TLS target's IP roams during the request While that connect blocks, the sandbox agent has nothing to show the user except "still timing out", and the only router log was at `debug!` level which was filtered out by default. Fix: • New `UPSTREAM_CONNECT_TIMEOUT = 10s` constant + a `connect_with_timeout(addr)` helper that wraps `TcpStream::connect` in `tokio::time::timeout()`. Returns an `io::Error` of kind `TimedOut` with a clear message ("connect timeout after 10s") when the SYN handshake hangs. • All three call sites swapped to the helper. The visible-failure contract is the same (`502 Bad Gateway` to the client) but now bounded at 10s instead of 60-180s. • Bumped the upstream-failed log lines from `debug!` to `warn!` and enriched them with the domain/dest fields so the failure is visible in the default log filter (was effectively invisible before). Regression tests added in inference-router/src/forward_proxy.rs: • `connect_with_timeout_aborts_on_blackhole_ip` — asserts a TEST-NET-2 (RFC 5737) address fails within 15s of slack (10s timeout + CI jitter). Pre-fix this would take the full kernel SYN-retransmit window. • `connect_with_timeout_succeeds_on_local_listener` — sanity check that the wrapper doesn't break the happy path. $ cargo test -p kars-inference-router --lib forward_proxy test forward_proxy::tests::connect_with_timeout_succeeds_on_local_listener ... ok test forward_proxy::tests::connect_with_timeout_aborts_on_blackhole_ip ... ok test result: ok. 2 passed; 0 failed; finished in 10.00s

…ime only) RustSec published advisory RUSTSEC-2026-0173 on 2026-06-07 (one day before this commit) flagging `proc-macro-error2 2.0.1` as unmaintained. cargo-audit + cargo-deny CI gates went red on every push because the advisory feed pulls fresh on every run. Dependency chain (controller-only, build-time): oci-client 0.16.1 + 0.15.0 → oci-spec → getset 0.1.6 (proc-macro) → proc-macro-error2 2.0.1 Same rationale as the existing RUSTSEC-2024-0370 ignore for the sibling crate `proc-macro-error` 1.x — this is a proc-macro crate that runs at compile time inside rustc, NOT at runtime in the controller binary. There is no runtime attack surface from an unmaintained proc-macro crate beyond the build toolchain itself. No safe upgrade path exists yet: `getset` 0.1.6 is the latest release and pins `proc-macro-error2`. Upstream `oci-client` would need to upgrade past `getset 0.1.6` (which would itself need to swap proc-macro-error2 for proc-macro-error3 or inline its own diagnostics). TODO comment added at both call sites to drop the ignore when that lands. Added the ignore to: • .cargo/audit.toml — picked up by `cargo audit` (the Rust Dependency Audit CI job). • deny.toml — picked up by `cargo deny check advisories` (the Rust Supply-Chain Gate CI job). Verified locally: cargo audit ⇒ clean (no warnings) cargo deny check ⇒ advisories ok

Three product gaps user reported on the operator (`n` spawn dialog): 1. **Hermes missing from the runtime picker.** The operator dialog hardcoded its own array of 7 runtimes (cli/src/commands/operator/ dialogs/spawn.ts ::runtimeOpts), which drifted from the actual WIRED_KINDS list in cli/src/runtime.ts as soon as Hermes shipped. Result: the docs say "Hermes is supported", the user pushes `n`, but Hermes isn't an option. Fix: extract a new `wiredRuntimeFlags()` helper in runtime.ts that inverts FLAG_TO_KIND and walks WIRED_KINDS in order, then have the spawn dialog call it directly. Now WIRED_KINDS is the single source of truth — any new wired runtime shows up in the picker automatically with zero per-runtime edits. 2. **Channels listed as "OpenClaw only" everywhere — but Hermes supports them too.** sandbox-images/hermes/entrypoint.sh:266+ already translates TELEGRAM_BOT_TOKEN / SLACK_BOT_TOKEN / DISCORD_BOT_TOKEN into `hermes config set channels.*.token`, so the gating was just wrong copy. Fixes: - spawn.ts: introduce `channelCapableRuntimes = Set(["openclaw", "hermes"])` and use it everywhere the dialog used to check `state.runtime === "openclaw"`. The channel/token/allowfrom fields now light up for Hermes too, and switching to a non-channel runtime clears the selection as before. - add.ts: rewrite the `--channels`/`--telegram-*`/`--slack-*`/ `--discord-*` help text from "[OpenClaw only]" to "[OpenClaw + Hermes]". Skills + API keys stay "[OpenClaw only]" because those wire via OpenClaw's plugin.allow list (not Hermes). Updated the "Flag groups" --help-text block to match. 3. **Copilot models with Hermes — verified, no code change needed.** sandbox-images/hermes/entrypoint.sh:83 pins OPENAI_BASE_URL at the in-pod router, and lines 99-111 already case on KARS_PROVIDER to set HERMES_DEFAULT_PROVIDER=openai when the operator picked github-copilot in `kars dev`. The router-side fixes from this PR (Copilot IDE-JWT cache w/ expires_at — 6886415; Copilot fallback chain on 503 — 6dca0f8) apply to Hermes through the same router. Contract test in runtime.test.ts pins this down so it can't drift again: • wiredRuntimeFlags() returns kebab flags that all round-trip through flagToKind() to wired RuntimeKinds. • The returned set includes every known wired runtime (Hermes in particular). • Two calls return byte-identical arrays (deterministic ordering for left/right picker cursor UX). Also retroactively added Hermes to the existing assertRuntimeWired "accepts every wired runtime" test — was missing. Verified: npm run build ⇒ clean npm run typecheck ⇒ clean vitest run ⇒ 798 passed (was 795 → +3 new contract tests)

User reported "spawn fail on local k8s" after picking Hermes in the operator's `n` spawn dialog. Root cause: cli/src/commands/dev/local-k8s.ts ::runLocalK8s only loaded the 3 helm-chart-pinned images (kars-sandbox, kars-controller, kars-inference-router) into kind. The operator-spawned Hermes pod resolves to `karsacr.azurecr.io/kars-runtime-hermes:latest` (per controller/src/ reconciler/runtime.rs::DEFAULT_HERMES_IMAGE) — which doesn't exist in kind, so the pod ImagePullBackOffs (kindnet can't reach ACR without auth). Fix: extend the image-load loop to ALSO attempt `karsacr.azurecr.io/kars-runtime-hermes:latest` with aliases [`kars-runtime-hermes:latest`, `kars-runtime-hermes:dev`]. The existing `loadImageIfPresent` helper handles the missing-locally case gracefully — when the host hasn't built the runtime image yet, the function returns `{loaded: false}` without throwing. Runtime images are separated from core images in the missing-image warning path: • Core 3 missing → yellow warning (deployment will fail). • Runtime missing → dim notice with the exact `docker build` command to fix: docker build -t karsacr.azurecr.io/kars-runtime-hermes:latest \ -f sandbox-images/hermes/Dockerfile . Then re-run `kars dev --target local-k8s --build` to load it. Only Hermes auto-loads (not Anthropic/LangGraph/MAF/etc.) — those runtimes stay opt-in to keep `kars dev` startup fast. Hermes is the only non-OpenClaw runtime productized + verified in this PR, so it's the one users will hit first. Verified: npm run build ⇒ clean npm run typecheck ⇒ clean vitest run ⇒ 798 passed

User report: operator's `n`/spawn dialog showed "✓ Spawned <name>" but no pod ever appeared in the agent table — they had to check Headlamp to find an ImagePullBackOff. Same pattern would hit CrashLoopBackOff, ErrImageNeverPull, OOMKilled, etc. Root cause in cli/src/commands/add.ts: the 120s wait loop polls for `containerStatuses[*].ready` to include "true"; if the wait times out, the code unconditionally calls `spinner.succeed(...)`, exits 0, prints `(may still be starting)`. The operator's spawn dialog only logs `{red-fg}✗ Spawn fail{/}` when execa rejects (non-zero exit), so a stuck pod silently passed as success. Fix: when the wait loop times out, query containerStatuses one last time and check for unambiguous failure modes: • ImagePullBackOff / ErrImageNeverPull / ErrImagePull / InvalidImageName / CreateContainerConfigError (image side) • CrashLoopBackOff with restarts >= 2 (runtime side) • Last terminated state of OOMKilled / Error If found: • spinner.fail() with the container name + reason • Print the kubectl describe/logs commands that surface the full cause • For image-pull failures specifically, print the docker-build + kind-load commands (local-k8s case is by far the most common cause — runtime images that aren't loaded into the kind cluster) • process.exit(1) so the operator's spawn dialog sees the failure and logs `✗ Spawn fail: <reason>` in its activity log If NO unambiguous failure mode found (e.g. genuinely still pulling), keep the original informational success so existing scripts aren't broken. This makes the operator UX honest: spawning a Hermes pod when the runtime image isn't loaded into kind now shows an actual error pointing at the build command, not a green checkmark that the user has to debug via Headlamp. Verified: npm run build + typecheck ⇒ clean vitest run ⇒ 798 passed

User report: > operator says "✓ Spawned" then nothing visible > kubectl get karssandbox -A confirms the CR was never created Two compounding silent-failure bugs: 1. kars add was log-then-exit-0 on caught errors. The outer catch at cli/src/commands/add.ts line 601 (was: 531) handled every exception by calling spinner.fail() + console.error() and then RETURNING — letting Node exit 0 naturally. So `kubectl apply -f -` failing (CRD missing, wrong context, schema rejection on the bundle, etc.) surfaced as a clean exit code to any caller. Operator's `execa("kars", args, { stdio: "pipe" })` only logs `✗ Spawn fail` when execa REJECTS, so silent exit-0 masked every kars-add failure mode behind a green checkmark. Fix: add `process.exit(1)` after the error logs. Preserves all the existing error-message branching (controller-not-installed hint, generic error text) — just stops lying about exit status. 2. Operator's spawn dialog was throwing away the real error text. Previously logged only `(e.stderr || e.message)?.substring(0, 200)` — execa's `.message` is usually `Command failed with exit code 1: kars add ...`, NOT the underlying kars-add stderr. So even after fix #1, the operator log would show "✗ Spawn fail: Command failed with exit code 1: kars add testhermes --runtime hermes ..." with no actual root cause. Fix: prefer e.stderr (now populated thanks to fix #1) over e.message, strip ANSI colour codes that kars add emits via chalk, filter empty lines, keep the last 4 (which is where spinner.fail + error hints live), join with " | ", cap at 400 chars. Activity log now shows e.g.: ✗ Spawn fail: Failed to create sandbox | Error: kubectl error: KarsSandbox.kars.azure.com "testhermes" is invalid: spec.hermes: Invalid value: ... | Connect: kars connect testhermes Also: on SUCCESS, echo the last 3 lines of stdout (the "Namespace / Model / Status / Connect" hints kars add prints) so the operator sees useful follow-up info inline. Verified: npm run build + typecheck ⇒ clean vitest run ⇒ 798 passed

User OOTB story: 1. Ran `kars dev` from an earlier (pre-Hermes) branch — chart installed an older KarsSandbox CRD into kind. 2. Pulled the hermes branch + `cd cli && npm run build` to update the CLI binary. 3. Did NOT re-run `kars dev` (no apparent need — cluster was up). 4. `kars operator` → `n` → Hermes → Launch 5. `kars add` bundle had `spec.runtime.hermes`; cluster CRD didn't know that field; kubectl apply rejected with: ValidationError(KarsSandbox.spec.runtime): unknown field "hermes" in com.azure.kars.v1alpha1... The error message was technically accurate but the FIX was buried behind "what does this even mean" digging. Operator activity log showed the rejection text but didn't tell the user it's a one-shot CRD refresh away. Fix: cli/src/commands/add.ts catch block now detects the three patterns kubectl emits on stale-CRD rejection ("unknown field", "Unsupported value", "ValidationError") and prints the exact chart-template + server-side-apply incantation: This looks like a CRD schema mismatch — the cluster's KarsSandbox CRD is older than your local CLI/sources. Refresh the chart: helm template kars deploy/helm/kars --namespace kars-system \ --include-crds | kubectl apply -f - --server-side --force-conflicts Or just re-run `kars dev --target local-k8s` — its chart-install step always refreshes CRDs to the source-of-truth schema. Doesn't change the behavior of the CRD path itself (kars dev's helmInstall already does the right thing — see cli/src/commands/dev/local-k8s.ts:778); this only makes the diagnostic explicit when an operator-driven kars-add hits the known stale-CRD failure mode. Verified: npm run build + typecheck ⇒ clean vitest run ⇒ 798 passed

User OOTB on local-k8s after spawning a Hermes sandbox from operator: Failed to pull image "karsacr.azurecr.io/kars-runtime-hermes:latest": dial tcp: lookup karsacr.azurecr.io on 192.168.65.254:53: no such host kind nodes don't have ACR pull creds (and shouldn't — the user isn't on an Azure cluster). Last commit (ea59a9f) added auto-LOAD of the image into kind via loadImageIfPresent, but loading requires the image to exist on the host first — and the host doesn't have it unless the user knows the docker-build command. Fix: extend rebuildDevImages in cli/src/commands/dev/local-k8s.ts with a new "runtime-hermes" build spec. It builds karsacr.azurecr.io/kars-runtime-hermes:latest from sandbox-images/hermes/Dockerfile against the repo root context. The target tag matches DEFAULT_HERMES_IMAGE in the controller, so: 1. The auto-load step in runLocalK8s now finds the image and `kind load`s it into the cluster. 2. When the operator spawns a Hermes sandbox, the controller's image string resolves to the already-loaded image → ImagePullPolicy IfNotPresent + image-present → no pull attempt, no DNS-to-ACR-from-kind failure. Cost: • First build: 3-5 min (Python pip install + Hermes pip install + ripgrep/op binaries). The Dockerfile COPYs runtimes/wheels/ which `ensureAgtWheels` already populated at the top of runLocalK8s, so no wheel-build blocking. • Subsequent runs: docker layer cache hit, < 10 sec. Honors the same `forceAll` flag as the other dev specs. Skips silently if sandbox-images/hermes/Dockerfile is missing (e.g. older checkout) so it can't break the openclaw-only path. Hermes is the only non-OpenClaw runtime auto-built; other runtimes (Anthropic, LangGraph, MAF, Pydantic AI, OpenAI Agents) stay opt-in to keep `kars dev` startup cost bounded. Hermes is the productized runtime in this PR — auto-building it is what makes operator-`n` → Hermes → Launch JUST WORK out of the box. Verified: npm run build + typecheck ⇒ clean vitest run ⇒ 798 passed

…m template` Found while debugging user's persistent ErrImagePull after the manual CRD refresh worked: helm template kars deploy/helm/kars --include-crds | kubectl apply ... WITHOUT `-f deploy/helm/kars/values-local-dev.yaml` re-renders the controller Deployment from default values.yaml — which doesn't include `KARS_DEV_PROFILE=true`. The controller then defaults imagePullPolicy to Always for `:latest` images, so every sandbox pod tries to pull from ACR even when the image is loaded in kind. DNS resolution to karsacr.azurecr.io fails → ErrImagePull forever. This is exactly the failure mode I just sent the user from this PR's previous commit (93938f5) — the kars-add error hint's PRIMARY fix was a naked `helm template`, which is what triggered the side-effect. Reorder the hint: 1. Primary: `kars dev --target local-k8s` (correct overlay, no risk of overwriting controller env). 2. Fallback (only if you must apply CRD by hand): include `-f deploy/helm/kars/values-local-dev.yaml` so the controller keeps its dev semantics. No code path changed — just the user-facing diagnostic string.

User stuck on persistent ErrImagePull for kars-runtime-hermes even after the image was loaded into kind (`docker exec kars-dev-control- plane crictl images | grep hermes` confirmed). The kubelet was still attempting a network pull because the controller emitted `imagePullPolicy: Always` for the `:latest` tag. Root cause: `KARS_DEV_PROFILE=true` was set ONLY by `kars dev`'s dynamic per-run overlay (cli/src/commands/dev/local-k8s.ts:944), NOT by the static `values-local-dev.yaml` overlay. Result: any out-of-band chart apply that just used `-f values-local-dev.yaml` (e.g. the CRD-refresh workflow we recommend when source CRDs drift forward) silently dropped the env var → controller's pull-policy helper at controller/src/reconciler/mod.rs:1291 fell into the `Always` branch for `:latest` images → every sandbox pod tried to pull from ACR even when the image was kind-loaded → ErrImagePull forever on machines without ACR network reachability. Fix: pin `KARS_DEV_PROFILE: "true"` in the static overlay alongside `LEADER_ELECTION_ENABLED: "false"`. Now anyone applying `values-local-dev.yaml` gets full dev semantics (IfNotPresent pull policy, dev-mode relaxations, etc.) without depending on the CLI's dynamic overlay being layered on top. The CLI's dynamic overlay still re-emits the same key idempotently (see cli/src/commands/dev/local-k8s.ts ::provisionDevCreds line 944); helm/kubectl deduplicate by env-var `name`, so no double emission downside. After this commit, the user can run any of: • `kars dev --target local-k8s` (always worked) • `helm template kars deploy/helm/kars -f deploy/helm/kars/values-local-dev.yaml --include-crds | kubectl apply -f - --server-side --force-conflicts` • `helm upgrade kars deploy/helm/kars -f deploy/helm/kars/values-local-dev.yaml` And all three yield a working dev controller.

… apply` Following on from 99deca3 / 2c0c912 and a user-driven debug session that uncovered why the "helm template -f values-local-dev.yaml | kubectl apply" workaround keeps half-bricking local-k8s clusters: The static `values-local-dev.yaml` overlay does NOT contain the inference creds (AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, KARS_PROVIDER, COPILOT_GITHUB_TOKEN). Those are emitted by the per-run DYNAMIC overlay built in `cli/src/commands/dev/local-k8s.ts::provisionDevCreds`. So any external `helm template ... -f values-local-dev.yaml | kubectl apply` overwrites the controller Deployment's pod-spec env with ONLY the static-overlay values — silently nuking the dynamic creds. Result: every subsequent reconcile fails with: ERROR "No inference endpoint configured" and Deployments never get written. Pods never get created. The user then sees the operator's spawn dialog "succeed" (kars add times out on the wait loop, exits 0 from the previous-spinner branch — though now exits 1 from f39d425 — but in either case the CR is reconciled into nothing because the controller can't proceed). Fix: simplify the CRD-mismatch hint to recommend ONLY `kars dev --target local-k8s` (the source of truth for local dev). Drop the helm-template fallback that was actively harmful. Add an explicit DO NOT warning so the next person reading this hint doesn't try the obvious-but-wrong shortcut. Verified: npm run build ⇒ clean vitest run ⇒ 798 passed

This session shipped 12 OOTB blockers — every one diagnosable from cluster state + controller logs + chart source. Captures the design for an in-cluster SRE agent that auto-walks the same diagnostic ladder. Filed as docs/blueprints/07-kars-sre-proposal.md so it gets discoverable PR-review on its own merits without bloating this Hermes-runtime PR. Phased implementation: 1. MVP (kars-sre-mvp todo): 5 read-only tools. ~500 LOC, ~1 day. 2. Phase 2 (blocked on Phase 1): sre_apply_fix + AGT approval flow. 3. Phase 3 (blocked on Phase 2): continuous-watcher mode. Validation gate: the merged agent must autonomously diagnose + propose fixes for every one of the 12 OOTB blockers from THIS session, given only the cluster state that existed at the moment each was hit. That's a built-in regression corpus. No code shipped in this commit — design only. Implementation lands in a separate PR series.

…s-control section Two pieces: 1. fix(cli): rebuildDevImages always rebuilds controller + router User session hit a 30-minute debug loop because their kind cluster was running a `kars-controller:dev` image built BEFORE commit 493c118 (2026-06-04), which added the `dev_profile` check at controller/src/reconciler/mod.rs:1291: let pull_policy = if ctx.dev_profile || !image.ends_with(":latest") { "IfNotPresent" } else { "Always" }; Without dev_profile in the running controller, every `:latest` image got `imagePullPolicy: Always` — including the local kind-loaded kars-runtime-hermes — so the kubelet ALWAYS attempted a manifest fetch from karsacr.azurecr.io which kind cannot reach → ErrImagePull forever. Why this was hard to spot: rebuildDevImages skipped already-present images entirely. Running `kars dev --target local-k8s` after pulling new controller source did NOT rebuild the controller image. The user's check for KARS_DEV_PROFILE=true on the controller pod passed (env was correctly injected by the helm overlay), but the running controller process couldn't USE the env because the binary predates the env-reading code. Fix: introduce ALWAYS_REBUILD = {"controller", "inference-router"}. Both bottom out at a pre-staged Rust binary COPY (see staging in stage-rust-bin.ts), so docker layer cache makes the rebuild ~5-30s. That cost is well worth not silently running stale controllers. Sandbox + runtime images stay opt-in because their builds are minutes long. 2. docs(sre): expand kars-sre access-control design Added §6.1-6.6 to docs/blueprints/07-kars-sre-proposal.md: - 6.1 Tier 1 (MVP target): in-cluster ServiceAccount token on local-k8s — works on kind without any Entra/AKS dependency - 6.2 Tier 2 (Phase 2): AKS Workload Identity federation; byte- identical agent code, purely additive operator glue - 6.3 Complete ClusterRole `kars-sre-reader` spec — single authorization gate, every absent permission is deliberate - 6.4 Secrets handling — router-side .data stripping (~30 LOC in inference-router/src/proxy.rs) + RBAC defense in depth - 6.5 Phase-2 write actions via short-lived (5-min TTL), per-action ServiceAccount tokens minted on operator approval; standing blast radius stays read-only - 6.6 Egress already covered Verified: npm run build + typecheck ⇒ clean vitest run ⇒ 798 passed

…copy User on local-k8s after the image-loading saga finally got past ImagePullBackOff — pod proceeded to crash on agent-container startup with: cp: preserving permissions for '/sandbox/.hermes/plugins/kars/__init__.py': Operation not permitted cp: preserving permissions for '/sandbox/.hermes/plugins/kars/discover.py': Operation not permitted …(13 such lines) Root cause: sandbox-images/hermes/entrypoint.sh line 73 used `cp -a`, which preserves owner + mode + atime metadata. The staged source at /opt/kars-hermes-stage/plugins/kars/ was chowned root:root at image build time. The entrypoint runs as UID 1000 (sandbox user) on a `readOnlyRootFilesystem: true` pod, so preserving root ownership → EPERM from the kernel → `set -e` at the top of the script kills the container with the "Operation not permitted" spam as the entire visible output. Cryptic-as-hell symptom; trivial fix. Fix: `cp -r` instead of `cp -a`. Files end up owned by the copying user (UID 1000), which is what we want anyway because that's the UID hermes runs as. The source files have 0444 from the `chmod -R a+rX` in sandbox-images/hermes/Dockerfile, so they remain readable post-copy; no follow-up chmod needed. Prior art: the openclaw entrypoint at sandbox-images/openclaw/ entrypoint.sh:1273 uses `cp --no-preserve=mode` for the same reason (it copies the package.json staged at image build time into a runtime location). We're now consistent. No new tests — the failure mode is verifiable only against a real sandbox pod, and the OOTB-fresh-machine-gate todo will cover it once that CI lane lands.

… gpt-5.4) User picked `claude-opus-4.7` in the operator spawn dialog; the resulting Hermes pod was using gpt-5.4 instead. Root cause: sandbox-images/hermes/entrypoint.sh line 171 wrote echo " default: \"${AZURE_OPENAI_DEPLOYMENT:-gpt-5.4}\"" into the auto-generated config.yaml. AZURE_OPENAI_DEPLOYMENT is NOT injected into the agent container by the controller — only into the inference-router container's env (see controller/src/reconciler/mod.rs:1656 → router_env, vs the openclaw_env path at line 1335 which only sets the generic KARS_MODEL). So the hermes entrypoint fell through to the hardcoded gpt-5.4 default in EVERY case. The generic kars-runtime-contract env var is KARS_MODEL (controller/ src/reconciler/mod.rs:1335 — "Generic alias readable by any runtime — Hermes / OpenAIAgents / MAF / BYO all read KARS_MODEL"). That's the env Hermes should be honouring. Fix: prefer KARS_MODEL first, fall back to AZURE_OPENAI_DEPLOYMENT (for hand-crafted dev overlays that still set the legacy name), keep gpt-5.4 as last-resort default to keep the boot banner sensible when neither is set. ${KARS_MODEL:-${AZURE_OPENAI_DEPLOYMENT:-gpt-5.4}} User-visible effect: `kars operator → n → Hermes → model: claude-opus-4.7 → Launch` now actually runs claude-opus-4.7 in the pod, not gpt-5.4. Same for every other Copilot/Foundry model the user can pick.

…spawns) User session: a Hermes parent agent tried to spawn a sub-agent with the Hermes plugin's documented `role` arg. The router rejected with HTTP 422 Unprocessable Entity because: inference-router/src/spawn/mod.rs::SpawnRequest had #[serde( deny_unknown_fields)] and no `role` field. The Hermes plugin wrapper (runtimes/hermes/src/kars_runtime_hermes/plugin/spawn.py) sent `role` as a top-level body field and got 422. The Hermes plugin's docstring + the kars_spawn schema both expose `role` as a key arg: "Short persona/role description that siblings can find by role." It was always a missing-field in the router's request type, not bad client code. Fix: 1. Add `pub role: Option<String>` to SpawnRequest in inference-router/src/spawn/mod.rs:48 (the schema-deny-strict deserialize site). 2. Wire it through every existing SpawnRequest construction site (handoff/mod.rs × 6, spawn/dev_profile_test.rs × 1, spawn/mod.rs × 2 — the docker.rs snapshot path and the list-children reconstruction path). 3. In build_sub_agent_crd_with_labels, when role is non-empty, emit it as `kars.azure.com/role` label on the child CRD so: (a) sibling discovery via `kubectl get karssandbox -l kars.azure.com/role=auditor` works, (b) the parent's local roster can recover role on restart by reading children with kars.azure.com/parent + reading kars.azure.com/role, (c) the handoff/restore path snapshot can preserve role across re-spawn (the snapshot-reconstruction site now reads the label back). 4. Sanitize the label: K8s requires ≤63 chars + a specific character class. Replace disallowed chars with `-`, truncate. Don't fail the spawn over a space in the LLM's free-form persona text. User-visible effect: `Hermes parent → kars_spawn(name: zsolti, role: "data analyst")` now succeeds AND records the role for downstream discovery. Previously it failed with 422 and the Hermes plugin's wrapper retried without role (silently dropping the field's whole purpose). The agent that diagnosed this was, charmingly, correct. Verified: cargo build/clippy/fmt ⇒ clean cargo test --lib spawn ⇒ 16 passed ci/check-loc.sh ⇒ clean

The honest answer to "is OOTB really working?" is "you'd have to actually run the full flow on a fresh machine to know". This session shipped 16 OOTB blockers that no unit test caught — every single one a wire-format failure across module boundaries that only surfaces against a real cluster. This script makes that question answerable by anyone in one command: export KARS_OOTB_COPILOT_TOKEN="gho_..." bash scripts/smoke/fresh-machine-ootb.sh What it does: 1. Wipes carried state (kind cluster, AGT clone, ~/.kars, npm-link) 2. Fresh git clone into /tmp/kars-ootb-smoke 3. cd cli && npm ci && npm run build && npm link 4. kars dev --target local-k8s (non-interactive via seeded creds) 5. kars add one OpenClaw + one Hermes sandbox 6. Polls until both pods Running 2/2 (5-min timeout each) 7. Tears down (or --keep) Exits 0 iff every step succeeds. On failure prints the precise command that failed AND the pod's diagnostic output, so the regression is reproducible from the script output alone. Catches everything from the 2026-06-08 Hermes session: • AGT auto-clone missing → kars dev fails at chart-apply • Stale CRD → kars add 4xx • Stale controller / runtime images → ImagePullBackOff • KARS_DEV_PROFILE drift → same • `cp -a` perm error → CrashLoopBackOff • SpawnRequest missing role → kars add exit 1 • etc. Limitations today: • Needs a real Copilot OAuth token in env (can't run on a public GHA runner without secret wiring). The ootb-fresh-machine-gate todo tracks moving this to CI. • Single-cluster only; AKS / federation covered by separate interop scripts. • Doesn't validate model routing post-spawn yet (next iteration). README.md documents usage, what it catches, when to run, and how to extend for new wired runtimes. Not wired as a PR gate yet — runs manually. Filed `ootb-fresh-machine-gate` for the CI lane.

…tch deck Practitioner-grade visual language (mix of Patrick Collison / Stripe Press, Bret Victor, and Stripe-docs style — selected per slide intent): • Title + close : dark sandwich, 168pt mark, single tagline • Pillar overview : eyebrow + heading + lede paragraph + 4 named primitives • Sandbox : Victor-style — one named artefact, real CRD field labels (UID 1000, readOnlyRootFilesystem, etc.) • Sandbox · the gate : Stripe-docs — the 6 actual iptables rules in a code block + prose explanation on the right • Mesh : the real KNOCK frame JSON as the visual artefact (with v/type/from/to/id/ts/intent/establishment fields) • Governance · policy : the real InferencePolicy CR snippet as the artefact • Governance · stack : 4 layers vertically with name + body + source ref • Blueprints : 6 named shapes, real meta per shape (Kata + SEV-SNP, A2A bridge, private model + signed allowlist, etc.) • Multi-runtime : 8 wired runtimes named, with what each is in one line • Built on AGT : the 4 actual PRs/contributions named • What's next : the 4 actual shipping targets named • Try it : the actual `kars dev` command sequence as code block Every claim in the deck is ground-truthed against repo HEAD via deep-dive explorer runs (see docs/showcase/outline.md for full citations). Source file: /tmp/build-deck.js → pptxgenjs render. Diagram pair (Excalidraw): 6 .excalidraw files using single accent teal (#028090), Helvetica throughout, no inline arrows where layout implies the relationship. Kept minimal because the deck slides carry the detail; diagrams are hero shots, not info-dense. Source-of-truth doc: docs/showcase/outline.md — every claim with file:line citations against ground-truthed repo source. Updating outline.md → re-running /tmp/build-deck.js regenerates the deck.

… tiles cropped) Visual QA caught two layout bugs in the first render: - Slide 7 (MESH/KNOCK frame): JSON code block was 17 lines @ fontSize 12 in a 2.9" panel — closing braces overflowed below the gray panel. Fixed: shrunk to fontSize 11, panel taller (3.55"), raised to y:3.85 to use the available space without colliding with the lede. - Slide 11 (RUNTIMES): 4 tiles @ 3.85" wide + 0.15" gaps = 15.85" total, but the slide is only 13.3" wide — x0 went negative, cropping leftmost tiles. Fixed: tw=2.9" → 12.25" total, centred with 0.525" left/right margin. Re-rendered all 15 slides, visual QA pass clean across the board. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Vendoring /tmp/build-deck.js into scripts/showcase/ so the deck is reproducible from a clean clone — no more 'lost the script that made the slides' problem. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

+}
+
+// section divider (very minimal — used between major narrative arcs)
+function section(s, n, txt) {


+}
+
+// section divider (very minimal — used between major narrative arcs)
+function section(s, n, txt) {


…-shaped Major restructure per Pal's feedback (less competitive, more architecture): REMOVED: 'THE RACE' slide (competitor names) ADDED: 'THE OUTCOME' slide — 4 measurable outcomes: every call audited · sandbox in minutes · one policy plane · default-deny egress ADDED (architecture deep-dive, §3): §3 THE CORE — three boxes: Controller / 11 CRDs / Inference Router §3.1 ROUTER REQUEST FLOW — 6 stages: agent → iptables → router → policy → audit → upstream §3.2 ROUTER INTERNALS — 8 routes + 8 subsystems in two columns §3.3 CONTROLLER LOOP — KarsSandbox → 9 named Kubernetes primitives §3.4 STATUS & OBSERVABILITY — phase taxonomy + conditions YAML §3.5 CRD CATALOG — all 11 CRDs tabled with scope + reconciler ADDED (policy in practice, §4): §4 InferencePolicy (existing slide reworked) §4.1 ToolPolicy + EgressApproval (real CR snippets side by side) REFRAMED: §10 WHAT'S NEXT — outcome-shaped: 'capability → what becomes possible → proof' Visual QA: 4 layout bugs caught on first render (slides 9/11/13/20) and fixed; final QA pass clean on all 21 slides. Reproduces via: NODE_PATH=$(npm root -g) node scripts/showcase/build-deck.js Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

+}
+
+// right-column prose paired with codeBlock
+function rightProse(s, paragraphs, opts = {}) {


+}
+
+// right-column prose paired with codeBlock
+function rightProse(s, paragraphs, opts = {}) {


…gress slide Two new slides per Pal's feedback: NEW SLIDE 4 — HIGH-LEVEL ARCHITECTURE diagram (right after the 'WHAT KARS IS' dark statement, before drilling into 'THE CORE'). A single picture that shows the whole shape: dashed cluster boundary containing three boxes (kars CRDs · kars-controller · Sandbox pod), with the sandbox pod showing both inner containers (agent UID 1000 + inference-router UID 1001). Below the cluster: a row of external services chips (Azure OpenAI · Anthropic · OpenAI · Bedrock · MCP · A2A peers · AGT relay) with 'only path out' label connecting the architecture to the providers. NEW SLIDE 15 — NETWORK EGRESS · learn it, sign it, enforce it. The full story: - Top half: two side-by-side panels showing Learn mode (default — record every host into the next allowlist proposal) vs Strict mode (production — anything outside the signed allowlist gets 4xx; EgressApproval grants layer on top; fails closed) - Bottom half: 5-stage signed-OCI-allowlist pipeline: kars egress --sign → OCI artifact (ACR/ghcr) → cosign verify (Fulcio + SAN) → ConfigMap + digest → router · L7 hot-reload - Source refs: controller/src/policy_fetcher.rs · egress_allowlist_compile.rs · inference-router/src/egress_allowlist_loader.rs Section eyebrows renumbered to keep §3 sub-numbering consistent after insertion. Final QA: clean on all 23 slides. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

pallakatos requested review from johnsonshi and lachie83 as code owners June 4, 2026 05:26

Pal Lakatos and others added 2 commits June 4, 2026 07:56

pallakatos force-pushed the hermes/act1-docker-smoke-fixes branch from 2a712bc to 82b7fa1 Compare June 4, 2026 08:05

Pal Lakatos and others added 3 commits June 4, 2026 12:00

pallakatos force-pushed the hermes/act1-docker-smoke-fixes branch 2 times, most recently from 238453f to f7f16c8 Compare June 4, 2026 12:30

pallakatos force-pushed the hermes/act1-docker-smoke-fixes branch from f7f16c8 to f21048d Compare June 4, 2026 12:30

Pal Lakatos and others added 2 commits June 4, 2026 16:15

github-code-quality Bot found potential problems Jun 4, 2026

View reviewed changes

Comment thread runtimes/agt-mesh-python/src/kars_agt_mesh/relay_transport.py Fixed

Comment thread runtimes/agt-mesh-python/src/kars_agt_mesh/relay_transport.py Fixed

github-code-quality Bot found potential problems Jun 4, 2026

View reviewed changes

Comment thread runtimes/hermes/src/kars_runtime_hermes/plugin/mesh.py Fixed

Pal Allakatos and others added 3 commits June 5, 2026 01:52

chore(harness): remove stray .new file from mesh-roundtrip-hermes

42d00a6

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pal Allakatos <pallakatos@microsoft.com>

github-code-quality Bot found potential problems Jun 5, 2026

View reviewed changes

Pal Allakatos and others added 4 commits June 5, 2026 06:44

github-code-quality Bot found potential problems Jun 5, 2026

View reviewed changes

Comment thread runtimes/agt-mesh-python/tests/test_wire_format.py Fixed

Pal Lakatos-Toth and others added 24 commits June 8, 2026 09:24

docs(showcase): commit deck builder + README

ace04f4

Vendoring /tmp/build-deck.js into scripts/showcase/ so the deck is reproducible from a clean clone — no more 'lost the script that made the slides' problem. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-code-quality Bot found potential problems Jun 8, 2026

View reviewed changes

Comment thread scripts/showcase/build-deck.js

}

// section divider (very minimal — used between major narrative arcs)

function section(s, n, txt) {

github-advanced-security AI found potential problems Jun 8, 2026

View reviewed changes

Comment thread scripts/showcase/build-deck.js

}

// section divider (very minimal — used between major narrative arcs)

function section(s, n, txt) {

github-code-quality Bot found potential problems Jun 8, 2026

View reviewed changes

Comment thread scripts/showcase/build-deck.js

}

// right-column prose paired with codeBlock

function rightProse(s, paragraphs, opts = {}) {

github-advanced-security AI found potential problems Jun 8, 2026

View reviewed changes

Comment thread scripts/showcase/build-deck.js

}

// right-column prose paired with codeBlock

function rightProse(s, paragraphs, opts = {}) {

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hermes support#396

Hermes support#396
pallakatos wants to merge 62 commits into
mainfrom
hermes/act1-docker-smoke-fixes

pallakatos commented Jun 4, 2026

Uh oh!

github-actions Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

pallakatos commented Jun 4, 2026

Summary

Verification

Security

Uh oh!

github-actions Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Dependency Review

License Issues

mesh-plugin/package.json

runtimes/agt-mesh-python/pyproject.toml

OpenSSF Scorecard

Scanned Files

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented Jun 4, 2026 •

edited

Loading