tests/docs/kms-bootstrap-onboard.md
Lines changed: 6 additions & 2 deletions
@@ -75,8 +75,10 @@ Operational notes:
 1. Prefer a **prebuilt KMS image**.
 2. `Boot Progress: done` does **not** guarantee the onboard endpoint is ready.
 3. The onboarding completion endpoint is **GET `/finish`**.
-4. On teepod, onboard mode usually uses the `-8000` URL, while runtime TLS KMS RPC usually uses the `-8000s` URL.
+4. On teepod with gateway, onboard mode usually uses the `-8000` URL, while runtime TLS KMS RPC usually uses the `-8000s` URL. **Port forwarding** (`--port tcp:0.0.0.0:<host-port>:8000`) is simpler than gateway for testing, because gateway requires the auth API to return a `gatewayAppId` at boot time.
 5. If you use a very small custom webhook instead of the real auth service, `KMS.GetMeta` may fail because `auth_api.get_info()` expects extra chain / contract metadata fields. In that case, use `GetTempCaCert` as the runtime readiness probe.
+6. dstack CVMs use QEMU user-mode networking, so the host is reachable at **`10.0.2.2`** from inside the CVM. The `source_url` in `Onboard.Onboard` must use a CVM-reachable address (e.g., `https://10.0.2.2:<port>/prpc`), not `127.0.0.1`.
+7. **Remote KMS attestation has an empty `osImageHash`.** When the receiver verifies the source KMS during onboard, the `osImageHash` is empty because `vm_config` is unavailable for remote attestation. Auth configs for receiver-side checks must include `"0x"` in the `osImages` array.

 ---
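Reviewer sketch for note 6 (not part of the diff): a minimal Python helper, assuming the operator builds `source_url` on the host and needs it rewritten before sending `Onboard.Onboard`. The function name `cvm_reachable_url` and the set of loopback names are hypothetical; only `10.0.2.2`, the `/prpc` path, and port `9301` come from the runbook text.

```python
from urllib.parse import urlsplit, urlunsplit

# Host address as seen from inside a CVM on QEMU user-mode networking (from the notes).
QEMU_HOST_GATEWAY = "10.0.2.2"
# Hypothetical list of names that resolve to the CVM's own loopback.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def cvm_reachable_url(url: str) -> str:
    """Rewrite a loopback URL so it points at the QEMU host gateway.

    URLs that already use a non-loopback host are returned unchanged.
    Any userinfo in the URL is dropped; this sketch does not expect one.
    """
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url
    netloc = QEMU_HOST_GATEWAY
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(cvm_reachable_url("https://127.0.0.1:9301/prpc"))
```

This only helps when the CVM really uses QEMU user-mode networking; with a different network setup the gateway address would differ.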
@@ -99,14 +101,16 @@ Use two independently controllable auth services:
 They can be:

-1. host-local if reachable by CVMs
+1. **Preferred:** host-local, accessed from CVMs via `http://10.0.2.2:<port>` (QEMU host gateway)
 2. public services
 3. sidecars inside each KMS deployment

 At minimum, both policies must allow the KMS instance they serve. During onboard, source-side policy must also allow the destination KMS caller.

 For `auth-simple`, `kms.mrAggregated = []` is a deny-all policy for KMS. Add the current KMS MR values explicitly when switching a test from deny to allow.

+Include `"0x"` in the `osImages` array for configs used in receiver-side onboard checks (see operational note 7 above).
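Reviewer sketch of the policy rules above (not part of the diff): a small Python builder that produces a deny-all or allow policy and always adds the `"0x"` entry for receiver-side checks. Only the `osImages` and `kms.mrAggregated` keys come from the text; the function names, the idea that entries are `0x`-prefixed hex strings, and the sample values are assumptions, and a real `auth-simple` config may need further fields.

```python
import json

# Entry that matches the empty osImageHash reported for a remote KMS (from note 7).
EMPTY_OS_IMAGE_HASH = "0x"


def with_0x(hex_value: str) -> str:
    """Assumption: policy entries are 0x-prefixed hex strings."""
    return hex_value if hex_value.startswith("0x") else "0x" + hex_value


def build_policy(os_image_hashes, kms_mr_values, receiver_side=True):
    """Build a hypothetical minimal policy dict.

    An empty kms_mr_values list yields the deny-all KMS policy described above.
    """
    os_images = [with_0x(h) for h in os_image_hashes]
    if receiver_side and EMPTY_OS_IMAGE_HASH not in os_images:
        os_images.append(EMPTY_OS_IMAGE_HASH)
    return {
        "osImages": os_images,
        "kms": {"mrAggregated": [with_0x(m) for m in kms_mr_values]},
    }


if __name__ == "__main__":
    # "ab" and "cd" are placeholders, not real measurements.
    print(json.dumps(build_policy(["ab"], []), indent=2))      # deny-all baseline
    print(json.dumps(build_policy(["ab"], ["cd"]), indent=2))  # allow one KMS MR
```

Generating both variants from the same function keeps the deny and allow cases identical except for the MR list, which is the only difference the test is meant to exercise.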
tests/docs/kms-self-authorization.md
Lines changed: 35 additions & 20 deletions
@@ -11,18 +11,22 @@ The goal is to validate the following behaviors without depending on `kms/e2e/`
 This guide is written as a deployment-and-test runbook so an AI agent can follow it end-to-end.

-> **Execution notes from a real run on teepod2 (2026-03-19):**
+> **Execution notes from real runs on teepod2 (2026-03-19):**
 >
 > 1. Do **not** assume a host-local `auth-simple` instance is reachable from a CVM. In practice, the auth API must be:
 >    - publicly reachable by the CVM, or
 >    - deployed as a sidecar/internal service inside the same test environment.
-> 2. For PR validation, prefer a **prebuilt KMS test image**. The run documented here used `cr.kvin.wang/dstack-kms:kms-auth-checks-157ad4ba`.
+>    - dstack CVMs use QEMU user-mode networking, so the host is reachable at **`10.0.2.2`** from inside the CVM.
+> 2. For PR validation, prefer a **prebuilt KMS test image**.
 > 3. `Boot Progress: done` only means the VM guest boot finished. It does **not** guarantee the KMS onboard endpoint is already ready.
 > 4. If you inject helper scripts through `docker-compose.yaml`, prefer inline `configs.content` over `configs.file` unless you have confirmed the extra files are copied into the deployment bundle.
 > 5. The onboard completion endpoint is **GET `/finish`**, not POST.
 > 6. Do **not** reuse a previously captured `mr_aggregated` across redeploys. Auth policies must be generated from the attestation of the **current** VM under test.
 > 7. KMS now always requires quote/attestation. For local development without TDX hardware, use `sdk/simulator` instead of trying to run a no-attestation KMS flow.
 > 8. For `auth-simple`, `kms.mrAggregated = []` is a deny-all policy for KMS. Use that as the baseline deny configuration, then add the measured KMS MR values for allow cases.
+> 9. **Port forwarding is simpler than gateway for testing.** Using `--gateway` requires the auth API to return a valid `gatewayAppId`, which adds unnecessary complexity. Use `--port tcp:0.0.0.0:<host-port>:8000` instead.
+> 10. **Remote KMS attestation has an empty `osImageHash`.** When the receiver verifies the source KMS during onboard, the `osImageHash` field in the attestation is empty (because `vm_config` is not available for the remote attestation). Auth configs for receiver-side checks must include `"0x"` in the `osImages` array to match this empty hash.
+> 11. The `source_url` in the `Onboard.Onboard` request must use an address **reachable from inside the CVM** (e.g., `https://10.0.2.2:<port>/prpc`), not `127.0.0.1`, which is the CVM's own loopback.

 ---
@@ -119,10 +123,10 @@ Strong recommendation for this manual test:
 Using a prebuilt image significantly reduces ambiguity when a failure happens: you can focus on KMS authorization logic rather than image build or registry behavior.

-Teepod/gateway URL convention observed during a real run:
+If you use teepod gateway instead of port forwarding:

-- **onboard mode:** use the `-8000` style URL
-- **runtime TLS KMS RPC after bootstrap/onboard:** use the `-8000s` style URL
+- **onboard mode:** use the `-8000` style URL (plain HTTP)
+- **runtime TLS KMS RPC after bootstrap/onboard:** use the `-8000s` style URL (TLS passthrough)

 Do not assume the same external URL works before and after onboarding is finished.
@@ -144,9 +148,9 @@ The original plan was to run two host-local `auth-simple` processes. In practice
 Choose one of these options:

-1. **Preferred:** deploy the auth API as a separate public service or CVM
-2. **Also fine:** run the auth API as a sidecar in the same KMS test deployment
-3. **Only if reachable:** run `auth-simple` on the operator host and point KMS at that reachable host/IP
+1. **Preferred:** run `auth-simple` on the operator host and point KMS at `http://10.0.2.2:<port>` (QEMU host gateway). This is the simplest if the CVMs use QEMU user-mode networking.
+2. **Also fine:** deploy the auth API as a separate public service or CVM
+3. **Sidecar:** run the auth API as a sidecar in the same KMS test deployment

 If you use the sidecar/public-service pattern, keep the same logical split:
@@ -224,12 +228,17 @@ Requirements for **both** VMs:
 - `core.onboard.auto_bootstrap_domain = ""`
 - `core.auth_api.type = "webhook"`

-Point them at different auth services or sidecars:
+Point them at different auth services. If using host-local `auth-simple` with QEMU user-mode networking:

-- `kms-src` → `http://<host-reachable-ip>:3101`
-- `kms-dst` → `http://<host-reachable-ip>:3102`
+- `kms-src` → `http://10.0.2.2:3101`
+- `kms-dst` → `http://10.0.2.2:3102`

-If you use sidecars instead of host-local auth servers, replace those URLs with the sidecar/internal service addresses.
+**Recommended deploy method:** use port forwarding (`--port`) instead of gateway. Gateway requires the auth API to return a `gatewayAppId` at boot, which makes testing harder. With port forwarding, the KMS onboard and runtime endpoints are directly accessible on the host:
-- The onboard endpoint is plain onboarding mode, so use `Onboard.*`
-- The runtime KMS endpoint is available only after bootstrap/onboard and `/finish`
+- The onboard endpoint serves plain HTTP, so use `http://` for `KMS_*_ONBOARD`
+- After bootstrap/onboard + `/finish`, the KMS restarts with TLS, so use `https://` for `KMS_*_RUNTIME`
+- The `source_url` in `Onboard.Onboard` must be reachable from inside the CVM (e.g., `https://10.0.2.2:9301/prpc`)

 Wait until the onboard endpoint is actually ready before continuing. A simple probe loop is recommended:
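Reviewer sketch (not part of the diff; the runbook's own probe loop lies outside this hunk): one possible shape for such a loop in Python, assuming the onboard endpoint answers plain HTTP on a forwarded host port. The function name, its parameters, and the probed URL are hypothetical. The probe should not target `/finish`, since the notes describe that path as the onboarding completion endpoint.

```python
import time
import urllib.request


def wait_until_ready(url, timeout_s=300.0, interval_s=5.0,
                     request_timeout_s=5.0, opener=urllib.request.urlopen):
    """Poll `url` until it answers with a 2xx status or `timeout_s` elapses.

    Returns True once the endpoint responds, False on timeout.
    `opener` is injectable so the loop can be exercised without a network.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=request_timeout_s) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            # URLError and HTTPError are OSError subclasses: connection refused,
            # timeouts, and non-2xx responses all count as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical onboard URL; replace host port and path with your deployment's.
    ok = wait_until_ready("http://127.0.0.1:9301/", timeout_s=60, interval_s=2)
    print("onboard endpoint ready" if ok else "timed out waiting for onboard endpoint")
```

Polling the actual HTTP endpoint rather than watching for `Boot Progress: done` matches the note that guest boot completion does not guarantee the onboard endpoint is ready.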
@@ -300,12 +313,14 @@ All three values above are expected to be hex strings **without** the `0x` prefi
 #### Deny-by-MR config

-Use a wrong `mrAggregated` value while allowing the observed OS image:
+Use a wrong `mrAggregated` value while allowing the observed OS image.
+
+> **Important:** include `"0x"` in `osImages` to handle remote KMS attestation during onboard receiver-side checks, where `osImageHash` is empty because `vm_config` is unavailable for the remote attestation.