Skip to content

Add per-key key-scope option for Windows machine vs user key store#1021

Merged
joshdrake merged 13 commits into
masterfrom
feat/add-tpm-key-scope
Jun 6, 2026
Merged

Add per-key key-scope option for Windows machine vs user key store#1021
joshdrake merged 13 commits into
masterfrom
feat/add-tpm-key-scope

Conversation

@joshdrake

Copy link
Copy Markdown
Contributor

Summary

  • Adds key-scope=machine|user URI parameter to kms/tpmkms, controlling whether the underlying private key lives in the local machine key store (NCRYPT_MACHINE_KEY_FLAG) or the current user's key store. Independent of store-location, so callers can pair a machine-store certificate with a user-owned key (e.g., a LocalSystem service that wants its cert visible machine-wide but its key only openable by SYSTEM) or vice versa.
  • Threads MachineKey per-key through tpm.CreateKeyConfig / tpm.AttestKeyConfig / tpm.CreateAKConfig and into the underlying NCrypt PCP create call. Per-key, not per-TPM-instance — a single tpm.TPM handle can mint or open keys in either scope.
  • t.open() reads the requested scope from context and applies it to attestConfig.MachineKey under the lock just before attest.OpenTPM. Safe because attestTPM is recreated on every open().
  • Persists MachineKey in storage.AK and storage.Key so reload after process restart uses the matching scope without the URI needing to carry it. JSON omitempty so existing on-disk stores remain readable and default to false (= existing user-scope behavior).
  • tpmkms.CreateKey preserves key-scope=machine in the returned URI so downstream CreateSigner/DeleteKey calls don't lose scope (closes a returned-URI bug we hit during testing).
  • AttestKey reads the AK's persisted scope and uses it for the attest session — attested keys inherit their AK's scope; conflicting AttestKeyConfig.MachineKey is rejected with a clear error.
  • DeleteKey/DeleteAK always clean up the file-store entry even if the in-TPM (NCrypt) deletion fails, preventing repeated retries from re-encountering the same failing entry. When the file-store cleanup succeeded but the in-TPM step did not, returns a typed *PartialDeleteError so callers can distinguish a partial delete from total failure.
  • Bumps github.com/smallstep/go-attestation to v0.4.9 for the OpenConfig.MachineKey field this change drives.

Defaults: when key-scope is unset, derive from store-location for back-compat (store-location=machinekey-scope=machine).

Background

Code review and a Windows test matrix (interactive admin + LocalSystem via psexec -s -i 0, both x86_64) showed that the agent's TPM-backed device-identity keys land under LocalSystem's user-profile PCPKSP scope rather than the machine scope, even when store-location=machine is set. The kms/capi machine-key fix from #802 doesn't apply because the agent uses kms/tpmkms (with enable-cng=true) — that path goes through tpm/internal/key/pcp_windows.go, where nCryptCreatePersistedKey was hardcoded to dwFlags=0. Fixing it requires (a) plumbing a MachineKey flag through the tpm package and (b) deciding how to express user intent at the URI layer.

This PR is an alternative shape to #1006. Differences:

  • Per-key, not per-TPM-instance. Herman's review feedback on Add store-location=machine URI option for Windows machine key store #1006 pushed back on WithMachineKey() being a per-TPM option (would affect all keys going through that handle); the discussion converged on per-key being the right model. This PR puts MachineKey on the per-key configs (CreateKeyConfig, AttestKeyConfig, CreateAKConfig) and uses a context value to thread it into t.open() for the operations that need it.
  • MachineKey lives on options, not attestConfig. Also from Herman's review.
  • key-scope is decoupled from store-location. Cert location and key ownership are orthogonal Windows concepts. There are real scenarios where a service wants its cert in LocalMachine\My for visibility but its key restricted to one identity (e.g., LocalSystem-only) — key-scope=user with store-location=machine expresses that. Default-derive from store-location for back-compat.
  • Typed *PartialDeleteError for DeleteKey/DeleteAK. Also from Herman's review.

Test plan

  • go build ./... clean on linux/amd64 and windows/amd64
  • go vet ./... clean (one pre-existing master warning in kms/capi/ncrypt_windows.go:569, unrelated to this change)
  • go test ./... green across all packages
  • Empirical Windows matrix (throwaway diagnostic harness, run interactive admin + psexec -s -i 0 LocalSystem):
    • key-scope=machine lands keys at C:\ProgramData\Microsoft\Crypto\PCPKSP\<sid-hash>\<file>.PCPKEY (interactive admin + LocalSystem).
    • key-scope=user lands keys at <userprofile>\AppData\Local\Microsoft\Crypto\PCPKSP\<sid-hash>\<file>.PCPKEY (LocalSystem profile when running as SYSTEM).
    • Decoupling proven: key-scope=machine + store-location=user puts cert in user store and key in machine PCPKSP; key-scope=user + store-location=machine puts cert in machine store and key in user PCPKSP.
    • Back-compat: key-scope unset + store-location=machine correctly resolves to machine scope.
    • CreateSigner via the URI returned by CreateKey round-trips successfully in every TPM scenario; sign+verify ok.

Out of scope

  • Agent-side change to set key-scope=machine on the device-identity TPM URI; lands separately in the agent repo.
  • Equivalent fix for kms/capi's returned-URI bug (different root cause, different file). Tracked separately.

🤖 Generated with Claude Code

Introduces a key-scope=machine|user URI parameter on tpmkms that
controls whether the underlying private key lives in the local machine
key store (NCRYPT_MACHINE_KEY_FLAG) or in the current user's key store.
The parameter is independent of store-location, which controls cert
location, so callers can request machine-store certs paired with
user-owned keys (or vice versa) for security-hardening scenarios such
as a service whose cert needs machine visibility but whose key should
only be openable by the service identity.

Architecture:
  - MachineKey is a per-key concern, not a per-TPM-instance one. Threaded
    through CreateKeyConfig, AttestKeyConfig, and CreateAKConfig, then
    via context into TPM.open() which sets attestConfig.MachineKey under
    the lock just before attest.OpenTPM. Persisted in storage so reload
    after restart uses the matching scope without the URI carrying it.
  - tpmkms.CreateKey now preserves key-scope=machine in the returned URI
    so downstream CreateSigner/DeleteKey calls don't lose scope.
  - AttestKey reads the AK's persisted scope and uses it for the attest
    session; the new key inherits the AK's scope. Conflicting explicit
    AttestKeyConfig.MachineKey is rejected with a clear error.
  - Defaults for back-compat: when key-scope is unset, derive from
    store-location (machine cert -> machine key) so existing URIs that
    happened to set store-location=machine continue to work.

Also fixes related delete-path issues:
  - DeleteKey/DeleteAK now always clean up the file-store entry even if
    the in-TPM (NCrypt) deletion fails, preventing repeated retries from
    re-encountering the same failing entry.
  - Returns a typed *PartialDeleteError when the in-TPM deletion failed
    but the file-store cleanup succeeded, so callers can distinguish
    that case from a complete failure.

Bumps github.com/smallstep/go-attestation to v0.4.9, which adds the
OpenConfig.MachineKey field this change drives.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@CLAassistant

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

Previously the check only fired when the attested key was requested as
machine-scope while the AK was user-scope; the reverse case (AK
machine-scope, attested key requested user-scope) silently inherited
the AK's scope and produced a machine-scope key, which is surprising
behaviour for a caller who explicitly asked for user. Reject both
mismatch directions so the inheritance rule is loud, not silent.

Documented in the error message that attested keys inherit the AK's
scope and that callers must set AttestKeyConfig.MachineKey to match
the AK's scope explicitly.

Caught by the capi-diag attest-roundtrip matrix: the
ak-machine+key-user_xfail row was producing a successful machine-scope
attested key instead of erroring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread kms/tpmkms/uri.go Outdated
Comment on lines 39 to 52
// machineKey returns true if the resolved key scope is "machine".
// When keyScope is unset on the object, the value is derived from
// storeLocation: "machine" → true, anything else → false.
func (o objectProperties) machineKey() bool {
scope := o.keyScope
if scope == "" {
if o.storeLocation == "machine" {
scope = "machine"
} else {
scope = "user"
}
}
return scope == "machine"
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// machineKey returns true if the resolved key scope is "machine".
// When keyScope is unset on the object, the value is derived from
// storeLocation: "machine" → true, anything else → false.
func (o objectProperties) machineKey() bool {
scope := o.keyScope
if scope == "" {
if o.storeLocation == "machine" {
scope = "machine"
} else {
scope = "user"
}
}
return scope == "machine"
}
// isMachineKey returns true if the resolved key scope is "machine".
// When keyScope is unset on the object, the value is derived from
// storeLocation: "machine" → true, anything else → false.
func (o objectProperties) isMachineKey() bool {
return o.keyScope == "machine" ||
(o.keyScope == "" && o.storeLocation == "machine")
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread tpm/key.go Outdated
}
machineKey := ak.MachineKey

if err = t.open(withMachineKey(ctx, machineKey)); err != nil {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the order of t.open (which calls into t.store.load) and t.store.X could result in unexpected behavior depending on the Storage implementation. With the Dirstore implementation it won't be an issue in practice, but for other implementations it can be different.

Maybe t.store.Load can be extracted from t.open, and be called before any t.store operation.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c3de912

I simply added an explicit t.store.Load for now to preserve the order. There will be duplicate Load calls, but I figured that's probably okay for now to hold off on refactoring open.

joshdrake and others added 4 commits May 12, 2026 20:58
Inlines the back-compat resolution into a single boolean expression so
the rule reads top-to-bottom: explicit key-scope=machine wins; absent
that, fall back to store-location=machine. The previous if/else chain
made the back-compat branch look more important than it is.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses Herman's review feedback on PR #1021: the original change
moved t.store.GetAK / t.store.GetKey ahead of t.open so the call could
discover the persisted MachineKey scope before opening attest. That
inverted the invariant that every t.store read happens after a
t.store.Load (which today only runs inside t.open). For the in-tree
Dirstore this didn't matter, but other Storage implementations may
return stale or empty data if read before Load.

AttestKey: open with the caller's requested scope first, then look up
the AK and validate. A scope mismatch produces the same error as before
but after a wasted attest open in the error path — strictly correct.

DeleteKey, DeleteAK, GetSigner: keep the read-before-open shape (they
need the persisted scope to set MachineKey on context, and the caller
doesn't supply it), but call t.store.Load explicitly first so the
invariant holds. t.open also Loads under the lock; both Loads are
idempotent for the storage implementations in this package.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CreateKey's ak=true branch called tpm.CreateAK, which uses the user-default
scope, and ignored the resolved machineKey (computed only after the branch
returned). So an AK requested with key-scope=machine was silently created in
user scope (MachineKey=false). The attested-key path *does* pass MachineKey,
so AttestKey's symmetric scope check then rejected every machine-scoped
attested key by that AK:

  AttestKeyConfig.MachineKey=true does not match AK "step-agent-ak" scope
  (MachineKey=false); attested keys inherit their AK's scope

This broke machine-scoped device attestation enrollment outright (seen in the
smallstep agent on Windows running as LocalSystem).

Create the AK with CreateAKWithConfig{MachineKey: machineKey} (machineKey is
now computed before the branch), and preserve key-scope=machine in the
returned AK URI so re-opens use the matching scope (mirroring the non-AK
path). CreateAK is exactly CreateAKWithConfig with an empty config, so the
user-scope path is unchanged.

Adds a tpmsimulator regression test reproducing the agent's enrollment path
(machine AK -> machine attested key); it fails without this fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@joshdrake joshdrake marked this pull request as ready for review June 3, 2026 17:39

@maraino maraino left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to use a release version of go-attestation once merged.

Comment thread tpm/ak.go Outdated
Comment thread tpm/tpm.go Outdated
Comment thread tpm/key.go Outdated
@joshdrake joshdrake requested a review from maraino June 3, 2026 21:37
@hslatman

hslatman commented Jun 3, 2026

Copy link
Copy Markdown
Member

Make sure to use a release version of go-attestation once merged.

So far we've always used a commit. We didn't release from surrogate, because we always wanted to upstream it. There were some erroneous tags while you were on vacation, but I deleted those again.

@maraino maraino left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add goTPMCall as an option.

Comment thread tpm/caps.go Outdated
Comment thread tpm/key.go Outdated
Comment thread tpm/tpm.go
@joshdrake joshdrake requested a review from maraino June 4, 2026 14:48

@maraino maraino left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, but I have a couple of questions, one is related to the use of errors.Is() and you can see a comment about that.

The second is if there might be any issues getting, listing or deleting AK or keys that MachineKey

Comment thread tpm/errors.go
// Cleaning up the file-store entry even on PCP failure prevents
// repeated retries from re-encountering the same failing entry,
// which is the failure mode that motivated this type.
type PartialDeleteError struct {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to use something like errors.Is(err, &PartialDeleteError{}), as it is it will return always false. However this will work and return true:

var partialDeleteErr *PartialDeleteError
errors.As(err, &partialDeleteErr)

If you want to make it work with errors.Is() you need to implement something like this:

func (e *PartialDeleteError) Is(target error) bool {
	_, ok := target.(*PartialDeleteError)
	return ok
}

@joshdrake joshdrake Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't really doing this for value checking, more for unwrapping with errors.As since the the struct carries the Underlying error and key Name. The idea being that DeleteKey callers can unwrap PartialDeleteError and then take additional action based on the underlying error (eg: delete the PCPKSP file handle, maybe).

@joshdrake

Copy link
Copy Markdown
Contributor Author

The second is if there might be any issues getting, listing or deleting AK or keys that MachineKey

by default only administrators or LocalSystem can perform key operations, but otherwise, no

@hslatman hslatman changed the title feat: add per-key key-scope option for Windows machine vs user key store Add per-key key-scope option for Windows machine vs user key store Jun 5, 2026
@joshdrake joshdrake merged commit 444dcf9 into master Jun 6, 2026
9 of 12 checks passed
@joshdrake joshdrake deleted the feat/add-tpm-key-scope branch June 6, 2026 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants