Add event recording and status conditions for worker deployments #203

Merged: carlydf merged 20 commits into main from add_events on Mar 7, 2026
thearcticwatch (Contributor) commented Feb 21, 2026:

What changed: Added Kubernetes events and status conditions (TemporalConnectionHealthy, RolloutReady) to the worker controller reconciliation loop.

Why: Reconciliation failures were only visible in controller logs; events and conditions let users diagnose issues directly via kubectl.

  1. Closes: Add events to the TemporalWorkerDeployment CRD when there is a problem #28

  2. How was this tested: Added unit tests and functional tests.

  3. Any docs updates needed? N/A

  4. Is this risky? Explain:

Making a change to the CRD (adding conditions) opens up the risk that users could upgrade the controller but fail to upgrade their CRD. In this case, it is ok if new features are silently ignored, but we don't want the controller to panic or fail to successfully do the actions that were available in the previous CRD version. I believe that this change is safe even if someone forgets to upgrade their CRD, because when this new controller runs against a v1.2.0 CRD:

  • No panic. The controller calls r.Status().Update(ctx, twd) with conditions populated in memory. The API server validates against the CRD schema and prunes unknown fields (standard behavior for structural schemas without x-kubernetes-preserve-unknown-fields). The status write succeeds with a 200 and the conditions are silently dropped before storage.
  • Kubernetes Events work fine. Events are written as separate events.k8s.io/v1 resources, completely independent of the TWD CRD schema. All r.Recorder.Eventf(...) calls will succeed normally.
  • Conditions simply don't persist. kubectl get twd foo -o yaml will show no conditions field. The controller sets them in memory on every reconcile, tries to write them, and the API server drops them. Functionally the controller does the right thing; it just can't communicate health status via conditions until the CRD is upgraded.
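
The pruning behavior described in the bullets above can be illustrated with a small standalone sketch. This is not the API server's implementation and does not use any Kubernetes library: `pruneUnknown`, `oldStatusSchema`, and the `currentVersion` field are hypothetical names chosen for illustration, and real pruning walks the full structural schema rather than only top-level keys.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pruneUnknown keeps only the top-level keys that appear in known.
// Simplified stand-in for API server pruning, which walks the whole schema.
func pruneUnknown(obj map[string]any, known map[string]bool) map[string]any {
	out := make(map[string]any, len(obj))
	for key, value := range obj {
		if known[key] {
			out[key] = value
		}
	}
	return out
}

func main() {
	// Hypothetical old status schema: it has no "conditions" field.
	oldStatusSchema := map[string]bool{"currentVersion": true}

	// Status as the newer controller would send it.
	status := map[string]any{
		"currentVersion": "v2",
		"conditions": []map[string]string{
			{"type": "TemporalConnectionHealthy", "status": "True"},
		},
	}

	// The write "succeeds", but the unknown field never reaches storage.
	stored := pruneUnknown(status, oldStatusSchema)
	encoded, err := json.Marshal(stored)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(encoded)) // prints {"currentVersion":"v2"}
}
```

The point of the sketch is that the caller sees no error: the known fields are stored and the conditions are dropped without any signal back to the controller.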

thearcticwatch requested review from a team and jlegrone as code owners February 21, 2026 00:39
CLAassistant commented Feb 21, 2026:

CLA assistant check
All committers have signed the CLA.

carlydf (Collaborator) left a comment:

also make fmt-imports will solve some of your lint errors

thearcticwatch enabled auto-merge (squash) February 21, 2026 14:14
carlydf (Collaborator) left a comment:

looking good! just did initial review, we should still add a functional test once these comments are addressed.

I found https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#events and https://book.kubebuilder.io/reference/raising-events#creating-events helpful while reviewing.

carlydf (Collaborator) left a comment:

really close from my perspective. will push a commit showing what I mean about the stricter string types for EventType and ConditionType.
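
For readers unfamiliar with the idea, here is a minimal sketch of what "stricter string types" could look like in Go. It is not the commit referred to above: the constant names and the `describeCondition` helper are hypothetical, and it assumes EventType distinguishes the Normal and Warning event kinds. Only the two condition names come from the PR description.

```go
package main

import "fmt"

// Distinct named string types, so the compiler rejects mixing them up.
type ConditionType string
type EventType string

const (
	ConditionTemporalConnectionHealthy ConditionType = "TemporalConnectionHealthy"
	ConditionRolloutReady              ConditionType = "RolloutReady"
)

// Assumed values for illustration only.
const (
	EventTypeNormal  EventType = "Normal"
	EventTypeWarning EventType = "Warning"
)

// describeCondition accepts only a ConditionType (hypothetical helper).
func describeCondition(t ConditionType, healthy bool) string {
	status := "False"
	if healthy {
		status = "True"
	}
	return string(t) + "=" + status
}

func main() {
	fmt.Println(describeCondition(ConditionRolloutReady, true)) // prints RolloutReady=True

	// describeCondition(EventTypeWarning, true) would not compile:
	// an EventType value is not assignable to a ConditionType parameter.
}
```

Note that an untyped string literal can still be passed where a ConditionType is expected, so named types guard against swapping typed values, not against arbitrary literals.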

carlydf and others added 3 commits March 3, 2026 15:33
"Registration" already has a meaning in Temporal versioning (a worker
polling for the first time creates a version record). "Promotion" better
describes setting a version as current or ramping, which moves it forward
in the rollout lifecycle.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
carlydf (Collaborator) left a comment:

Approved, but @Shivs11 if you could take a peek at my refactor of ClientPool which I did so we could emit a separate event type for invalid secret vs failed dial to Temporal server, that would be great!
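
One common way to let a caller emit different events for different failure causes is to return distinguishable errors and classify them with `errors.Is`. The sketch below assumes that approach; it is not the actual ClientPool refactor, and the sentinel errors, the reason strings, and `eventReasonFor` are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors a client pool could wrap and return.
var (
	ErrInvalidSecret = errors.New("invalid secret")
	ErrDialFailed    = errors.New("failed to dial Temporal server")
)

// eventReasonFor maps an error to a hypothetical event reason string.
// It returns "" for a nil error, meaning no warning event is needed.
func eventReasonFor(err error) string {
	switch {
	case err == nil:
		return ""
	case errors.Is(err, ErrInvalidSecret):
		return "InvalidSecret"
	case errors.Is(err, ErrDialFailed):
		return "DialFailed"
	default:
		return "ConnectionFailed"
	}
}

func main() {
	// Wrapping with %w keeps the sentinel visible to errors.Is.
	err := fmt.Errorf("loading connection credentials: %w", ErrInvalidSecret)
	fmt.Println(eventReasonFor(err)) // prints InvalidSecret
}
```

Because the classification relies on wrapping rather than string matching, extra context can be added to the error message without changing which event is emitted.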

carlydf disabled auto-merge March 6, 2026 02:43
carlydf enabled auto-merge (squash) March 6, 2026 02:43
carlydf closed this Mar 6, 2026
auto-merge was automatically disabled March 6, 2026 02:44 (pull request was closed)

carlydf reopened this Mar 6, 2026
carlydf merged commit 872bc38 into main Mar 7, 2026
14 checks passed
carlydf deleted the add_events branch March 7, 2026 00:08
4 participants