Skip to content

ssa: add DriftIgnoreRules to ApplyOptions for field-level ignore on apply#1157

Open
dipti-pai wants to merge 1 commit intofluxcd:mainfrom
dipti-pai:ssa-ignore-rules
Open

ssa: add DriftIgnoreRules to ApplyOptions for field-level ignore on apply#1157
dipti-pai wants to merge 1 commit intofluxcd:mainfrom
dipti-pai:ssa-ignore-rules

Conversation

@dipti-pai
Copy link
Copy Markdown
Member

@dipti-pai dipti-pai commented Mar 31, 2026

Changes include

  • Add []jsondiff.IgnoreRule to ssa ApplyOptions
  • Change has no effect in create code paths. In Apply and ApplyAll code paths that correspond to update, remove ignored fields matching the ignored rules from the object.

Includes tests for various scenarios for optional and mandatory fields, claiming shared/forceful ownership and orphaning owned fields without a manager.

  1. When drift ignore rules are specified for optional fields:
  • flux manager drops ownership of the ignored fields.
  • If another manager co-owned or claimed force ownership, the field values are set accordingly.
  • If no field manager is present, the field is dropped and garbage collected by Kubernetes.
  1. When drift ignore rules are specified for mandatory fields:
  • If another manager co-owned or claimed ownership of the fields forcefully, Flux drops ownership of the field. Once Flux drops ownership, the other manager cannot drop ownership of the mandatory field, they get an error.
  • If no other manager claims ownership of the fields, flux cannot drop ownership of the field and returns an error.

Associated kustomize controller draft PR testing the changes from controller perspective - fluxcd/kustomize-controller#1627

Fixes: #696

@dipti-pai dipti-pai requested a review from stefanprodan as a code owner March 31, 2026 20:39
@dipti-pai dipti-pai marked this pull request as draft March 31, 2026 20:42
Comment thread ssa/manager_apply_test.go Outdated
Comment thread ssa/manager_apply.go Outdated
Comment thread ssa/manager_apply.go Outdated
@dipti-pai dipti-pai marked this pull request as ready for review April 1, 2026 20:10
Copy link
Copy Markdown
Member

@matheuscscp matheuscscp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is looking great to me, but I'd like to better understand how applying the ignore rules after dry-run works. I'm not saying it's not correct at all, just saying that it's probably correct and I need to catch up and understand the whole flow. It would be great if we could talk about this in the next dev meeting 🙏

@dipti-pai
Copy link
Copy Markdown
Member Author

dipti-pai commented Apr 16, 2026

Thanks @matheuscscp for reviewing.

Adding a few notes below regarding dry run behavior with ssa and potential improvements in the current version of the code to make it easier to discuss during the dev meeting tomorrow.

Dry-run behavior with DriftIgnoreRules

dryRunApply always sends the complete desired object (including fields targeted by DriftIgnoreRules) to the API server with ForceOwnership. This is intentional - The API server validates the full payload — missing required fields like spec.selector would be rejected. Stripping ignored fields before dry-run would cause validation failures on every reconciliation.

The ignored fields are only stripped later — from the apply payload (not the dry-run), and only for existing objects (not creates).

Summarizing the full behavior for different scenarios and couple of improvements -

  1. With this feature, there is no change in behavior during the initial create of a resource.

  2. If no fields of the resource have drifted, hasDrifted returns false and returns UnchangedAction

  3. If driftIgnoreRules are specified and there is drift in one of the ignored fields where the other manager takes force ownership OR co-owns the field, Flux gives up the ownership correctly.

  4. If driftIgnoreRules are specified and there is drift in one of the ignored fields where there is no other manager, i.e for example if client-side apply was used to trigger the drift, Flux gives up the ownership dropping the field in apply. If the ignored field is a mandatory field, Kubernetes API server throws an error since when Flux gives up ownership, this is interpreted as a delete and mandatory field cannot be deleted. If the ignored field is not a mandatory field, the field is deleted OR default value is applied. This is by-design since the feature is intended to work only with fields managed via server-side apply. Our design : "Ignored field + drift = flux drops ownership".

  5. If driftIgnoreRules are specified and there is drift only in ignored field, on every reconcile, dryRunApply detects a drift in hasDrifted, the apply operation applies the ignore rules and ConfiguredAction is returned. Though from ssa perspective, the subsequent apply is a no-op. A possible improvement here could be to have a hasDriftedWithIgnoredFields that does the drift detection stripping the ignored fields from both the dryRunObject and the object to be applied. After the driftIgnoreRules are applied the first time, on subsequent reconciliation, hasDriftedWithIgnoredFields would detect no drift and return UnchangedAction. This optimization saves an unnecessary round trip to Kubernetes API server to do no-op ssa apply.

  6. Drift ignore rules are applied even if the drift ignored field has not changed -- In the current version, if a field not specified in the driftIgnoreRules drifts without any drift to fields specified in the driftIgnoreRules, the drift is corrected in apply. While correcting the drift, the driftIgnoreRules are applied as well. This can have unintended consequences when the user has specified a field in driftIgnoreRules but a field manager has not yet taken ownership of the field. A possible improvement could be if a drift ignored field has not drifted, do not apply the corresponding driftIgnoreRule.

My opinion is to take the improvements in 5. Improvement 6 is debatable - if a field is in driftIgnoreRule we should respect the user's intent and drop the ownership during apply is one way to think. Checking if the field has really drifted involves some more JSON pointer fetches and comparison of JSON pointers that complicate the logic. We could keep 6 as-is for now and iterate if users complain about it as well.

@stefanprodan
Copy link
Copy Markdown
Member

I think we should strive to avoid applying objects if only the ignored fields have drifted, and always return UnchangedAction without making extra API calls.

@matheuscscp
Copy link
Copy Markdown
Member

  1. If driftIgnoreRules are specified and there is drift only in ignored field, on every reconcile, dryRunApply detects a drift in hasDrifted, the apply operation applies the ignore rules and ConfiguredAction is returned. Though from ssa perspective, the subsequent apply is a no-op. A possible improvement here could be to have a hasDriftedWithIgnoredFields that does the drift detection stripping the ignored fields from both the dryRunObject and the object to be applied. After the driftIgnoreRules are applied the first time, on subsequent reconciliation, hasDriftedWithIgnoredFields would detect no drift and return UnchangedAction. This optimization saves an unnecessary round trip to Kubernetes API server to do no-op ssa apply.

If I understand this correctly, this is what @stefanprodan commented about above. I agree, we should make this improvement and do not apply if only ignored fields have drifted.

  1. Drift ignore rules are applied even if the drift ignored field has not changed -- In the current version, if a field not specified in the driftIgnoreRules drifts without any drift to fields specified in the driftIgnoreRules, the drift is corrected in apply. While correcting the drift, the driftIgnoreRules are applied as well. This can have unintended consequences when the user has specified a field in driftIgnoreRules but a field manager has not yet taken ownership of the field. A possible improvement could be if a drift ignored field has not drifted, do not apply the corresponding driftIgnoreRule.

This improvement also seems quite important.

@dipti-pai dipti-pai requested a review from a team as a code owner April 20, 2026 19:23
@dipti-pai
Copy link
Copy Markdown
Member Author

@stefanprodan @matheuscscp

With the latest iteration, the improvements 5. and 6. in this comment are implemented. Summarizing the new behavior

  1. Drift detection now ignores specified fields

Before comparing the existing object with the dry-run result, hasDriftedWithIgnore() strips the ignored paths from both objects. If the only differences are in ignored fields, the object is treated as Unchanged — no apply happens. This prevents unnecessary apply when only externally-managed fields (e.g. HPA replicas, VPA resource limits) differ and incorrect Configured action from being returned when ignored rules drift.

  1. Selective field stripping on apply

When an apply does happen (because non-ignored fields changed), we compute which ignored paths have actually drifted between live object and dry-run. Only those drifted ignored paths are removed from the apply payload via JSON patch. This means --

  • Drifted ignored fields → stripped from payload → external controller's values are preserved
  • Non-drifted ignored fields → kept in payload → Flux retains field ownership

Because of these changes, a new scenario arises which I want to call out - Flux does not update an ignored field even when its own source changes. For example,

.spec.replicas = 2 (from Flux's last apply)

existing.spec.replicas = 5 (from HPA's apply)
dryRunObject.spec.replicas = 4 (Flux's new desired from new source change)

With the current changes, if .spec.replicas is ignored, it means "this field is fully delegated to another controller" — Flux doesn't touch it or try to reclaim ownership regardless of source changes.

Comment thread ssa/manager_apply.go Outdated
Comment thread ssa/manager_apply.go
@matheuscscp
Copy link
Copy Markdown
Member

With the current changes, if .spec.replicas is ignored, it means "this field is fully delegated to another controller" — Flux doesn't touch it or try to reclaim ownership regardless of source changes.

Small but important correction here (if I understood the current iteration correctly; please correct me if I'm wrong, @dipti-pai): when values match, Flux does reclaim ownership even if another manager already owns the field. Stripping only on real drift keeps things simple (no need to parse managedFields.fieldsV1 to detect ownership), at the cost of occasional ownership ping-pong when the external controller's value converges with Flux's desired. Worth calling out in the DriftIgnoreRules docs.

@dipti-pai
Copy link
Copy Markdown
Member Author

With the current changes, if .spec.replicas is ignored, it means "this field is fully delegated to another controller" — Flux doesn't touch it or try to reclaim ownership regardless of source changes.

Small but important correction here (if I understood the current iteration correctly; please correct me if I'm wrong, @dipti-pai): when values match, Flux does reclaim ownership even if another manager already owns the field. Stripping only on real drift keeps things simple (no need to parse managedFields.fieldsV1 to detect ownership), at the cost of occasional ownership ping-pong when the external controller's value converges with Flux's desired. Worth calling out in the DriftIgnoreRules docs.

@matheuscscp, yes this is true because we only strip the fields when they drift. So, if they converge, Flux reclaims ownership. Note this happens only when Apply is triggered due to another non-ignored field drifting. If no other field has drifted, then on every reconcile, UnchangedAction is returned (as per the latest iteration code changes).

@matheuscscp
Copy link
Copy Markdown
Member

With the current changes, if .spec.replicas is ignored, it means "this field is fully delegated to another controller" — Flux doesn't touch it or try to reclaim ownership regardless of source changes.

Small but important correction here (if I understood the current iteration correctly; please correct me if I'm wrong, @dipti-pai): when values match, Flux does reclaim ownership even if another manager already owns the field. Stripping only on real drift keeps things simple (no need to parse managedFields.fieldsV1 to detect ownership), at the cost of occasional ownership ping-pong when the external controller's value converges with Flux's desired. Worth calling out in the DriftIgnoreRules docs.

@matheuscscp, yes this is true because we only strip the fields when they drift. So, if they converge, Flux reclaims ownership. Note this happens only when Apply is triggered due to another non-ignored field drifting. If no other field has drifted, then on every reconcile, UnchangedAction is returned (as per the latest iteration code changes).

LGTM 👍

Are there any outstanding items left for us to address from your initial list?

@dipti-pai
Copy link
Copy Markdown
Member Author

Are there any outstanding items left for us to address from your initial list?

The things we discussed last dev meeting are addressed in the latest iteration. Made the changes for the AI-generated review comments, they look solid and rebased on main. Thanks so much for reviewing!

@stefanprodan stefanprodan added enhancement New feature or request area/server-side-apply SSA related issues and pull requests labels May 1, 2026
Copy link
Copy Markdown
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks @dipti-pai 🥇

@stefanprodan
Copy link
Copy Markdown
Member

@dipti-pai please create an issue in flux2 for taking into account the ignore rules in flux diff kustomization.

@matheuscscp
Copy link
Copy Markdown
Member

matheuscscp commented May 1, 2026

@dipti-pai Found something: I was looking at how we deal with rules containing an empty selector i.e. the "target-all" rules, and I noticed that we silently use the last one if there are multiple. This is only a thing for target-all rules because the map is by pointer, and the pointers are always unique except for nil.

But I also noticed that this is already what we do in helm-controller, so it's too late to change. In that case, I noticed we are duplicating this logic in pkg/ssa and pkg/ssa/jsondiff. But this we can fix (which helps us in the future if we decide to handle multiple target-all rules more explicitly):

diff --git a/ssa/jsondiff/unstructured.go b/ssa/jsondiff/unstructured.go
index 400c15e..7d7f458 100644
--- a/ssa/jsondiff/unstructured.go
+++ b/ssa/jsondiff/unstructured.go
@@ -44,6 +44,23 @@ type IgnoreRule struct {
 	Selector *Selector
 }
 
+// CompiledIgnoreRules is a set of IgnoreRule with compiled selectors.
+type CompiledIgnoreRules map[*SelectorRegex][]string
+
+// CompileIgnoreRules compiles the selectors in the given IgnoreRule slice
+// and returns a CompiledIgnoreRules.
+func CompileIgnoreRules(rules []IgnoreRule) (CompiledIgnoreRules, error) {
+	compiled := make(CompiledIgnoreRules, len(rules))
+	for _, rule := range rules {
+		sr, err := NewSelectorRegex(rule.Selector)
+		if err != nil {
+			return nil, fmt.Errorf("failed to create ignore rule selector: %w", err)
+		}
+		compiled[sr] = rule.Paths
+	}
+	return compiled, nil
+}
+
 // UnstructuredList runs a dry-run patch for a list of Kubernetes resources
 // against a Kubernetes cluster and compares the result against the original
 // objects. It returns a DiffSet, which contains differences between the
@@ -59,13 +76,9 @@ func UnstructuredList(ctx context.Context, c client.Client, objs []*unstructured
 	o := &ListOptions{}
 	o.ApplyOptions(opts)
 
-	var sm = make(map[*SelectorRegex][]string, len(o.IgnoreRules))
-	for _, ips := range o.IgnoreRules {
-		sr, err := NewSelectorRegex(ips.Selector)
-		if err != nil {
-			return nil, fmt.Errorf("failed to create ignore rule selector: %w", err)
-		}
-		sm[sr] = ips.Paths
+	sm, err := CompileIgnoreRules(o.IgnoreRules)
+	if err != nil {
+		return nil, err
 	}
 
 	var resOpts []ResourceOption
diff --git a/ssa/manager_apply.go b/ssa/manager_apply.go
index 88bac2e..a6e111d 100644
--- a/ssa/manager_apply.go
+++ b/ssa/manager_apply.go
@@ -105,10 +105,6 @@ type ApplyCleanupOptions struct {
 	Exclusions map[string]string `json:"exclusions"`
 }
 
-// compiledIgnoreRules is a map of pre-compiled selectors to their associated
-// JSON pointer paths, used to avoid recompiling selectors for each object.
-type compiledIgnoreRules map[*jsondiff.SelectorRegex][]string
-
 // DefaultApplyOptions returns the default apply options where force apply is disabled.
 func DefaultApplyOptions() ApplyOptions {
 	return ApplyOptions{
@@ -161,9 +157,9 @@ func (m *ResourceManager) Apply(ctx context.Context, object *unstructured.Unstru
 	patched = patched || patchedCleanupMetadata
 
 	// Compile ignore rules once for both drift detection and conditional field stripping.
-	var compiled compiledIgnoreRules
+	var compiled jsondiff.CompiledIgnoreRules
 	if existingObject.GetResourceVersion() != "" && len(opts.DriftIgnoreRules) > 0 {
-		compiled, err = compileIgnoreRules(opts.DriftIgnoreRules)
+		compiled, err = jsondiff.CompileIgnoreRules(opts.DriftIgnoreRules)
 		if err != nil {
 			return nil, err
 		}
@@ -219,10 +215,10 @@ func (m *ResourceManager) ApplyAll(ctx context.Context, objects []*unstructured.
 	driftedIgnorePaths := make([]jsondiff.IgnorePaths, len(objects))
 
 	// Compile ignore rules once for drift detection and conditional field stripping.
-	var compiled compiledIgnoreRules
+	var compiled jsondiff.CompiledIgnoreRules
 	if len(opts.DriftIgnoreRules) > 0 {
 		var err error
-		compiled, err = compileIgnoreRules(opts.DriftIgnoreRules)
+		compiled, err = jsondiff.CompileIgnoreRules(opts.DriftIgnoreRules)
 		if err != nil {
 			return nil, err
 		}
@@ -565,24 +561,10 @@ func (m *ResourceManager) shouldSkipApply(desiredObject *unstructured.Unstructur
 	return false, nil
 }
 
-// compileIgnoreRules compiles the selectors from the given ignore rules into
-// regular expressions. The compiled rules can be reused across multiple objects.
-func compileIgnoreRules(rules []jsondiff.IgnoreRule) (compiledIgnoreRules, error) {
-	sm := make(compiledIgnoreRules, len(rules))
-	for _, rule := range rules {
-		sr, err := jsondiff.NewSelectorRegex(rule.Selector)
-		if err != nil {
-			return nil, fmt.Errorf("failed to create ignore rule selector: %w", err)
-		}
-		sm[sr] = rule.Paths
-	}
-	return sm, nil
-}
-
 // removeIgnoredFields removes the fields matched by the given pre-compiled
 // ignore rules from obj. Selectors are evaluated against matchObj so that
 // existing and dry-run copies are stripped based on the same decision.
-func removeIgnoredFields(matchObj, obj *unstructured.Unstructured, rules compiledIgnoreRules) error {
+func removeIgnoredFields(matchObj, obj *unstructured.Unstructured, rules jsondiff.CompiledIgnoreRules) error {
 	var ignorePaths jsondiff.IgnorePaths
 	for sr, paths := range rules {
 		if sr.MatchUnstructured(matchObj) {
@@ -620,7 +602,7 @@ func lookupJSONPointer(obj *unstructured.Unstructured, pointer string) (any, boo
 // between existingObject and dryRunObject.
 func computeDriftedPaths(
 	existingObject, dryRunObject *unstructured.Unstructured,
-	rules compiledIgnoreRules,
+	rules jsondiff.CompiledIgnoreRules,
 ) jsondiff.IgnorePaths {
 	var drifted jsondiff.IgnorePaths
 	for sr, paths := range rules {
@@ -644,7 +626,7 @@ func computeDriftedPaths(
 // on the same decision, matching computeDriftedPaths.
 func (m *ResourceManager) hasDriftedWithIgnore(
 	existingObject, dryRunObject *unstructured.Unstructured,
-	compiled compiledIgnoreRules,
+	compiled jsondiff.CompiledIgnoreRules,
 ) (bool, error) {
 	if compiled == nil {
 		return m.hasDrifted(existingObject, dryRunObject), nil

@matheuscscp
Copy link
Copy Markdown
Member

  1. If driftIgnoreRules are specified and there is drift in one of the ignored fields where there is no other manager, i.e for example if client-side apply was used to trigger the drift, Flux gives up the ownership dropping the field in apply. If the ignored field is a mandatory field, Kubernetes API server throws an error since when Flux gives up ownership, this is interpreted as a delete and mandatory field cannot be deleted. If the ignored field is not a mandatory field, the field is deleted OR default value is applied. This is by-design since the feature is intended to work only with fields managed via server-side apply. Our design : "Ignored field + drift = flux drops ownership".

What if I use kubectl with server-side apply and the kustomize-controller field manager to force a drift in an ignored field? I wrote a test case for this and broke the scenario in 3:

  • no drift on non-ignored fields
  • drift on non-ignored fields + the drifted ignored field is required
  • drift on non-ignored fields + the drifted ignored field has a default
Click to see tests
// TestApply_DriftIgnoreRules_SameFieldManagerDrift covers the scenario
// where a user runs `kubectl apply --server-side
// --field-manager=<flux-field-manager> --force-conflicts` to drift an
// ignored field. Because the same field manager is reused, Flux remains
// the sole owner — there is no "other manager" claiming the field.
//
// Existing ignore-rule tests only cover the case where a different manager
// owns the drifted field (so releasing ownership hands it over). The sole-
// owner variant is uncovered, and the three subtests below capture the
// expected behavior:
//
//   - When the only drift is on the ignored field, Apply is a no-op
//     (UnchangedAction) and the drifted value stays in place.
//   - When a non-ignored field also drifts and the ignored field is
//     required, Flux must correct the drift to its desired value rather
//     than strip the field — stripping makes the API server reject the
//     apply and the reconciliation gets stuck.
//   - When a non-ignored field also drifts and the ignored field has an
//     API-server default, Flux must correct the drift to its desired value
//     rather than strip the field — stripping lets the API server fill the
//     default, silently "correcting" the drift to the wrong value.
func TestApply_DriftIgnoreRules_SameFieldManagerDrift(t *testing.T) {
	timeout := 30 * time.Second
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	id := generateName("drift-same-fm")
	objects, err := readManifest("testdata/test2.yaml", id)
	if err != nil {
		t.Fatal(err)
	}

	manager.SetOwnerLabels(objects, "app1", "default")

	if err := normalize.UnstructuredList(objects); err != nil {
		t.Fatal(err)
	}

	_, deployObject := getFirstObject(objects, "Deployment", id)

	// Ignored path covers a required + mutable field of Deployment.
	opts := DefaultApplyOptions()
	opts.DriftIgnoreRules = []jsondiff.IgnoreRule{
		{
			Paths: []string{"/spec/template/spec/containers"},
			Selector: &jsondiff.Selector{
				Kind: "Deployment",
			},
		},
	}

	// Initial create.
	if _, err := manager.ApplyAllStaged(ctx, objects, opts); err != nil {
		t.Fatal(err)
	}

	// Drift the ignored field via SSA reusing Flux's field manager. Reuse
	// the manifest copy and mutate only the image so all other container
	// subfields stay claimed by the same manager (avoiding incidental drift
	// from released subfields). Equivalent to:
	//   kubectl apply --server-side \
	//     --field-manager=resource-manager --force-conflicts -f drifted.yaml
	drifted := deployObject.DeepCopy()
	driftedContainers, _, _ := unstructured.NestedSlice(drifted.Object, "spec", "template", "spec", "containers")
	driftedContainers[0].(map[string]interface{})["image"] = "ghcr.io/stefanprodan/podinfo:6.1.0"
	if err := unstructured.SetNestedSlice(drifted.Object, driftedContainers, "spec", "template", "spec", "containers"); err != nil {
		t.Fatal(err)
	}
	if err := manager.client.Patch(ctx, drifted, client.Apply,
		client.FieldOwner(manager.owner.Field), client.ForceOwnership); err != nil {
		t.Fatalf("same-field-manager SSA failed: %v", err)
	}

	t.Run("no drift on non-ignored fields", func(t *testing.T) {
		entry, err := manager.Apply(ctx, deployObject, opts)
		if err != nil {
			t.Fatalf("expected no-op apply, got error: %v", err)
		}
		if entry.Action != UnchangedAction {
			t.Errorf("expected UnchangedAction (no-op), got %s", entry.Action)
		}

		// Drifted value is preserved in-cluster.
		existing := deployObject.DeepCopy()
		if err := manager.client.Get(ctx, client.ObjectKeyFromObject(existing), existing); err != nil {
			t.Fatal(err)
		}
		containers, _, _ := unstructured.NestedSlice(existing.Object, "spec", "template", "spec", "containers")
		if len(containers) == 0 {
			t.Fatal("expected containers to still exist")
		}
		if got := containers[0].(map[string]interface{})["image"]; got != "ghcr.io/stefanprodan/podinfo:6.1.0" {
			t.Errorf("expected drifted image to be preserved, got %v", got)
		}
	})

	t.Run("drift on non-ignored fields + the drifted ignored field is required", func(t *testing.T) {
		// Expected behavior: when a non-ignored field also drifts and Flux
		// is the sole owner of the ignored field, the apply should keep the
		// ignored field in the payload (i.e. correct the drift to Flux's
		// desired value) instead of stripping ownership. Stripping leaves
		// the kustomization stuck on a required field, which is worse than
		// reverting the manual change.
		if err := unstructured.SetNestedField(deployObject.Object, int64(9), "spec", "minReadySeconds"); err != nil {
			t.Fatal(err)
		}

		entry, err := manager.Apply(ctx, deployObject, opts)
		if err != nil {
			t.Fatalf("expected apply to succeed by correcting the drift, got error: %v", err)
		}
		if entry.Action != ConfiguredAction {
			t.Errorf("expected ConfiguredAction, got %s", entry.Action)
		}

		existing := deployObject.DeepCopy()
		if err := manager.client.Get(ctx, client.ObjectKeyFromObject(existing), existing); err != nil {
			t.Fatal(err)
		}
		containers, _, _ := unstructured.NestedSlice(existing.Object, "spec", "template", "spec", "containers")
		if len(containers) == 0 {
			t.Fatal("expected containers to still exist")
		}
		if got := containers[0].(map[string]interface{})["image"]; got != "ghcr.io/stefanprodan/podinfo:6.0.0" {
			t.Errorf("expected drift to be corrected to Flux's desired image 6.0.0, got %v", got)
		}
	})

	t.Run("drift on non-ignored fields + the drifted ignored field has a default", func(t *testing.T) {
		// Expected behavior: ignored path /spec/replicas has an API-server
		// default of 1. With same-field-manager drift on replicas, Flux
		// remains sole owner. When a non-ignored field also drifts, the
		// apply should keep replicas in the payload and restore Flux's
		// desired value (2). Stripping replicas would let the API server
		// fill the default (1) — silently "correcting" the drift to the
		// wrong value, with no error and no audit trail.
		id := generateName("drift-default")
		objects, err := readManifest("testdata/test2.yaml", id)
		if err != nil {
			t.Fatal(err)
		}
		manager.SetOwnerLabels(objects, "app1", "default")
		if err := normalize.UnstructuredList(objects); err != nil {
			t.Fatal(err)
		}
		_, deploy := getFirstObject(objects, "Deployment", id)

		replicasOpts := DefaultApplyOptions()
		replicasOpts.DriftIgnoreRules = []jsondiff.IgnoreRule{{
			Paths:    []string{"/spec/replicas"},
			Selector: &jsondiff.Selector{Kind: "Deployment"},
		}}

		// Create with replicas explicitly set so Flux owns it.
		if err := unstructured.SetNestedField(deploy.Object, int64(2), "spec", "replicas"); err != nil {
			t.Fatal(err)
		}
		if _, err := manager.ApplyAllStaged(ctx, objects, replicasOpts); err != nil {
			t.Fatal(err)
		}

		// Drift replicas via same field manager — Flux stays the sole owner.
		drifted := deploy.DeepCopy()
		if err := unstructured.SetNestedField(drifted.Object, int64(5), "spec", "replicas"); err != nil {
			t.Fatal(err)
		}
		if err := manager.client.Patch(ctx, drifted, client.Apply,
			client.FieldOwner(manager.owner.Field), client.ForceOwnership); err != nil {
			t.Fatalf("same-field-manager SSA failed: %v", err)
		}

		// Force a non-ignored change to trigger an apply.
		if err := unstructured.SetNestedField(deploy.Object, int64(9), "spec", "minReadySeconds"); err != nil {
			t.Fatal(err)
		}

		entry, err := manager.Apply(ctx, deploy, replicasOpts)
		if err != nil {
			t.Fatalf("expected apply to succeed by correcting the drift, got error: %v", err)
		}
		if entry.Action != ConfiguredAction {
			t.Errorf("expected ConfiguredAction, got %s", entry.Action)
		}

		existing := deploy.DeepCopy()
		if err := manager.client.Get(ctx, client.ObjectKeyFromObject(existing), existing); err != nil {
			t.Fatal(err)
		}
		replicas, found, _ := unstructured.NestedInt64(existing.Object, "spec", "replicas")
		if !found {
			t.Fatal("expected spec.replicas to exist")
		}
		if replicas != 2 {
			t.Errorf("expected spec.replicas to be corrected to Flux's desired value 2, got %d", replicas)
		}
	})
}

The result is:

  • no drift on non-ignored fields: no-op (expected!)
  • drift on non-ignored fields + the drifted ignored field is required: SSA error
  • drift on non-ignored fields + the drifted ignored field has a default: applies the default

I'm wondering if we should leave it like that or if we should correct the drift on the ignored fields in this case.

@dipti-pai
Copy link
Copy Markdown
Member Author

dipti-pai commented May 1, 2026

  1. If driftIgnoreRules are specified and there is drift in one of the ignored fields where there is no other manager, i.e for example if client-side apply was used to trigger the drift, Flux gives up the ownership dropping the field in apply. If the ignored field is a mandatory field, Kubernetes API server throws an error since when Flux gives up ownership, this is interpreted as a delete and mandatory field cannot be deleted. If the ignored field is not a mandatory field, the field is deleted OR default value is applied. This is by-design since the feature is intended to work only with fields managed via server-side apply. Our design : "Ignored field + drift = flux drops ownership".

What if I use kubectl with server-side apply and the kustomize-controller field manager to force a drift in an ignored field? I wrote a test case for this and broke the scenario in 3:

  • no drift on non-ignored fields
  • drift on non-ignored fields + the drifted ignored field is required
  • drift on non-ignored fields + the drifted ignored field has a default

Click to see tests

The result is:

  • no drift on non-ignored fields: no-op (expected!)
  • drift on non-ignored fields + the drifted ignored field is required: SSA error
  • drift on non-ignored fields + the drifted ignored field has a default: applies the default

I'm wondering if we should leave it like that or if we should correct the drift on the ignored fields in this case.

Interesting test case :) One could argue that this is a misconfiguration -- two distinct actors should never share the same field-manager name, it defeats the purpose of SSA ownership tracking.

Today, the drift logic doesn't check who owns the field, if there is drift, it strips Flux's ownership preserving the existing value. We could fix the above use case by checking if Flux is the sole manager and not dropping ownership if so.

To address the case described in 4 (client side edit/edits without a distinct field manager). we have two options -

Option A: Always correct sole-owner drift

  • Remove ignored fields from the hasDriftedWithIgnore comparison only when another manager owns them
  • If Flux is sole owner and the drift is visible → trigger an apply → corrects drift immediately
  • Pro: this behavior is consistent - client-side edit to an ignored field is always reverted on next reconcile
  • Asserts the design that ignore rules are designed for SSA controllers, not client-side edits.

Option B: Current behavior — accept the inconsistency and document it

  • Ignore rules only work reliably with distinct SSA field managers (VPA, HPA, etc.)
  • Client-side edits to ignored fields are "best effort" — they may or may not persist.
  • Document this as a known limitation.

Option A seems cleaner, consistent behavior, the feature is intended to work with server side apply. Ignore rules cannot be used to manually override fields via kubectl edit. Thoughts ?

@dipti-pai
Copy link
Copy Markdown
Member Author

@dipti-pai Found something: I was looking at how we deal with rules containing an empty selector i.e. the "target-all" rules, and I noticed that we silently use the last one if there are multiple. This is only a thing for target-all rules because the map is by pointer, and the pointers are always unique except for nil.

But I also noticed that this is already what we do in helm-controller, so it's too late to change. In that case, I noticed we are duplicating this logic in pkg/ssa and pkg/ssa/jsondiff. But this we can fix (which helps us in the future if we decide to handle multiple target-all rules more explicitly):

diff --git a/ssa/jsondiff/unstructured.go b/ssa/jsondiff/unstructured.go
index 400c15e..7d7f458 100644
--- a/ssa/jsondiff/unstructured.go
+++ b/ssa/jsondiff/unstructured.go
@@ -44,6 +44,23 @@ type IgnoreRule struct {
 	Selector *Selector
 }
 
+// CompiledIgnoreRules is a set of IgnoreRule with compiled selectors.
+type CompiledIgnoreRules map[*SelectorRegex][]string
+
+// CompileIgnoreRules compiles the selectors in the given IgnoreRule slice
+// and returns a CompiledIgnoreRules.
+func CompileIgnoreRules(rules []IgnoreRule) (CompiledIgnoreRules, error) {
+	compiled := make(CompiledIgnoreRules, len(rules))
+	for _, rule := range rules {
+		sr, err := NewSelectorRegex(rule.Selector)
+		if err != nil {
+			return nil, fmt.Errorf("failed to create ignore rule selector: %w", err)
+		}
+		compiled[sr] = rule.Paths
+	}
+	return compiled, nil
+}
+
 // UnstructuredList runs a dry-run patch for a list of Kubernetes resources
 // against a Kubernetes cluster and compares the result against the original
 // objects. It returns a DiffSet, which contains differences between the
@@ -59,13 +76,9 @@ func UnstructuredList(ctx context.Context, c client.Client, objs []*unstructured
 	o := &ListOptions{}
 	o.ApplyOptions(opts)
 
-	var sm = make(map[*SelectorRegex][]string, len(o.IgnoreRules))
-	for _, ips := range o.IgnoreRules {
-		sr, err := NewSelectorRegex(ips.Selector)
-		if err != nil {
-			return nil, fmt.Errorf("failed to create ignore rule selector: %w", err)
-		}
-		sm[sr] = ips.Paths
+	sm, err := CompileIgnoreRules(o.IgnoreRules)
+	if err != nil {
+		return nil, err
 	}
 
 	var resOpts []ResourceOption
diff --git a/ssa/manager_apply.go b/ssa/manager_apply.go
index 88bac2e..a6e111d 100644
--- a/ssa/manager_apply.go
+++ b/ssa/manager_apply.go
@@ -105,10 +105,6 @@ type ApplyCleanupOptions struct {
 	Exclusions map[string]string `json:"exclusions"`
 }
 
-// compiledIgnoreRules is a map of pre-compiled selectors to their associated
-// JSON pointer paths, used to avoid recompiling selectors for each object.
-type compiledIgnoreRules map[*jsondiff.SelectorRegex][]string
-
 // DefaultApplyOptions returns the default apply options where force apply is disabled.
 func DefaultApplyOptions() ApplyOptions {
 	return ApplyOptions{
@@ -161,9 +157,9 @@ func (m *ResourceManager) Apply(ctx context.Context, object *unstructured.Unstru
 	patched = patched || patchedCleanupMetadata
 
 	// Compile ignore rules once for both drift detection and conditional field stripping.
-	var compiled compiledIgnoreRules
+	var compiled jsondiff.CompiledIgnoreRules
 	if existingObject.GetResourceVersion() != "" && len(opts.DriftIgnoreRules) > 0 {
-		compiled, err = compileIgnoreRules(opts.DriftIgnoreRules)
+		compiled, err = jsondiff.CompileIgnoreRules(opts.DriftIgnoreRules)
 		if err != nil {
 			return nil, err
 		}
@@ -219,10 +215,10 @@ func (m *ResourceManager) ApplyAll(ctx context.Context, objects []*unstructured.
 	driftedIgnorePaths := make([]jsondiff.IgnorePaths, len(objects))
 
 	// Compile ignore rules once for drift detection and conditional field stripping.
-	var compiled compiledIgnoreRules
+	var compiled jsondiff.CompiledIgnoreRules
 	if len(opts.DriftIgnoreRules) > 0 {
 		var err error
-		compiled, err = compileIgnoreRules(opts.DriftIgnoreRules)
+		compiled, err = jsondiff.CompileIgnoreRules(opts.DriftIgnoreRules)
 		if err != nil {
 			return nil, err
 		}
@@ -565,24 +561,10 @@ func (m *ResourceManager) shouldSkipApply(desiredObject *unstructured.Unstructur
 	return false, nil
 }
 
-// compileIgnoreRules compiles the selectors from the given ignore rules into
-// regular expressions. The compiled rules can be reused across multiple objects.
-func compileIgnoreRules(rules []jsondiff.IgnoreRule) (compiledIgnoreRules, error) {
-	sm := make(compiledIgnoreRules, len(rules))
-	for _, rule := range rules {
-		sr, err := jsondiff.NewSelectorRegex(rule.Selector)
-		if err != nil {
-			return nil, fmt.Errorf("failed to create ignore rule selector: %w", err)
-		}
-		sm[sr] = rule.Paths
-	}
-	return sm, nil
-}
-
 // removeIgnoredFields removes the fields matched by the given pre-compiled
 // ignore rules from obj. Selectors are evaluated against matchObj so that
 // existing and dry-run copies are stripped based on the same decision.
-func removeIgnoredFields(matchObj, obj *unstructured.Unstructured, rules compiledIgnoreRules) error {
+func removeIgnoredFields(matchObj, obj *unstructured.Unstructured, rules jsondiff.CompiledIgnoreRules) error {
 	var ignorePaths jsondiff.IgnorePaths
 	for sr, paths := range rules {
 		if sr.MatchUnstructured(matchObj) {
@@ -620,7 +602,7 @@ func lookupJSONPointer(obj *unstructured.Unstructured, pointer string) (any, boo
 // between existingObject and dryRunObject.
 func computeDriftedPaths(
 	existingObject, dryRunObject *unstructured.Unstructured,
-	rules compiledIgnoreRules,
+	rules jsondiff.CompiledIgnoreRules,
 ) jsondiff.IgnorePaths {
 	var drifted jsondiff.IgnorePaths
 	for sr, paths := range rules {
@@ -644,7 +626,7 @@ func computeDriftedPaths(
 // on the same decision, matching computeDriftedPaths.
 func (m *ResourceManager) hasDriftedWithIgnore(
 	existingObject, dryRunObject *unstructured.Unstructured,
-	compiled compiledIgnoreRules,
+	compiled jsondiff.CompiledIgnoreRules,
 ) (bool, error) {
 	if compiled == nil {
 		return m.hasDrifted(existingObject, dryRunObject), nil

Thanks @matheuscscp, Ok, no change in behavior, just refactor to remove 5 lines of duplicate code. The actual scenario described should be extremely rare and ok to live with given helm-controller has the same behavior.

Will take this change..

@matheuscscp
Copy link
Copy Markdown
Member

matheuscscp commented May 1, 2026

Actually, correcting the drift was a bad suggestion from me, sorry. By correcting the drift I meant correcting it on the desired object to be sent for apply, and not letting the new desired value override the one currently in the cluster (what we normally call drift correction). In other words, correct the drift on the desired state to make it match the actual state, and not the other way around like we normally do.

And yes, I think it's worth going down this route and looking at the .metadata.managedFields. Unfortunately we will need to parse fieldsV1 a little bit... But I tried it with Opus 4.7 and the code did not look huge. Would doing this also improve another scenario where we opted for a simpler solution? I'm wondering if we can run this parsing code only once and reuse the result for answering what is needed in the scenario we discussed here: #1157 (comment)

By the way, here is another observation Stefan made earlier today:

ignore rules and ownership removal (e.g. kubectl via ssa.ApplyCleanupOptions.FieldManagers) is not covered in tests. if a manager is supposed to be removed, what will happen?


The actual scenario described should be extremely rare

The amount of entropy in Flux user setups is unbelievable... One could argue that this is a long-time bug in helm-controller, if I set multiple ignore rules with an empty selector to target everything, then only the last one will be used. But we can definitely leave this for another day, I'm planning to open an issue in this repo to track it separately.

@matheuscscp
Copy link
Copy Markdown
Member

matheuscscp commented May 1, 2026

Please add this test for VPA, I think it's a very important one because:

  • It's the most important real-world use case
  • Tests a complex scenario for a single JSON pointer that satisfies two properties at the same time:
    • points to an object i.e. a non-leaf value (/resources)
    • points to a field of an object from a list of objects (/spec/template/spec/containers/0/...)

This would make sure all the path/pointer-processing code is well exercised 🙏

Click to see tests
// TestApply_DriftIgnoreRules_ContainerResources exercises the path-handling
// code with an ignore rule whose JSON pointer points to a non-leaf object
// inside a keyed list element — /spec/template/spec/containers/0/resources.
// This combination (array index + sub-object) is not covered elsewhere:
// existing tests use either a top-level leaf (/spec/replicas) or a leaf
// inside an array (/spec/template/spec/containers/0/image). The behavioral
// scenarios (release on other-manager claim, etc.) are already covered for
// those simpler paths; this test only adds the path shape that exercises
// lookupJSONPointer, GenerateRemovePatch + ApplyPatchToUnstructured, and
// hasOtherManagerForPath against an indexed sub-object claim under
// `k:{"name":...}` associative-list keys.
func TestApply_DriftIgnoreRules_ContainerResources(t *testing.T) {
	timeout := 30 * time.Second
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	id := generateName("drift-resources")
	objects, err := readManifest("testdata/test2.yaml", id)
	if err != nil {
		t.Fatal(err)
	}
	manager.SetOwnerLabels(objects, "app1", "default")
	if err := normalize.UnstructuredList(objects); err != nil {
		t.Fatal(err)
	}
	_, deployObject := getFirstObject(objects, "Deployment", id)

	opts := DefaultApplyOptions()
	opts.DriftIgnoreRules = []jsondiff.IgnoreRule{
		{
			Paths:    []string{"/spec/template/spec/containers/0/resources"},
			Selector: &jsondiff.Selector{Kind: "Deployment"},
		},
	}

	// vpaPayload returns a minimal SSA payload whose podinfod container has
	// the given resources block. Sending only `name` + `resources` for the
	// container limits the manager's claim to those fields under
	// containers[k:name=podinfod] in managedFields.
	vpaPayload := func(resources map[string]interface{}) *unstructured.Unstructured {
		container := map[string]interface{}{"name": "podinfod"}
		if resources != nil {
			container["resources"] = resources
		}
		return &unstructured.Unstructured{Object: map[string]interface{}{
			"apiVersion": "apps/v1",
			"kind":       "Deployment",
			"metadata": map[string]interface{}{
				"name":      id,
				"namespace": id,
			},
			"spec": map[string]interface{}{
				"selector": map[string]interface{}{
					"matchLabels": map[string]interface{}{"app": id},
				},
				"template": map[string]interface{}{
					"metadata": map[string]interface{}{
						"labels": map[string]interface{}{"app": id},
					},
					"spec": map[string]interface{}{
						"containers": []interface{}{container},
					},
				},
			},
		}}
	}

	containerResources := func(obj *unstructured.Unstructured) map[string]interface{} {
		t.Helper()
		containers, found, err := unstructured.NestedSlice(obj.Object, "spec", "template", "spec", "containers")
		if err != nil || !found || len(containers) == 0 {
			t.Fatalf("expected at least one container, got found=%v err=%v", found, err)
		}
		c0 := containers[0].(map[string]interface{})
		res, _ := c0["resources"].(map[string]interface{})
		return res
	}

	// Initial create with Flux owning containers[0].resources from test2.yaml.
	if _, err := manager.ApplyAllStaged(ctx, objects, opts); err != nil {
		t.Fatal(err)
	}

	// VPA force-claims containers[0].resources with different values. After
	// this, Flux co-owns or has been displaced; either way Flux must
	// release on the next reconcile because the path is in the ignore set.
	vpaResources := map[string]interface{}{
		"limits":   map[string]interface{}{"cpu": "4", "memory": "1Gi"},
		"requests": map[string]interface{}{"cpu": "2", "memory": "256Mi"},
	}
	if err := manager.client.Patch(ctx, vpaPayload(vpaResources), client.Apply,
		client.FieldOwner("vpa-controller"), client.ForceOwnership); err != nil {
		t.Fatalf("VPA force-claim failed: %v", err)
	}

	// Trigger a non-ignored drift so the apply path runs (and the strip
	// logic exercises GenerateRemovePatch + ApplyPatchToUnstructured for
	// the indexed sub-object pointer). hasOtherManagerForPath must walk
	// vpa-controller's managedFields entry across the
	// `k:{"name":"podinfod"}` associative key to detect the claim.
	if err := unstructured.SetNestedField(deployObject.Object, int64(7), "spec", "minReadySeconds"); err != nil {
		t.Fatal(err)
	}

	entry, err := manager.Apply(ctx, deployObject, opts)
	if err != nil {
		t.Fatalf("apply failed: %v", err)
	}
	if entry.Action != ConfiguredAction {
		t.Errorf("expected ConfiguredAction, got %s", entry.Action)
	}

	existing := deployObject.DeepCopy()
	if err := manager.client.Get(ctx, client.ObjectKeyFromObject(existing), existing); err != nil {
		t.Fatal(err)
	}

	// VPA's resource sub-object must be preserved end-to-end.
	res := containerResources(existing)
	if res == nil {
		t.Fatal("expected containers[0].resources to be set")
	}
	limits, _ := res["limits"].(map[string]interface{})
	if limits["cpu"] != "4" || limits["memory"] != "1Gi" {
		t.Errorf("expected VPA limits cpu=4 mem=1Gi to be preserved, got %v", limits)
	}
	requests, _ := res["requests"].(map[string]interface{})
	if requests["cpu"] != "2" || requests["memory"] != "256Mi" {
		t.Errorf("expected VPA requests cpu=2 mem=256Mi to be preserved, got %v", requests)
	}

	// Flux must no longer own resources under containers[k:name=podinfod].
	for _, mf := range existing.GetManagedFields() {
		if mf.Manager != manager.owner.Field || mf.Operation != metav1.ManagedFieldsOperationApply || mf.FieldsV1 == nil {
			continue
		}
		if strings.Contains(string(mf.FieldsV1.Raw), "\"f:resources\"") {
			t.Errorf("expected Flux to release containers[0].resources to vpa-controller")
		}
	}
}

@dipti-pai
Copy link
Copy Markdown
Member Author

Actually, correcting the drift was a bad suggestion from me, sorry. By correcting the drift I meant correcting it on the desired object to be sent for apply, and not letting the new desired value override the one currently in the cluster (what we normally call drift correction). In other words, correct the drift on the desired state to make it match the actual state, and not the other way around like we normally do.

And yes, I think it's worth going down this route and looking at the .metadata.managedFields. Unfortunately we will need to parse fieldsV1 a little bit.

@matheuscscp, Parsing the managed field is introducing fair bit of complexity, example -- fieldsV1 uses k:{"name":"value"} while JSON pointers use numeric indices. The merge key varies by list type and isn't available without the OpenAPI schema.

We could go with Option B. from this comment for 2.9 release, addressing the primary use-case for server-side apply based controllers (VPA, HPA, cert-manager). I am not opposed to making the change, but we could do it based on user interest/feedback in 2.10 rather than introducing additional complexity and error paths. Thoughts ?

@matheuscscp
Copy link
Copy Markdown
Member

I think modifying the behavior once the feature is released is one of:

  • A bug fix, if quickly shipped after the feature is released;
  • A breaking change, if shipped a long time after the feature is released;
  • A set of complex booelean feature gates, flags or spec fields that make the API hard to reason about;

I think we should aim for the right thing from the start, this is an opensource project with 3 releases per year and we have enough time until 2.9

@dipti-pai dipti-pai force-pushed the ssa-ignore-rules branch from e0775d7 to 68384c6 Compare May 5, 2026 18:56
@dipti-pai
Copy link
Copy Markdown
Member Author

Thanks @matheuscscp @stefanprodan for the reviews!

Actually, correcting the drift was a bad suggestion from me, sorry. By correcting the drift I meant correcting it on the desired object to be sent for apply, and not letting the new desired value override the one currently in the cluster (what we normally call drift correction). In other words, correct the drift on the desired state to make it match the actual state, and not the other way around like we normally do.

And yes, I think it's worth going down this route and looking at the .metadata.managedFields. Unfortunately we will need to parse fieldsV1 a little bit... But I tried it with Opus 4.7 and the code did not look huge. Would doing this also improve another scenario where we opted for a simpler solution? I'm wondering if we can run this parsing code only once and reuse the result for answering what is needed in the scenario we discussed here: #1157 (comment)

Added logic to parse managed fields and adopt the value when Flux is sole owner OR when there is no owner. Added a real-world test-case for VPA to cover JSON pointer parsing.

By the way, here is another observation Stefan made earlier today:

ignore rules and ownership removal (e.g. kubectl via ssa.ApplyCleanupOptions.FieldManagers) is not covered in tests. if a manager is supposed to be removed, what will happen?

Added this scenario as a test case. Flux adopts the in-cluster value because after cleanup, no other manager owned the field.

Comment thread ssa/manager_apply.go Outdated
…pply

Signed-off-by: Dipti Pai <diptipai89@outlook.com>
@dipti-pai dipti-pai force-pushed the ssa-ignore-rules branch from 68384c6 to bdf2d87 Compare May 6, 2026 22:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/server-side-apply SSA related issues and pull requests enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Extend ssa.Apply with field ignore rules

3 participants