Skip to content

docs(networking): document publishing.proxyProtocol + ouroboros hairpin-NAT fix#527

Merged
kvaps merged 1 commit intomainfrom
docs/networking-ouroboros
May 7, 2026
Merged

docs(networking): document publishing.proxyProtocol + ouroboros hairpin-NAT fix#527
kvaps merged 1 commit intomainfrom
docs/networking-ouroboros

Conversation

@lexfrei
Copy link
Copy Markdown
Contributor

@lexfrei lexfrei commented May 3, 2026

What this PR does

Adds content/en/docs/next/networking/hairpin-proxy-protocol.md to document the new publishing.proxyProtocol flag + the bundled ouroboros hairpin-NAT fix that lands in cozystack/cozystack#2561.

The page covers:

  • The hairpin-NAT problem when ingress-nginx requires PROXY-protocol but kube-proxy / Cilium-KPR short-circuits the LB IP for intra-cluster lookups.
  • Why KEP-1860 ipMode=Proxy is not a fit for the cozystack default stack (Cilium + kubeProxyReplacement: true drops the LB frontend entirely when ipMode != VIP).
  • ouroboros itself (Go reimplementation of compumike/hairpin-proxy — see lexfrei/ouroboros) and the host-vs-tenant mode split (coredns mutates the Talos-managed Corefile in place; coredns-import writes plugin-only snippets into a separate coredns-custom ConfigMap that the cozystack-coredns chart imports).
  • The single platform flag publishing.proxyProtocol: true (host scope), the per-tenant addons.ouroboros.enabled knob, and how PROXY-protocol gets wired onto tenant ingress-nginx via valuesOverride.
  • The upstream-LB precondition (cozystack cannot configure the cloud LB / F5 / hcloud-CCM that prepends the PROXY header — the operator owns that).
  • The Talos re-render flap window during a machine-config push.
  • The disable hazard on both layers (the upstream chart only has a cleanup hook for external-dns mode), the lookup-based acknowledge gates that catch the obvious wrong path, the known hole when an operator manually deletes the Package / HelmRelease before flipping the flag back, and the JSON-patch cleanup recipes (host + tenant).
  • Air-gap mirroring guidance: chart and image come from lexfrei/ouroboros, not mirrored under ghcr.io/cozystack/*.

Pairs with cozystack/cozystack#2561.

Release note

NONE

Summary by CodeRabbit

  • Documentation
    • Added comprehensive guidance on PROXY-protocol support and the hairpin-NAT fix, including enablement/disable workflows, per-tenant wiring, cleanup fallbacks, deployment modes, preconditions, and air-gapped operator guidance.
    • Updated Platform Package reference with new publishing configuration options for exposure, certificate solver/issuer, and proxy-protocol settings (including acknowledgement flag).

@netlify
Copy link
Copy Markdown

netlify Bot commented May 3, 2026

Deploy Preview for cozystack ready!

Name Link
🔨 Latest commit 52d75b4
🔍 Latest deploy log https://app.netlify.com/projects/cozystack/deploys/69fbaec5d92846000793cbf7
😎 Deploy Preview https://deploy-preview-527--cozystack.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 63249eeb-092f-4a23-a3da-8b0564d0166b

📥 Commits

Reviewing files that changed from the base of the PR and between 412a315 and 52d75b4.

📒 Files selected for processing (2)
  • content/en/docs/next/networking/hairpin-proxy-protocol.md
  • content/en/docs/next/operations/configuration/platform-package.md

📝 Walkthrough

Walkthrough

Adds a new documentation page explaining PROXY-protocol support and the ouroboros hairpin-NAT fix, and updates the Platform Package reference to document new publishing fields related to PROXY handling.

Changes

PROXY-Protocol & Platform Fields Documentation

Layer / File(s) Summary
Front Matter & Overview
content/en/docs/next/networking/hairpin-proxy-protocol.md
Adds page front matter (title, linkTitle, description, weight) and opening "What this page covers".
Problem & Architecture
content/en/docs/next/networking/hairpin-proxy-protocol.md
Describes why PROXY-protocol breaks intra-cluster traffic and outlines ouroboros (controller + TCP proxy) dual-layer fix.
Host Enablement & Preconditions
content/en/docs/next/networking/hairpin-proxy-protocol.md
Adds host enablement YAML, upstream LB preconditions for PROXY header injection, and Talos re-render timing notes.
CoreDNS / DNS Handling
content/en/docs/next/networking/hairpin-proxy-protocol.md
Documents the cozystack-coredns wrapper and cluster DNS domain behavior for tenants.
Per-Tenant Enablement & Wiring
content/en/docs/next/networking/hairpin-proxy-protocol.md
Adds tenant YAML example, required/recommended config keys, addon wiring guidance, and caveats.
Disable Paths & Cleanup Recipes
content/en/docs/next/networking/hairpin-proxy-protocol.md
Adds host/tenant disable sequencing, operator guardrails, and manual fallback cleanup patches/steps for host and tenant.
Air-gapped Guidance & Rationale
content/en/docs/next/networking/hairpin-proxy-protocol.md
Adds instructions for mirroring charts/images in air-gapped environments and rationale for not enabling KEP-1860 by default.
Platform Values Reference
content/en/docs/next/operations/configuration/platform-package.md
Inserts new publishing fields: publishing.exposure, publishing.certificates.solver, publishing.certificates.issuerName, publishing.proxyProtocol, and publishing.proxyProtocolAcknowledgeUnclean; reorders certificate entries.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A rabbit sketches network trails,
Ouroboros hums where proxy fails,
Docs stitched warm in cozystack's lair,
Tenants and hosts breathe easier there,
Hop, fix, and patch — a cozy pair 🐰

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and specifically summarizes the main changes: adding documentation for the publishing.proxyProtocol feature and the ouroboros hairpin-NAT fix in networking.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch docs/networking-ouroboros

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces documentation for the PROXY-protocol and the hairpin-NAT fix using the ouroboros component in Cozystack. It explains the technical challenges of intra-cluster traffic when PROXY-protocol is enabled and provides detailed instructions for host and tenant-level configurations, including cleanup recipes. Feedback was provided to address a technical inaccuracy regarding how to verify PROXY-protocol headers and to improve the reliability of the cleanup script's regex matching.

Comment on lines +47 to +53
The flag does not configure the L4 load balancer in front of ingress-nginx for you — that lives outside the cluster (cloud LB, F5, MetalLB+haproxy, hcloud-cloud-controller-manager, …). The upstream LB **must** already be injecting PROXY-protocol v1 headers before the flag flips on. Verify with:

```bash
nc -v <lb-public-ip> 443 <<<""
```

and look for the leading `PROXY TCP4 …\r\n` frame in the response. Without that, every external request to ingress-nginx breaks the moment the flag flips on.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The verification step using nc to look for a PROXY header in the response is likely incorrect for standard PROXY protocol implementations. The PROXY protocol header is sent by the proxy (the Load Balancer) to the upstream server (ingress-nginx), not back to the client. An external client connecting to the LB would not see this header in the response. To verify that the LB is correctly injecting the header, it is recommended to check the ingress-nginx controller logs for successful PROXY protocol header parsing or use a tool like tcpdump on the ingress nodes to see the incoming traffic from the LB.

existing=$(kubectl --namespace kube-system get configmap coredns \
--output jsonpath='{.data.Corefile}')
cleaned=$(printf '%s\n' "$existing" \
| sed '/# === BEGIN ouroboros/,/# === END ouroboros ===/d')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The sed command might fail to match the end marker if it doesn't exactly match the string used by the controller. Line 31 indicates the markers include (do not edit by hand). The regex /# === END ouroboros ===/ will not match a line like # === END ouroboros (do not edit by hand) === because of the trailing === in the regex. It's safer to use a more flexible match that covers both cases.

Suggested change
| sed '/# === BEGIN ouroboros/,/# === END ouroboros ===/d')
| sed '/# === BEGIN ouroboros/,/# === END ouroboros/d')

@lexfrei lexfrei force-pushed the docs/networking-ouroboros branch 4 times, most recently from bf08d0c to 7fbc317 Compare May 4, 2026 10:29
@lexfrei lexfrei added the documentation Improvements or additions to documentation label May 4, 2026
@lexfrei lexfrei self-assigned this May 4, 2026
@lexfrei lexfrei force-pushed the docs/networking-ouroboros branch 8 times, most recently from db10a0a to 412a315 Compare May 6, 2026 03:48
@lexfrei lexfrei marked this pull request as ready for review May 6, 2026 04:56
@lexfrei lexfrei requested review from kvaps and lllamnyp as code owners May 6, 2026 04:56
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
content/en/docs/next/networking/hairpin-proxy-protocol.md (1)

62-62: ⚡ Quick win

Consider breaking up the 200+ word sentence for better readability.

The technical content about the cozystack-coredns wrapper is accurate and important, but the single-sentence structure (spanning from "cozystack ships" through the test assertion) makes it difficult to parse. Consider restructuring into 3-4 shorter sentences: (1) wrapper overview with import directive, (2) Helm three-way merge behavior with notExists: data, (3) why this matters for tenant vs host deployment modes, (4) the test assertion safeguard.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/en/docs/next/networking/hairpin-proxy-protocol.md` at line 62, The
long single sentence should be split into 3–4 shorter sentences to improve
readability: (1) describe the wrapper overview and mention the Corefile import
directive and the coredns-custom ConfigMap mounting, (2) explain Helm three-way
merge behavior and the fact the wrapper renders the ConfigMap with no data:
field so apiserver-side ouroboros.adds survive chart upgrades (reference
ouroboros and helm.sh/resource-policy: keep), (3) note the tenant vs host
deployment difference (reference coredns-import and coredns modes and that host
CoreDNS is Talos-managed), and (4) call out the test invariant in
packages/system/coredns/tests/coredns_custom_test.yaml asserting notExists:
data. Keep sentences short and sequential so each point is clear.
content/en/docs/next/operations/configuration/platform-package.md (1)

66-66: 💤 Low value

Consider restructuring the long publishing.exposure description for better readability.

The description is comprehensive but exceeds 150 words in a single table cell, making it challenging to scan. Consider breaking the deprecation timeline and requirements into a bulleted sub-list or shorter sentences. The technical content and KEP-5707 warning are valuable and accurate.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/en/docs/next/operations/configuration/platform-package.md` at line
66, The table cell for the publishing.exposure description is too long and
should be broken into clearer, shorter pieces: split the single long sentence
about exposure modes, deprecation (KEP-5707) and requirements into 2–3 short
sentences and move the operational requirements into a compact bullet/sub-list
(e.g., mode behavior, deprecation timeline, and prerequisites for loadBalancer
such as Cilium L2/BGP and at least one publishing.externalIPs entry) so the cell
reads as a concise summary followed by a short list of requirements; keep the
identifiers publishing.exposure, publishing.externalIPs, loadBalancer,
externalIPs and KEP-5707 intact for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@content/en/docs/next/networking/hairpin-proxy-protocol.md`:
- Line 62: The long single sentence should be split into 3–4 shorter sentences
to improve readability: (1) describe the wrapper overview and mention the
Corefile import directive and the coredns-custom ConfigMap mounting, (2) explain
Helm three-way merge behavior and the fact the wrapper renders the ConfigMap
with no data: field so apiserver-side ouroboros.adds survive chart upgrades
(reference ouroboros and helm.sh/resource-policy: keep), (3) note the tenant vs
host deployment difference (reference coredns-import and coredns modes and that
host CoreDNS is Talos-managed), and (4) call out the test invariant in
packages/system/coredns/tests/coredns_custom_test.yaml asserting notExists:
data. Keep sentences short and sequential so each point is clear.

In `@content/en/docs/next/operations/configuration/platform-package.md`:
- Line 66: The table cell for the publishing.exposure description is too long
and should be broken into clearer, shorter pieces: split the single long
sentence about exposure modes, deprecation (KEP-5707) and requirements into 2–3
short sentences and move the operational requirements into a compact
bullet/sub-list (e.g., mode behavior, deprecation timeline, and prerequisites
for loadBalancer such as Cilium L2/BGP and at least one publishing.externalIPs
entry) so the cell reads as a concise summary followed by a short list of
requirements; keep the identifiers publishing.exposure, publishing.externalIPs,
loadBalancer, externalIPs and KEP-5707 intact for clarity.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 073192ff-dc56-47b4-835e-8bb2a1816d48

📥 Commits

Reviewing files that changed from the base of the PR and between fdd8855 and 412a315.

📒 Files selected for processing (2)
  • content/en/docs/next/networking/hairpin-proxy-protocol.md
  • content/en/docs/next/operations/configuration/platform-package.md

New page under content/en/docs/next/networking/ explaining how to
turn on PROXY-protocol at the cozystack host ingress-nginx and the
bundled ouroboros component that fixes the resulting hairpin-NAT
problem (intra-cluster traffic to the cluster's own public hostnames
arriving at ingress-nginx without the PROXY header it now requires).

Covers:

- The hairpin-NAT failure mode: kube-proxy / Cilium-KPR short-circuits
  the LB IP for in-cluster lookups, the connection bypasses the
  upstream LB and arrives at ingress-nginx without a PROXY header,
  ingress-nginx closes the connection.
- Why KEP-1860 ipMode=Proxy is not a fit for the cozystack default
  stack (Cilium + kubeProxyReplacement drops the LB frontend entirely
  when ipMode != VIP, breaking Service ingress for L2/BGP-announced
  IPs).
- ouroboros itself (a Go reimplementation of compumike/hairpin-proxy)
  and the host vs tenant mode split: coredns mode mutates the
  Talos-managed Corefile in place on the host, coredns-import mode
  writes plugin-only snippets into a separate coredns-custom ConfigMap
  that the cozystack-coredns Corefile imports.
- The single platform flag publishing.proxyProtocol: true (host
  scope), the per-tenant addons.ouroboros knob, and how PROXY-protocol
  gets wired onto tenant ingress-nginx via valuesOverride (unlike the
  host flag, the tenant addon does not auto-wire it because tenants
  commonly have different upstream-LB setups).
- The upstream-LB precondition: the L4 LB in front of ingress-nginx
  must already be injecting PROXY-v1 headers before the flag flips
  on. Two practical verification recipes (tcpdump on the
  ingress-nginx node before flipping, or ingress-nginx logs after
  the LB-side change) — `nc` from a laptop will never see a frame
  because PROXY-protocol travels LB-to-backend, not back to clients.
- The Talos re-render flap window: a machine-config push wipes the
  BEGIN/END markers in the live host Corefile and the controller
  re-applies on next reconcile, leaving a brief intra-cluster DNS
  failure window during the roll.
- The disable hazard on both layers (no cleanup hook for coredns or
  coredns-import mode in the upstream chart), the lookup-based
  acknowledge gates that catch the obvious wrong path, the known
  hole when an operator manually deletes the Package or HelmRelease
  before flipping the flag back, and the JSON-patch cleanup recipes
  (host: jq-built merge-patch on the live Corefile that preserves
  labels, annotations, and Talos-managed metadata; tenant: null the
  ouroboros.override key in coredns-custom).
- Air-gapped operators must mirror two extra non-cozystack registry
  locations: oci://ghcr.io/lexfrei/charts/ouroboros and
  ghcr.io/lexfrei/ouroboros — this package is intentionally not
  mirrored under ghcr.io/cozystack/*.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei lexfrei force-pushed the docs/networking-ouroboros branch from 412a315 to 52d75b4 Compare May 6, 2026 21:12
kvaps added a commit to cozystack/cozystack that referenced this pull request May 7, 2026
…2561)

## What this PR does

Adds
[`lexfrei/ouroboros@v0.7.0`](https://github.com/lexfrei/ouroboros/releases/tag/v0.7.0)
— a Go reimplementation of
[`compumike/hairpin-proxy`](https://github.com/compumike/hairpin-proxy)
— and wires it into cozystack as a *conditional default*: a single
`publishing.proxyProtocol: true` switch turns PROXY-protocol on at the
host ingress-nginx **and** auto-deploys ouroboros to fix the resulting
hairpin-NAT problem (intra-cluster traffic to the cluster's own public
hostnames arriving at ingress-nginx without the PROXY header it now
requires). Default stays `false` — clusters that do not run
PROXY-protocol get zero new resources, zero RBAC delta, zero behaviour
change.

The chart is platform-agnostic at v0.7.0: the controller and proxy
binaries auto-detect the cluster's DNS domain from `/etc/resolv.conf`
inside the pod and compose the rewrite target FQDN and the backend host
at runtime as `<svc>.<ns>.svc.<cluster-domain>`. Cozystack tenants on
`cozy.local`, kubeadm-bootstrapped clusters on `cluster.local`,
federations on `k8s.example.com`, custom-domain RKE2/k3s — all work
without operator override. Verified end-to-end on a `cozy.local`
cluster: controller logs `cluster-domain=cozy.local` at startup, chart
emits `--proxy-service-name + --proxy-service-namespace` (no chart-baked
`--proxy-fqdn`), Corefile picks up the rewrite block between BEGIN/END
markers, in-pod `dig <ingress-host>` returns the ouroboros-proxy
ClusterIP. Operators that prefer a chart-baked FQDN can still set
`controller.clusterDomain` explicitly — the explicit-override path is
preserved for non-Service backends and for split-horizon DNS topologies.

The chart runs in `coredns` mode for the host (the host CoreDNS is
Talos-managed and does not carry an `import` directive, so the
alternative would silently no-op there) and in `coredns-import` mode for
tenant deployments (tenants run cozystack-coredns, which carries the
import directive and ships an empty `kube-system/coredns-custom`
ConfigMap; ouroboros writes plugin-only `rewrite name` snippets into the
`ouroboros.override` data key without ever touching the chart-rendered
Corefile, so it does not race the chart that re-templates
`data.Corefile` on every Flux reconcile).

### Layout

* `packages/system/ouroboros/` — vendored chart from
`oci://ghcr.io/lexfrei/charts/ouroboros@0.7.0`. Cozystack-side defaults
pinned in `values.yaml`: `controller.mode=coredns` (host scenario),
`proxy.target.namespace=cozy-ingress-nginx` (override the chart-default
`ingress-nginx` namespace; chart-default
`serviceName=ingress-nginx-controller` matches as-is), Gateway-API
watching left at upstream's default of `false`. `proxy.target.host` is
left empty so the proxy composes the backend host at runtime. Image is
digest-pinned at
`0.7.0@sha256:478665e05dd0ffc4c1f7764320ea24b52251781884ddfd668d93a03e39d9094c`
and pulled directly from `ghcr.io/lexfrei/ouroboros` — this package is
intentionally not mirrored under `ghcr.io/cozystack/*`. The chart
manifest itself is also digest-pinned via
`OUROBOROS_CHART_DIGEST=sha256:f28bfac9fd7070b1f3357983ff5aad28c16115e7de1cfc16ee568a3b4cfc9d7e`
in the package Makefile, with a Chart.yaml-version vs. Makefile-version
assert that aborts the update target on lockstep drift. Helm-unittest in
`tests/` covers controller args, RBAC narrowing per mode, Gateway-API
gating, runtime-composition flag emission in both `coredns` and
`coredns-import` modes, image pin drift.
* `packages/system/coredns/templates/coredns-custom.yaml` +
`values.yaml` — empty `coredns-custom` ConfigMap as the chart-managed
`import` prerequisite, plus the `import /etc/coredns/custom/*.override`
plugin in the `.:53` server block + `extraVolumes`/`extraVolumeMounts`
mount at `/etc/coredns/custom/`. Same package serves host and tenant
CoreDNS, so this covers both layers without duplication. Helm-unittest
pins the rendered Corefile via a snapshot so future `make update` does
not silently drop the import directive. The empty rendered ConfigMap
deliberately omits `data:` — Helm three-way merge has no template-side
keys to compare against, so apiserver-side runtime writes (the
`ouroboros.override` data key) survive every chart upgrade. The
`notExists: data` assertion in `tests/coredns_custom_test.yaml` pins the
invariant.
* `packages/core/platform/sources/ouroboros.yaml` — new
`cozystack.ouroboros` PackageSource for the host install (`dependsOn:
cozystack.networking, cozystack.ingress-application`).
* `packages/core/platform/sources/kubernetes-application.yaml` —
`kubernetes-ouroboros` component registered against `system/ouroboros`
so the tenant HelmRelease can reference it.
* `packages/apps/kubernetes/templates/helmreleases/ouroboros.yaml` —
tenant-side HelmRelease deployed via the tenant admin-kubeconfig,
mirroring the cert-manager pattern. A `cozystack.defaultOuroborosValues`
helper flips `mode` back to `coredns-import` for tenants and mirrors
per-tenant `addons.gatewayAPI.enabled` into the controller (forcing
`--gateway-api` on a tenant without Gateway-API CRDs CrashLoops the
controller on `WaitForCacheSync`). `controller.clusterDomain` is
intentionally NOT pre-set — chart 0.7.0 auto-detects per-pod from
`/etc/resolv.conf` and composes the rewrite target at runtime, so the
same wrapper renders correctly on `cluster.local`, `cozy.local`, or any
custom tenant cluster-domain. Operators on a non-default cluster-domain
can still override via
`valuesOverride.ouroboros.controller.clusterDomain`. `dependsOn` grows a
conditional `gateway-api-crds` entry on the same gate. If
`addons.ouroboros.enabled` is true while `addons.ingressNginx.enabled`
is false, the HelmRelease fails the render with a clear error rather
than silently skipping — the hairpin fix has nothing to fix against
without ingress-nginx, and a no-op addon that quietly does nothing is
harder to debug than an explicit render-time error. The valuesOverride
gateway-api guard accepts every truthy spelling Helm/YAML emit (bool
`true`, string `"true"`/`"True"`/`"TRUE"`, numeric `1`, YAML
`yes`/`Yes`/`YES`/`on`/`On`/`ON`/`y`/`Y`) via
`toString`-then-`lower`-then-string-compare; direct bool `eq` would
error on type mismatch when an operator passes the value as a string
from `--set`, and case-insensitive folding catches `--set ...=On` /
`--set ...=True` that pure-lowercase comparison would miss. Tenant
disable is single-step: flipping `addons.ouroboros.enabled: false`
deletes the rendered HelmRelease, helm-controller runs `helm uninstall`
on the tenant, and the chart's pre-delete cleanup hook
(`coredns-cleanup-hook.yaml`, vendored at 0.7.0) nulls the
`ouroboros.override` key on the tenant's `kube-system/coredns-custom`
ConfigMap automatically. No render-time guard is wired on the tenant
side — gating on a `lookup` for the leftover HelmRelease would deadlock
the routine disable (the parent render runs before helm-controller
applies the missing-child diff, so the lookup would always find the
leftover at the moment of the flag flip). If the chart's hook fails to
land (controller pod stuck CrashLooping, ConfigMap RBAC drift, Job
timeout), operators recover manually via the documented tenant cleanup
recipe.
* `packages/apps/kubernetes/values.yaml` — new `OuroborosAddon` typedef
with `enabled` + `valuesOverride`. `make generate` re-emitted README,
JSON schema, the Go struct + deepcopy, and the kubernetes-rd
ApplicationDefinition openAPISchema in lockstep.
* `packages/core/platform/values.yaml` — new `publishing.proxyProtocol`
flag with extensive comment covering the upstream-LB precondition (the
L4 LB in front of ingress-nginx must already be injecting PROXY-v1
headers before the flag flips on, otherwise external traffic breaks),
the host-vs-tenant fan-out (the flag does not cascade into tenants), the
Talos re-render flap window, and the asymmetric disable path. The host
disable path: flipping back to `false` reverts ingress-nginx
PROXY-protocol config (flip the LB off first), but the host ouroboros
itself stays installed because its Package CR carries
`helm.sh/resource-policy: keep` — an explicit `kubectl delete
package.cozystack.io cozystack.ouroboros` (or `bundles.disabledPackages`
entry) actually triggers helm uninstall, at which point the chart's
pre-delete cleanup hook fires and patches the host Corefile
automatically. The values comment ships a manual `sed`-based cleanup
recipe as a fallback for the rare case where the chart's hook fails to
land (controller stuck in CrashLoop, ConfigMap RBAC drift, hook
timeout); the `publishing.proxyProtocolAcknowledgeUnclean` gate keys off
a `lookup` against the leftover Package CR to catch operators who flip
the flag back without confirming the cleanup landed.
* `packages/core/platform/templates/bundles/system.yaml` — flips between
the default-form and explicit-form `cozystack.ingress-application`
Package emission so the host ingress-nginx gets `use-proxy-protocol` +
`real-ip-header=proxy_protocol` injected only when the flag is true
(`use-forwarded-headers` and `compute-full-forwarded-for` are
deliberately *not* on the list — they would let any upstream proxy spoof
`X-Forwarded-For`); the same conditional emits `cozystack.ouroboros`.
Host Gateway-API watching is left at the upstream chart's default of
`false` because cozystack does not install Gateway-API CRDs cluster-wide
on the host.
* `hack/e2e-apps/ouroboros.bats` — auto-discovered smoke test for the
tenant addon (HelmRelease ready, `coredns-custom` ConfigMap present in
tenant, controller pod running, and after creating a TLS Ingress the
rewrite snippet shows up in the import ConfigMap; an in-tenant `dig`
from a netshoot pod also confirms CoreDNS actually serves the rewrite —
pod-readiness alone passes vacuously since the upstream chart ships no
readiness probes).

### Why not KEP-1860 `ipMode=Proxy`

Cilium's `kubeProxyReplacement: true` (cozystack default) drops the LB
frontend entirely when `ipMode != VIP`, breaking Service ingress for
L2/BGP-announced IPs. KEP-1860's Proxy mode is reserved for clusters
fronted by an external CCM-managed proxy LB. ouroboros is the correct
fix for L2/BGP-announce topologies that cozystack ships by default.

### Coverage

Covered by helm-unittest (host chart, coredns wrapper, tenant addon —
including the `kubernetesProvider` mock that exercises the lookup-driven
`proxyProtocolAcknowledgeUnclean` gate, the lookup-returns-nil
CI/dry-run path, the gateway-api truthy-spelling acceptance across
`true`/`"true"`/`"True"`/`"TRUE"`/`1`/`yes`/`on`, the assertion that
`--proxy-service-name` + `--proxy-service-namespace` are emitted instead
of a chart-baked `--proxy-fqdn` in both `coredns` and `coredns-import`
modes, the assertion that `--target-service-name` +
`--target-service-namespace` are emitted for the proxy backend, and the
`_namespace.etcd`-empty render gate) and a tenant e2e bats test that
creates a real Ingress, asserts the rewrite snippet lands in the import
ConfigMap, and confirms in-tenant DNS serves it (with an in-pod retry
loop tolerating CoreDNS reload latency). The host-level
`publishing.proxyProtocol: true` path is not e2e-tested in this PR — it
requires a PROXY-protocol-aware upstream LB, which the cozystack e2e
harness does not stand up. That path is exercised manually and by docs
review.

End-to-end runtime auto-detect was verified manually on a `cozy.local`
cluster (dev17): chart deployed at v0.7.0, controller logged
`cluster-domain=cozy.local`, the rewrite block landed in
`kube-system/coredns` Corefile pointing at
`test-ouroboros-proxy.test-ouroboros.svc.cozy.local.`, an in-pod `dig
<ingress-host>` resolved to the ouroboros-proxy ClusterIP on the first
attempt, and the chart's pre-delete hook restored the Corefile cleanly
on `helm uninstall`.

### Supply-chain note

Image and chart are pulled from a contributor's personal namespace
(`ghcr.io/lexfrei/...`). Existing precedent
(`packages/system/etcd-operator/Makefile` pulls from
`ghcr.io/aenix-io/...`) is similar but `aenix-io` is the
cozystack-adjacent org while `lexfrei` is one human. Worth a maintainer
call.

Pairs with cozystack/website#527 (operator docs).

### Release note

```release-note
feat(networking): add `publishing.proxyProtocol: true` to enable PROXY-protocol on host ingress-nginx and auto-deploy ouroboros (a Go reimplementation of compumike/hairpin-proxy) to solve the resulting hairpin-NAT problem; tenants can opt in per-cluster via `addons.ouroboros.enabled` (requires `addons.ingressNginx.enabled`). Default behaviour unchanged.
```


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added Ouroboros addon to address hairpin‑NAT for ingress‑nginx with
configurable enablement and tenant overrides.
* Added a packaged Ouroboros chart providing proxy, controller, CoreDNS
integration, external‑DNS and etc‑hosts modes.
* Introduced PROXY‑protocol gating with an explicit acknowledgement
flag.

* **Documentation**
* Expanded addon reference and platform docs with detailed
Ouroboros/PROXY‑protocol configuration and defaults.

* **Tests**
* Added extensive unit and e2e tests covering rendering, RBAC, uninstall
cleanup, and DNS behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Copy link
Copy Markdown
Member

@kvaps kvaps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Companion docs to cozystack/cozystack#2561 which just landed.

@kvaps kvaps merged commit 5c30552 into main May 7, 2026
6 checks passed
@kvaps kvaps deleted the docs/networking-ouroboros branch May 7, 2026 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants