
feat(storagebox): add Gateway API routing and replace MinIO with Garage#111

Open
adamancini wants to merge 4 commits into main from adamancini/gateway-api

adamancini (Member) commented Feb 26, 2026

Summary

Replace ingress-nginx with Envoy Gateway as the Gateway API controller and replace MinIO with Garage for S3-compatible object storage.

Gateway API (Envoy Gateway)

Each application gets its own Gateway resource. Envoy Gateway provisions an independent Envoy proxy Deployment and NodePort Service per Gateway, so each application's traffic is handled by its own proxy.

| Application | Protocol | Route type | Port |
|---|---|---|---|
| Garage S3 | HTTP | HTTPRoute | 3900 |
| PostgreSQL | TCP | TCPRoute | 5432 |
| Cassandra | TCP | TCPRoute | 9042 |
| rqlite | HTTP | HTTPRoute | 4001 |
| NFS | | NodePort (no Gateway API UDP support) | multiple |
  • Shared GatewayClass + EnvoyProxy resource configures NodePort for EC environments
  • Per-service gateway enable/disable toggles in KOTS admin console, nested under each service's settings group
  • Per-service TLS termination config for HTTP gateways (Garage, rqlite)
  • Envoy Gateway is installed as an EC extension from an OCI chart (oci://docker.io/envoyproxy/gateway-helm v1.7.0) that bundles all Gateway API CRDs, including the experimental TCPRoute
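The shared GatewayClass + EnvoyProxy pairing can be sketched as follows. This is a minimal sketch, not the chart's actual manifests: the resource names and the envoy-gateway-system namespace are assumptions, while the controllerName and the envoyService.type field are part of Envoy Gateway's own API.

```yaml
# EnvoyProxy: tells Envoy Gateway how to expose the proxies it provisions.
# Name and namespace are hypothetical.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: storagebox-proxy-config
  namespace: envoy-gateway-system
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        type: NodePort   # EC clusters have no cloud LoadBalancer
---
# GatewayClass shared by every per-application Gateway (hypothetical name).
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: storagebox
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: storagebox-proxy-config
    namespace: envoy-gateway-system
```

Every Gateway that references this class inherits the NodePort setting; unless a nodePort is pinned, Kubernetes assigns each proxy Service a port from the cluster's NodePort range.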

Garage S3 Storage (replaces MinIO)

  • Vendored subchart based on akkoma-helm's Garage implementation, upgraded to v1.3.1
  • Single StatefulSet, no operator dependency (removes MinIO operator from EC extensions)
  • Init container copies secrets to emptyDir with chmod 0600 (Kubernetes fsGroup adds group-read bits to secret volume mounts, but Garage requires exactly mode 0600)
  • Post-install/post-upgrade Helm hook Job using alpine:3.21 + curl + jq for Garage admin API calls:
    • Assigns cluster layout (1 GiB capacity)
    • Creates S3 access key and bucket
    • Stores credentials in a Kubernetes Secret
  • Helm test validates admin API health, S3 connectivity, bucket existence, and credentials Secret
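A sketch of the secret-copying init container, as a fragment of the Garage StatefulSet pod template. Volume names, the Secret name, mount paths and the image reference are hypothetical, and the garage.toml keys named in the comment are assumptions. It also assumes the init and Garage containers run as the same UID (for example via a pod-level runAsUser), because the copied files are owned by whichever user runs cp.

```yaml
# Fragment of spec.template.spec in the Garage StatefulSet (hypothetical names).
initContainers:
  - name: copy-secrets
    image: alpine:3.21
    command:
      - sh
      - -c
      - |
        set -eu
        # Secret volume entries are symlinks; cp without -R copies their targets.
        cp /secrets-src/* /secrets/
        # Per the PR, Garage requires its secret files to be exactly mode 0600.
        chmod 0600 /secrets/*
    volumeMounts:
      - name: garage-secrets-src
        mountPath: /secrets-src
        readOnly: true
      - name: garage-secrets
        mountPath: /secrets
containers:
  - name: garage
    image: dxflrs/garage:v1.3.1   # assumed image reference
    volumeMounts:
      # garage.toml would point e.g. rpc_secret_file at /secrets/... (assumed keys)
      - name: garage-secrets
        mountPath: /secrets
        readOnly: true
volumes:
  - name: garage-secrets-src
    secret:
      secretName: garage-secrets   # hypothetical Secret name
  - name: garage-secrets
    emptyDir:
      medium: Memory   # keeps the copied secrets off the node's disk
```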

Operational improvements

  • Support bundle: collectors and deploymentStatus/statefulsetStatus analyzers for all infrastructure (cert-manager, CNPG, Envoy Gateway, K8ssandra, cass-operator) and application components
  • Status informers: infrastructure deployments (cert-manager, cloudnative-pg, envoy-gateway, k8ssandra-operator, cass-operator) alongside conditional app informers
  • Builder key: static values enabling all components for air-gap image discovery via helm template
  • Preflights: NFS kernel module check upgraded from warn to fail
  • Images: consolidated all utility images to alpine:3.21 (removed busybox)
  • Helm timeout: 10m via helmUpgradeFlags
  • Makefile: vm-kubectl target for remote kubectl on EC VMs; removed minio-operator from test-install-operators
  • CI: updated workflow and smoke tests for Garage; gateway disabled in CI test values (no Envoy Gateway on CI clusters)
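Two of these items sketched as release manifests, shown together here although they would be separate files: a Troubleshoot deploymentStatus analyzer for the Envoy Gateway controller, and a KOTS HelmChart carrying the 10m Helm timeout and a builder block. The deployment name and namespace are assumed chart defaults, and the chartVersion and builder keys are hypothetical.

```yaml
# Support bundle analyzer (fragment): fail if the controller has no ready replicas.
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: storagebox
spec:
  analyzers:
    - deploymentStatus:
        name: envoy-gateway               # assumed default deployment name
        namespace: envoy-gateway-system   # assumed default namespace
        outcomes:
          - fail:
              when: "< 1"
              message: The Envoy Gateway controller has no ready replicas.
          - pass:
              message: The Envoy Gateway controller is ready.
---
# KOTS HelmChart (fragment): Helm timeout and static builder values.
apiVersion: kots.io/v1beta2
kind: HelmChart
metadata:
  name: storagebox
spec:
  chart:
    name: storagebox
    chartVersion: 1.0.0   # hypothetical
  helmUpgradeFlags:
    - --timeout
    - 10m
  # Enables every optional component so helm template renders all images
  # for air-gap bundling (hypothetical value keys).
  builder:
    garage:
      enabled: true
    postgres:
      enabled: true
    cassandra:
      enabled: true
    rqlite:
      enabled: true
```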

Patterns doc

New patterns/gateway-api/README.md covering the per-application Gateway pattern, HTTPRoute/TCPRoute examples, EnvoyProxy/GatewayClass infrastructure, TLS termination, and KOTS integration. Notes that TCPRoute's experimental status is point-in-time.
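The TLS termination case can be illustrated with Garage: an HTTPS listener terminates TLS at the Envoy proxy and forwards plain HTTP to the S3 API on port 3900. This is a hedged sketch; the Gateway, Secret, Service and GatewayClass names are hypothetical, and the certificate Secret could be issued by cert-manager or supplied by the user.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: storagebox-garage
spec:
  gatewayClassName: storagebox   # hypothetical shared GatewayClass
  listeners:
    - name: s3-https
      protocol: HTTPS
      port: 3900
      tls:
        mode: Terminate          # TLS ends at Envoy; the backend sees HTTP
        certificateRefs:
          - kind: Secret
            name: garage-gateway-tls   # hypothetical certificate Secret
      allowedRoutes:
        kinds:
          - kind: HTTPRoute
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: storagebox-garage
spec:
  parentRefs:
    - name: storagebox-garage
      sectionName: s3-https
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: storagebox-garage   # hypothetical Garage S3 Service
          port: 3900
```

With TLS disabled, the listener would use protocol: HTTP and drop the tls block; the HTTPRoute stays the same.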

Test plan

  • helm lint passes
  • helm template renders all resources with all components enabled
  • make validate-config four-way contract passes
  • EC headless install on CMX VM (v0.26.8)
  • Garage StatefulSet starts with correct secret permissions (init container)
  • Garage setup Job completes: layout assigned, bucket created, credentials stored
  • All infrastructure pods healthy (cert-manager, CNPG, Envoy Gateway, K8ssandra)
  • All application pods healthy (Garage, PostgreSQL, Cassandra, rqlite)
  • Envoy Gateway provisions per-application proxy pods
  • CI helm-install-test (pending with Garage v1.3.1 fixes)

Replace ingress-nginx with Envoy Gateway as the Gateway API controller,
installed as an EC extension via OCI chart. Each application gets its own
Gateway resource with an independent Envoy proxy instance:

- Garage S3: HTTP Gateway + HTTPRoute (port 3900)
- PostgreSQL: TCP Gateway + TCPRoute (port 5432)
- Cassandra: TCP Gateway + TCPRoute (port 9042)
- rqlite: HTTP Gateway + HTTPRoute (port 4001)
- NFS: stays on NodePort (Gateway API does not support UDP)
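For the PostgreSQL entry, a per-application Gateway with one TCP listener plus a TCPRoute might look like this. Names are hypothetical, including the shared GatewayClass and the CNPG cluster (CNPG exposes a read-write Service named after the cluster with an -rw suffix). TCPRoute is served from the experimental gateway.networking.k8s.io/v1alpha2 API, while Gateway is GA at v1.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: storagebox-postgres
spec:
  gatewayClassName: storagebox   # hypothetical shared GatewayClass
  listeners:
    - name: postgres
      protocol: TCP
      port: 5432
      allowedRoutes:
        kinds:
          - kind: TCPRoute
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TCPRoute
metadata:
  name: storagebox-postgres
spec:
  parentRefs:
    - name: storagebox-postgres
      sectionName: postgres      # bind to the TCP listener above
  rules:
    - backendRefs:
        - name: storagebox-postgres-rw   # hypothetical CNPG read-write Service
          port: 5432
```

The Cassandra variant would differ only in port (9042) and backend Service.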

Replace MinIO operator + Tenant subchart with Garage v1.3.1, a
lightweight S3-compatible object storage that runs as a single
StatefulSet with no operator dependency. A post-install/post-upgrade
Helm hook Job handles cluster layout assignment, bucket creation, and
S3 credential provisioning via the Garage admin API. An init container
copies secrets to an emptyDir with mode 0600 to satisfy Garage's strict
file permission requirements.
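A skeleton of the post-install/post-upgrade hook Job, under stated assumptions: all names are hypothetical, the Garage admin API is assumed reachable on its conventional port 3903, and curl/jq are installed with apk at runtime, which needs a reachable Alpine mirror (an air-gapped install would need them baked into the image). The Garage-specific admin API calls are deliberately left out rather than guessed.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: storagebox-garage-setup
  annotations:
    "helm.sh/hook": post-install,post-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 600   # bound the wait loop below
  template:
    spec:
      restartPolicy: Never
      # Needs RBAC to create/update the S3 credentials Secret (not shown).
      serviceAccountName: storagebox-garage-setup
      containers:
        - name: setup
          image: alpine:3.21
          env:
            - name: GARAGE_ADMIN_URL
              value: http://storagebox-garage:3903   # assumed Service name
            - name: GARAGE_ADMIN_TOKEN
              valueFrom:
                secretKeyRef:
                  name: garage-secrets   # hypothetical
                  key: admin-token
          command:
            - sh
            - -c
            - |
              set -eu
              apk add --no-cache curl jq
              # Wait until the admin API accepts HTTP connections
              # (curl without -f exits 0 on any HTTP status).
              until curl -sS -o /dev/null "$GARAGE_ADMIN_URL/"; do
                sleep 5
              done
              # Layout assignment, key and bucket creation, and writing the
              # credentials Secret would follow, each call sending
              # "Authorization: Bearer $GARAGE_ADMIN_TOKEN"; omitted here.
```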

Also includes:
- Per-service gateway and TLS settings in KOTS admin console config
- Helm test for Garage connectivity and S3 round-trip verification
- Support bundle collectors and deployment health analyzers for all
  infrastructure (cert-manager, CNPG, Envoy Gateway, K8ssandra)
- Status informers for infrastructure deployments
- Builder key for air-gap image discovery
- NFS kernel module preflight upgraded to hard fail
- Consolidated all utility images to alpine:3.21 (removed busybox)
- vm-kubectl Makefile target for remote kubectl on EC VMs
- Updated CI workflow and smoke tests for Garage

Covers per-application Gateway pattern with Envoy Gateway, HTTPRoute
for S3/HTTP services, TCPRoute for databases, GatewayClass/EnvoyProxy
infrastructure, TLS termination, and KOTS config integration. All
examples drawn from the storagebox application. Notes that TCPRoute's
experimental status is point-in-time (February 2026) and that Traefik
supports TCPRoute when experimental CRDs are installed separately.
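The KOTS config integration can be sketched as a Config group that nests the gateway toggle under its service, plus Application status informers where the app informer is rendered only when the service is enabled. Item names, defaults, the Garage StatefulSet name, and the infrastructure namespaces are assumptions.

```yaml
apiVersion: kots.io/v1beta1
kind: Config
metadata:
  name: storagebox-config
spec:
  groups:
    - name: garage
      title: Garage S3
      items:
        - name: garage_enabled
          title: Enable Garage
          type: bool
          default: "1"
        # Shown only when Garage itself is enabled (hypothetical defaults).
        - name: garage_gateway_enabled
          title: Expose Garage through Gateway API
          type: bool
          default: "0"
          when: 'repl{{ ConfigOptionEquals "garage_enabled" "1" }}'
        - name: garage_gateway_tls_enabled
          title: Terminate TLS at the Gateway
          type: bool
          default: "0"
          when: 'repl{{ ConfigOptionEquals "garage_gateway_enabled" "1" }}'
---
apiVersion: kots.io/v1beta1
kind: Application
metadata:
  name: storagebox
spec:
  statusInformers:
    # Infrastructure informers in namespace/type/name form (assumed names).
    - cert-manager/deployment/cert-manager
    - envoy-gateway-system/deployment/envoy-gateway
    # Renders to an empty string, which KOTS ignores, when Garage is disabled.
    - '{{repl if ConfigOptionEquals "garage_enabled" "1" }}statefulset/storagebox-garage{{repl end}}'
```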
adamancini force-pushed the adamancini/gateway-api branch from 7ffd5b6 to afc198b on February 27, 2026 18:51

scottrigby (Member) left a comment

Glad to see a Gateway API pattern!

This PR looks great, except for one question (below)

adamancini and others added 2 commits March 2, 2026 12:35
…ctly

Remove Kubernetes API calls from the helm test pod. Instead of fetching
the S3 credentials Secret via the K8s API with SA token + CA cert, mount
it directly as a volume. This eliminates the serviceAccountName, KUBE_API,
SA_TOKEN, and CA_CERT plumbing that was confusing two auth contexts
(Garage app-level auth vs K8s API auth).
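A minimal sketch of a Helm test pod that reads the credentials Secret from a volume mount instead of the Kubernetes API. The Secret name and key names are hypothetical, and unlike the PR's test this sketch only checks that the credentials are present; it does not perform the S3 round-trip.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: storagebox-garage-test   # hypothetical
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  restartPolicy: Never
  automountServiceAccountToken: false   # the test never calls the K8s API
  containers:
    - name: s3-credentials-check
      image: alpine:3.21
      command:
        - sh
        - -c
        - |
          set -eu
          # Fail the test if either credential file is missing or empty.
          test -s /creds/access-key-id
          test -s /creds/secret-access-key
          echo "S3 credentials Secret is mounted and non-empty"
      volumeMounts:
        - name: s3-credentials
          mountPath: /creds
          readOnly: true
  volumes:
    - name: s3-credentials
      secret:
        secretName: storagebox-garage-s3-credentials   # hypothetical
```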
jmboby (Member) commented Mar 11, 2026

@adamancini did you mean to have a chef-360 reference in the backup.yaml? It's commented out but still worth asking.

jmboby (Member) commented Mar 11, 2026

@adamancini should we use the latest Replicated SDK, 1.17.0?

apiVersion: v2
appVersion: 1.0.0
dependencies:
  - condition: replicated.enabled
    name: replicated
    repository: oci://registry.replicated.com/library
    version: ~1.12.2

I believe the ~ only allows patch-level (z) updates.
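Helm resolves dependency constraints with Masterminds semver, under which ~1.12.2 means >=1.12.2 <1.13.0, so 1.17.0 would not be selected. A sketch of one way to widen the constraint; the chart name and version are hypothetical, and pinning 1.17.0 exactly is the other obvious option.

```yaml
apiVersion: v2
name: storagebox
version: 1.0.0        # hypothetical chart version
appVersion: 1.0.0
dependencies:
  - name: replicated
    repository: oci://registry.replicated.com/library
    # ~1.12.2 means >=1.12.2 <1.13.0 (patch updates only).
    # ^1.12.2 means >=1.12.2 <2.0.0, so 1.17.0 satisfies it.
    version: ^1.12.2
    condition: replicated.enabled
```

After changing the constraint, helm dependency update re-resolves the version and rewrites Chart.lock.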

jmboby (Member) commented Mar 11, 2026

@adamancini I'm having issues deploying to EC, and I'm not sure where these MinIO references are coming from. I switched to your PR branch and created a new release. It could be something on my side, but I just wanted to check.

{"level":"info","ts":"2026-03-11T02:45:50Z","msg":"stdout (helm install) = Release \"storagebox\" does not exist. Installing it now.\n"}
{"level":"info","ts":"2026-03-11T02:45:50Z","msg":"stderr (helm install) = Error: unable to build kubernetes objects from release manifest: resource mapping not found for name: \"myminio\" namespace: \"\" from \"\": no matches for kind \"Tenant\" in version \"minio.min.io/v2\"\nensure CRDs are installed first\n"}

All the infra Helm chart deploys are fine.
