Skip to content

fix(permission): clean up leftover access in SpiceDB when a permission is deleted#1685

Open
whoAbhishekSah wants to merge 1 commit into
mainfrom
fix/permission-delete-cascade
Open

fix(permission): clean up leftover access in SpiceDB when a permission is deleted#1685
whoAbhishekSah wants to merge 1 commit into
mainfrom
fix/permission-delete-cascade

Conversation

@whoAbhishekSah

@whoAbhishekSah whoAbhishekSah commented Jun 8, 2026

Copy link
Copy Markdown
Member

Background: how permissions are stored

A permission lives in three places:

  • permissions table (Postgres) — one row per permission.
  • roles table (Postgres) — each role keeps a list of the permission slugs it grants.
  • SpiceDB — the grant tuples that checks actually read, one per principal type:
app/role:4f6e2b8a-1c9d-4b2e-9a77-2b0f5c1e8a31#compute_order_delete@app/user:*
app/role:4f6e2b8a-1c9d-4b2e-9a77-2b0f5c1e8a31#compute_order_delete@app/serviceuser:*
app/role:4f6e2b8a-1c9d-4b2e-9a77-2b0f5c1e8a31#compute_order_delete@app/pat:*

(format: app/role:<role-id>#<permission-slug>@<principal>:*)

Rows in the Postgres permissions table are written by exactly one function — bootstrap's AppendSchema (via migrateServiceDefinitionToDB). It runs at startup (MigrateSchema, from base schema + config) and from the CreatePermission API. On every startup the whole SpiceDB schema is rebuilt from three things: the base schema, the config files, and the permissions table. A namespace (object type) shows up in that schema only if it still has at least one permission. So if you delete a namespace's last permission, the namespace disappears from the schema on the next startup. Example: compute/machine has one permission compute/machine:scrub; delete that permission and compute/machine is no longer in the schema after the next startup.

The problems

  1. Delete didn't revoke access. permission.Delete only removed the Postgres row and left the SpiceDB grant tuples behind, so a check kept returning true — the deleted permission still granted access.
  2. Roles kept listing the deleted permission. Nothing removed the slug from the roles.permissions list (no FK — it's a denormalized jsonb array).
  3. Delete wasn't even callable. The DeletePermission admin RPC was hardwired to "function not available."
  4. No protection for built-in permissions. Base-schema/config permissions are recreated every startup, so deleting one only revokes access until the next restart, then it silently returns.
  5. Deleting a namespace's last permission could break startup. The namespace's definition is generated from its permissions; removing the last one makes it vanish from the next startup's schema, and SpiceDB refuses to drop a type that still has relationships → startup fails (cannot delete object definition … as a relationship exists under it).

The fix

  • Revoke on delete: permission.Delete removes every app/role:*#<slug> grant tuple (all roles, all principal types) before deleting the row.
  • Prune role definitions: strips the slug from every role's list via role.Service.RemovePermissionFromRoles — through the role service (not its repo), a single targeted DB update, deliberately not role.Update (which would rebuild each role's tuples + emit a per-role audit). Wired via a setter-injected interface (the SetMembershipService pattern).
  • Make delete usable: implemented the DeletePermission admin handler, gated on superuser.
  • Protect built-in permissions: reject deleting one defined by the base schema or config (FailedPrecondition).
  • Protect against orphaning a namespace: if the permission is the last one in its namespace, the handler asks SpiceDB directly (ReadRelationships by object type — the source of truth, since the Postgres mirror can drift) whether any relationship of that type still exists. If so → FailedPrecondition (remove those resources/relationships first), so a delete can't wedge the next startup. If the namespace has other permissions, it survives and the delete proceeds.

Also fixes RelationRepository.ListRelations to omit the subject filter when no subject is given — SpiceDB rejects an empty SubjectType, so a by-object-type read was previously impossible.

Known limitation (follow-up)

  • Once a namespace is emptied (no relationships, last permission deleted), its now-unused definition is dropped on the next bootstrap recompile — expected and safe (nothing left under it).

Tests

TestPermissionDeleteCascade (e2e): stray permission + role → delete → no app/role:*#<slug> tuple remains and the role no longer lists it; deleting a built-in permission (app/organization:get) → FailedPrecondition.

TestPermissionDeleteBlockedByResource (e2e): permission in a fresh namespace + a resource of that type → delete → FailedPrecondition; remove the resource → permission deletes cleanly.

Plus unit tests for the relation sweep + role prune (core/permission) and the role-list strip (core/role).

Addresses gap (6) of #1661.

@vercel

vercel Bot commented Jun 8, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
frontier Ready Ready Preview, Comment Jun 10, 2026 9:34am

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Add DeletePermission API with safeguards preventing removal of built-in permissions.
    • Cascade cleanup: deleting a permission removes role→permission references and prunes role listings.
    • Namespace integrity checks block deletion when namespace-scoped resources or relations remain.
    • Authorization relaxed for the DeletePermission endpoint to allow superuser access.
  • Tests

    • Added regression tests covering cascade deletion and blocked deletion while resources exist.

Walkthrough

This PR makes permission deletion cascade: API and service layers now sweep SpiceDB role→permission relations, prune role permission lists, expose a DeletePermission handler with builtin and namespace-safety guards, update server wiring/authorization, adjust store/SpiceDB behavior, and add unit and E2E tests and mocks.

Changes

Permission Cascade Deletion

Layer / File(s) Summary
Server wiring and auth
cmd/serve.go, pkg/server/connect_interceptors/authorization.go
Construct relation repo/service before permission service; pass relationService into permission.NewService; inject roleService via permissionService.SetRoleService; allow superusers to call AdminService/DeletePermission.
Permission service and tests
core/permission/service.go, core/permission/service_test.go, core/permission/mocks/*
permission.Service now depends on RelationService (constructor accepts logger+relationService) and uses RoleService (setter). Service.Delete: Get -> ensure slug -> relationService.Delete (ignore ErrNotExist) -> roleService.RemovePermissionFromRoles -> repo.Delete. Unit tests updated to use relation mock and exercise success/error paths.
Role repository/service and mocks
core/role/role.go, core/role/service.go, internal/store/postgres/role_repository.go, core/role/mocks/repository.go, core/permission/mocks/role_service.go, core/role/service_test.go
Adds Repository.RemovePermissionFromRoles and role.Service.RemovePermissionFromRoles delegating to repo; Postgres repo implements JSONB update to remove permission slug; mocks and tests added.
API surface, handler, and builtin permissions
internal/api/v1beta1connect/interfaces.go, internal/api/v1beta1connect/permission.go, internal/bootstrap/service.go, internal/api/v1beta1connect/mocks/*
PermissionService gains Delete(ctx,id). DeletePermission handler preloads permission, rejects built-ins via bootstrapService.BuiltinPermissions, checks namespace safety via permissionService.List + relationService.ListRelations, maps errors to NotFound/FailedPrecondition/Internal, and calls permissionService.Delete. bootstrap.Service.BuiltinPermissions computes built-in permission slugs.
API relation mock helpers
internal/api/v1beta1connect/mocks/relation_service.go
Adds ListRelations mock to allow relation existence checks in unit/integration tests.
Store and SpiceDB adjustments
internal/store/postgres/role_repository.go, internal/store/spicedb/relation_repository.go
Postgres role repo implements bulk removal of permission slug from roles' JSON permissions; SpiceDB relation listing avoids sending empty subject filters.
End-to-end regression tests
test/e2e/regression/service_registration_test.go
Adds tests validating that deleting a custom permission removes role→permission relations, built-in permissions cannot be deleted, and deletion is blocked while namespace-scoped resources exist.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Suggested reviewers

  • rohilsurana
  • AmanGIT07
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
core/permission/service_test.go (2)

219-220: ⚡ Quick win

Assert the exact cascade delete relation payload.

Line 219 and Line 252 use mock.Anything for the relation argument, so tests won’t catch a broken namespace/relation-name contract.

Proposed test assertion tightening
+ // add imports:
+ // "github.com/raystack/frontier/core/relation"
+ // "github.com/raystack/frontier/internal/bootstrap/schema"

+ expectedRel := relation.Relation{
+   Object: relation.Object{
+     Namespace: schema.RoleNamespace,
+   },
+   RelationName: perm.Slug,
+ }

- mockRelation.On("Delete", mock.Anything, mock.Anything).Return(nil).Once()
+ mockRelation.On("Delete", mock.Anything, expectedRel).Return(nil).Once()

- mockRelation.On("Delete", mock.Anything, mock.Anything).Return(expectedErr).Once()
+ mockRelation.On("Delete", mock.Anything, expectedRel).Return(expectedErr).Once()

Also applies to: 252-253


242-258: ⚡ Quick win

Add a delete test for ignored relation.ErrNotExist.

Service.Delete intentionally tolerates relation.ErrNotExist; add a subtest to lock that behavior and ensure repo delete still executes.

Suggested subtest
+ t.Run("should continue deleting permission when relation sweep returns ErrNotExist", func(t *testing.T) {
+   mockRepo := mocks.NewRepository(t)
+   mockRelation := mocks.NewRelationService(t)
+   svc := permission.NewService(mockRepo, mockRelation)
+
+   permissionID := uuid.New().String()
+   perm := permission.Permission{ID: permissionID, Name: "delete", Slug: "app_organization_delete"}
+
+   mockRepo.On("Get", mock.Anything, permissionID).Return(perm, nil).Once()
+   mockRelation.On("Delete", mock.Anything, mock.Anything).Return(relation.ErrNotExist).Once()
+   mockRepo.On("Delete", mock.Anything, permissionID).Return(nil).Once()
+
+   err := svc.Delete(context.Background(), permissionID)
+   assert.NoError(t, err)
+ })
test/e2e/regression/service_registration_test.go (1)

170-175: ⚡ Quick win

Derive permSlug via schema helper instead of hardcoding.

Line 170 hardcodes slug formatting; generating it from namespace/name keeps this test stable if slug rules change.

Proposed tweak
- const permSlug = "permcascade_res_act"
+ const permNamespace = "permcascade/res"
+ const permName = "act"
+ permSlug := schema.FQPermissionNameFromNamespace(permNamespace, permName)

  createPermResp, err := s.testBench.AdminClient.CreatePermission(ctx, connect.NewRequest(&frontierv1beta1.CreatePermissionRequest{
    Bodies: []*frontierv1beta1.PermissionRequestBody{
-     {Name: "act", Namespace: "permcascade/res"},
+     {Name: permName, Namespace: permNamespace},
    },
  }))

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2dca1fc8-e791-43e9-a9f1-c6f82b8eba39

📥 Commits

Reviewing files that changed from the base of the PR and between 00f567b and 27696db.

📒 Files selected for processing (9)
  • cmd/serve.go
  • core/permission/mocks/relation_service.go
  • core/permission/service.go
  • core/permission/service_test.go
  • internal/api/v1beta1connect/interfaces.go
  • internal/api/v1beta1connect/mocks/permission_service.go
  • internal/api/v1beta1connect/permission.go
  • pkg/server/connect_interceptors/authorization.go
  • test/e2e/regression/service_registration_test.go

@coveralls

coveralls commented Jun 8, 2026

Copy link
Copy Markdown

Coverage Report for CI Build 27267132511

Coverage decreased (-0.09%) to 43.281%

Details

  • Coverage decreased (-0.09%) from the base build.
  • Patch coverage: 124 uncovered changes across 7 files (28 of 152 lines covered, 18.42%).
  • No coverage regressions found.

Uncovered Changes

File Changed Covered %
internal/api/v1beta1connect/permission.go 56 0 0.0%
internal/bootstrap/service.go 20 0 0.0%
internal/store/postgres/role_repository.go 16 0 0.0%
internal/store/spicedb/relation_repository.go 15 0 0.0%
core/permission/service.go 35 25 71.43%
cmd/serve.go 6 0 0.0%
pkg/server/connect_interceptors/authorization.go 1 0 0.0%
Total (8 files) 152 28 18.42%

Coverage Regressions

No coverage regressions found.


Coverage Stats

Coverage Status
Relevant Lines: 37146
Covered Lines: 16077
Line Coverage: 43.28%
Coverage Strength: 12.3 hits per line

💛 - Coveralls

@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 27696db to 8c37eb5 Compare June 8, 2026 05:35
@whoAbhishekSah whoAbhishekSah marked this pull request as draft June 8, 2026 06:03
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 8c37eb5 to 0d86850 Compare June 8, 2026 07:55
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 0d86850 to 632e276 Compare June 9, 2026 06:27
@whoAbhishekSah whoAbhishekSah changed the title fix(permission): cascade role→permission tuple cleanup on delete fix(permission): clean up leftover access in SpiceDB when a permission is deleted Jun 9, 2026
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 632e276 to 391819b Compare June 9, 2026 08:42
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 391819b to 6cf5c46 Compare June 9, 2026 09:29
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 6cf5c46 to 57f928c Compare June 9, 2026 09:49
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 57f928c to 518a316 Compare June 10, 2026 07:42
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 518a316 to dc51fa9 Compare June 10, 2026 07:54
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from dc51fa9 to 4b670c2 Compare June 10, 2026 08:14
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from 4b670c2 to ec737b2 Compare June 10, 2026 08:18
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from ec737b2 to afb3748 Compare June 10, 2026 08:35
@whoAbhishekSah whoAbhishekSah force-pushed the fix/permission-delete-cascade branch from afb3748 to d93329c Compare June 10, 2026 08:36
@whoAbhishekSah

whoAbhishekSah commented Jun 10, 2026

Copy link
Copy Markdown
Member Author

Manual verification — comprehensive ✅

Tested every path of the updated DeletePermission against a local build of this branch (live ConnectRPC + SpiceDB), as super admin, including a real server restart for boot-safety. The app.resources_config_path config defines kitchen/recipe:*, compute/machine:scrub, garage/car:wash (and a garden/plant:* added for the restart test), giving permissions from all three sources: base zed schema, config files, and the CreatePermission API.

The current delete logic has three layers, all exercised:

  1. Built-in guard — reject if the permission comes from the base schema or config (bootstrap would recreate it).
  2. Last-permission-of-namespace guard — if this is the namespace's last permission and SpiceDB still has relationships of that type (e.g. resources), reject; otherwise the namespace would vanish from the schema on next boot and SpiceDB would refuse to drop a type that still has relationships → broken startup.
  3. Service delete — sweep the role→permission grant tuples, prune the slug from every role's permission list, then delete the row.

Round 1 — behavior (no restart)

Case Setup Expected Result
Bad / malformed id delete random or non-uuid id NotFound
Built-in: base schema delete app/organization:get FailedPrecondition (built-in)
Built-in: config delete kitchen/recipe:read FailedPrecondition (built-in)
Last perm + resource exists custom lab/sample:scan + a resource of that type FailedPrecondition (namespace); nothing torn down
Last perm, no relations custom lab/probe:read, no resource clean delete
Not-last perm + resource fleet/drone:{fly,land} + resource; delete fly allowed (other perm keeps namespace alive)
Now-last perm + resource then delete land (resource still there) FailedPrecondition (namespace)
Last perm after relations cleared remove the relations, then delete land allowed
Role pruning perm granted by 2 roles; delete it slug removed from both roles' lists (other perms kept), grant tuples 3→0
Guard precedence delete compute/machine:scrub (config built-in and last-perm+resource) rejected with the built-in message, not the namespace one
Authz delete as a regular (non-super) user PermissionDenied; unauthenticated → Unauthenticated
Idempotency delete a stray permission twice 1st ok, 2nd NotFound

Round 2 — boot safety + fresh config built-in (with a real restart)

Case What Result
Boot safety Restart after Round 1's allowed clean deletes (3 custom namespaces whose last permission was removed, 0 relations each) Server boots cleanly; those namespaces stay gone; a boot-safe leftover (perm+resource kept) is preserved
Fresh config built-in Add a new garden_plant.yml, restart so `garden/plant:water prune` bootstrap as rows, then delete them

Round 3 — namespace guard reads SpiceDB, not the resources table

The guard intentionally queries SpiceDB (source of truth) rather than the Postgres resources table. Verified with a relationship that has no backing resource row:

State Delete last perm of the namespace
A raw SpiceDB tuple of that type written directly (<ns>:x#owner@app/user:…), resources table empty rejected (FailedPrecondition) ✅
That tuple removed succeeds ✅

This is why a resources-table-only check would be insufficient — it would miss a loose/orphaned tuple and let the delete through, re-creating the boot break.

The boot-safety result is the key one: the deletes the guard allowed leave the server bootable, and the ones it rejected are exactly those that would otherwise have broken startup.


Side-findings (all pre-existing — NOT introduced by this PR; it doesn't touch this code)

Tracked separately in #1693.

  1. DeleteProjectResource returns 500 for a custom namespace with no delete verb. It does an authz precheck for the delete permission on the resource type; a namespace created via CreatePermission only has the verbs you defined, so the check fails (relation/permission delete not found under definition ...) → 500. This creates a chicken-and-egg with the new guard: it tells you to "remove the resources first," but you can't remove such a resource through the API. Worth deciding whether the guard's message should point at this, or whether resource deletion should tolerate a missing delete verb — but it's arguably out of scope here.
  2. CreatePermission silently no-ops in core app/* namespaces — creating e.g. app.organization.zzz returns 200 with an empty body and creates nothing (those namespaces are filtered). No stray verbs can exist there.
  3. CreatePermission returns 500 instead of 400 when the service or verb component is < 3 characters — they become SpiceDB relation/definition identifiers, which must match ^[a-z][a-z0-9_]{1,62}[a-z0-9]$ (min 3 chars). e.g. keys with verb go/it or service r2 fail at schema-compile time. Should be rejected as a clean validation error at the API.

permission.Delete only removed the DB row. Every role granting the
permission keeps app/role:<role>#<slug>@<*> tuples (one per principal
type), so deleting a permission left those tuples dangling on a relation no
longer backed by any permission row. The method was also unreachable: the
DeletePermission RPC was hardwired to "function not available".

Changes:
- permission.Service gains a relation dependency and, on Delete, sweeps the
  role->permission tuples by object-namespace (app/role) + relation-name
  (the permission slug), clearing them across all roles and principal types
  before removing the row. Tolerates ErrNotExist for unused permissions.
- Implement the DeletePermission admin handler and gate it on superuser
  (previously returned CodeUnavailable), making the guard reachable and
  giving the data-cleanup effort a way to remove stray permissions.

Adds an e2e regression test: create a permission, build a role on it,
delete the permission, assert no role->permission tuple remains. Verified
it fails without the sweep (tuple lingers) and passes with it.

Refs #1661

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
core/permission/service.go (1)

71-100: 💤 Low value

Minor: Comment wording could be clearer.

Line 73 says "one per principal type" but the code deletes all tuples in a single relationService.Delete call. Consider rewording to "the SpiceDB tuples that let roles grant it (app/role:*#<slug>@<principal>:*, across all role IDs and principal types)" for clarity.

internal/api/v1beta1connect/permission.go (1)

153-229: ⚡ Quick win

Observability gap: intermediate service errors not logged.

The handler correctly implements the three-layer guard (built-in check, namespace-safety check, then delete), but lines 174, 192, and 208 skip errorLogger.LogServiceError before returning Internal errors, unlike the pattern at lines 164 and 218. Intermediate service failures (BuiltinPermissions, List, ListRelations) should also be logged to aid debugging.

📊 Proposed fix: add errorLogger calls for consistency
 builtin, err := h.bootstrapService.BuiltinPermissions(ctx)
 if err != nil {
+	errorLogger.LogServiceError(ctx, request, "DeletePermission.BuiltinPermissions", err, "permission_id", permissionID)
 	return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("DeletePermission.BuiltinPermissions: permission_id=%s: %w", permissionID, err))
 }

Apply the same pattern at lines 192 (List) and 208 (ListRelations).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1a916416-411d-4a2d-8798-7016eacbf115

📥 Commits

Reviewing files that changed from the base of the PR and between 27696db and 3b06e31.

📒 Files selected for processing (19)
  • cmd/serve.go
  • core/permission/mocks/relation_service.go
  • core/permission/mocks/role_service.go
  • core/permission/service.go
  • core/permission/service_test.go
  • core/role/mocks/repository.go
  • core/role/role.go
  • core/role/service.go
  • core/role/service_test.go
  • internal/api/v1beta1connect/interfaces.go
  • internal/api/v1beta1connect/mocks/bootstrap_service.go
  • internal/api/v1beta1connect/mocks/permission_service.go
  • internal/api/v1beta1connect/mocks/relation_service.go
  • internal/api/v1beta1connect/permission.go
  • internal/bootstrap/service.go
  • internal/store/postgres/role_repository.go
  • internal/store/spicedb/relation_repository.go
  • pkg/server/connect_interceptors/authorization.go
  • test/e2e/regression/service_registration_test.go
✅ Files skipped from review due to trivial changes (5)
  • internal/api/v1beta1connect/mocks/bootstrap_service.go
  • internal/api/v1beta1connect/mocks/relation_service.go
  • core/permission/mocks/role_service.go
  • core/permission/mocks/relation_service.go
  • internal/api/v1beta1connect/mocks/permission_service.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/e2e/regression/service_registration_test.go
  • pkg/server/connect_interceptors/authorization.go
  • core/permission/service_test.go

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e/regression/service_registration_test.go (1)

257-308: 🏗️ Heavy lift

Add a raw-SpiceDB-tuple variant for the namespace guard.

This test only proves the guard blocks deletion when a CreateProjectResource row exists. If DeletePermission ever regressed to checking Postgres-backed resources instead of relationService.ListRelations, this would still pass because the resource row is present. A second case with a direct orphanguard/widget relation and no backing resource row would pin the “SpiceDB is the source of truth” behavior this PR is relying on.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8f27328c-c72c-4d0d-b09a-1b612f0f25fd

📥 Commits

Reviewing files that changed from the base of the PR and between 3b06e31 and dfc1924.

📒 Files selected for processing (19)
  • cmd/serve.go
  • core/permission/mocks/relation_service.go
  • core/permission/mocks/role_service.go
  • core/permission/service.go
  • core/permission/service_test.go
  • core/role/mocks/repository.go
  • core/role/role.go
  • core/role/service.go
  • core/role/service_test.go
  • internal/api/v1beta1connect/interfaces.go
  • internal/api/v1beta1connect/mocks/bootstrap_service.go
  • internal/api/v1beta1connect/mocks/permission_service.go
  • internal/api/v1beta1connect/mocks/relation_service.go
  • internal/api/v1beta1connect/permission.go
  • internal/bootstrap/service.go
  • internal/store/postgres/role_repository.go
  • internal/store/spicedb/relation_repository.go
  • pkg/server/connect_interceptors/authorization.go
  • test/e2e/regression/service_registration_test.go
✅ Files skipped from review due to trivial changes (4)
  • internal/api/v1beta1connect/mocks/bootstrap_service.go
  • core/permission/mocks/relation_service.go
  • internal/api/v1beta1connect/mocks/relation_service.go
  • core/permission/mocks/role_service.go
🚧 Files skipped from review as they are similar to previous changes (13)
  • core/role/role.go
  • internal/bootstrap/service.go
  • core/role/service.go
  • pkg/server/connect_interceptors/authorization.go
  • internal/api/v1beta1connect/mocks/permission_service.go
  • internal/api/v1beta1connect/interfaces.go
  • core/role/service_test.go
  • core/permission/service.go
  • internal/store/spicedb/relation_repository.go
  • core/permission/service_test.go
  • cmd/serve.go
  • internal/store/postgres/role_repository.go
  • core/role/mocks/repository.go

Comment on lines +185 to +216
// A namespace exists in SpiceDB only as long as it has a permission. If this
// is its last permission, the namespace disappears on the next boot — and
// SpiceDB won't drop a type that still has relationships, which breaks
// startup. So if nothing else keeps the namespace alive, refuse the delete
// while any relationship of that type still exists.
otherPerms, err := h.permissionService.List(ctx, permission.Filter{Namespace: perm.NamespaceID})
if err != nil {
return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("DeletePermission.List: permission_id=%s: %w", permissionID, err))
}
namespaceSurvives := false
for _, p := range otherPerms {
if p.ID != perm.ID {
namespaceSurvives = true
break
}
}
if !namespaceSurvives {
// ask SpiceDB itself (the source of truth) whether anything of this type
// still exists — resources, policy grants, or directly-created tuples.
rels, err := h.relationService.ListRelations(ctx, relation.Relation{
Object: relation.Object{Namespace: perm.NamespaceID},
})
if err != nil {
return nil, connect.NewError(connect.CodeInternal, fmt.Errorf("DeletePermission.ListRelations: permission_id=%s: %w", permissionID, err))
}
if len(rels) > 0 {
return nil, connect.NewError(connect.CodeFailedPrecondition,
errors.New("cannot delete the last permission of namespace "+perm.NamespaceID+": relationships of this type still exist in SpiceDB (e.g. resources); remove them first"))
}
}

err = h.permissionService.Delete(ctx, permissionID)

@coderabbitai coderabbitai Bot Jun 10, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Serialize the last-permission guard with namespace writers.

This check is still TOCTOU: after List/ListRelations says the namespace is empty, another request can create a resource or direct tuple in perm.NamespaceID before permissionService.Delete removes the final permission row. The current SpiceDB schema keeps accepting that type until restart, so the delete can still leave relationships behind and reintroduce the boot-failure this guard is trying to prevent. This needs a shared critical section with writers for that namespace (for example an advisory lock or an equivalent recheck under the same lock) before the last permission is deleted.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid — this is a real TOCTOU: a resource/tuple could be created in the namespace between the emptiness check and the row delete, re-orphaning it. A couple of reasons I'm leaving it as best-effort here rather than adding the lock:

  • Low probability, recoverable worst case. The window is the few microseconds between the check and the delete; it needs a superuser deleting a namespace's last permission at the exact moment someone creates a resource of that same namespace. The worst case is a startup failure (recoverable by re-adding the permission or clearing the relationships), not data loss or an auth bypass.
  • A correct fix is cross-cutting and arguably belongs a layer down. Serializing this would mean every namespace writer (resource create, CreateRelation, policy-on-resource) taking the same advisory lock keyed by namespace — significant surface area for this corner. And the deeper root cause isn't the missing lock; it's that bootstrap hard-fails when it tries to drop a populated object type. The robust fix is making bootstrap tolerant (don't drop a type that still has relationships, or clear them first), which is schema-management work tracked separately.

So this guard intentionally catches the common case (delete while a resource clearly exists) and is not race-free. Tracking the bootstrap-tolerance / serialization follow-up in #1693.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@whoAbhishekSah Thanks for the thorough explanation. The reasoning is sound — the lock surface would be wide (every namespace writer would need to participate), the worst case is a recoverable startup failure rather than a data integrity issue, and making bootstrap tolerant of populated object types is the correct long-term fix anyway. The best-effort guard as implemented is a meaningful improvement over the previous state. Happy to defer the serialization/bootstrap-tolerance work to #1693.


🧠 Learnings used
Learnt from: rohilsurana
Repo: raystack/frontier PR: 1552
File: billing/customer/service.go:52-55
Timestamp: 2026-04-22T05:54:17.663Z
Learning: In this repo’s Go services, avoid adding nil-safe logger guards inside service constructors (e.g., `if logger == nil { logger = slog.Default() }`). Instead, ensure all call sites explicitly pass a non-nil `*slog.Logger` (typically `slog.Default()` or a properly initialized logger). Reviewers should reject suggestions to add constructor nil guards for internal service constructors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants