feat(Segment membership inspection PoC): Daily Snowflake-backed per-env segment counts #7464
khvn26 wants to merge 10 commits into
Conversation
Backfills identities from Dynamo to Snowflake daily, then refreshes
per-(segment, environment) match counts in the new `SegmentMembership`
cache. The translator from `flagsmith-sql-flag-engine` turns each
canonical segment into a SQL `WHERE` predicate; counts are
materialised as `COUNT(*) ... GROUP BY environment_id` per segment.
The serializer surfaces them as a list of `{environment, count,
last_synced_at}`, ready to back per-env count badges in the
Identities-tab environment dropdown.
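The payload shape above can be sketched in miniature (a hypothetical helper and row dicts; the real read path is the DRF `SegmentMembershipSerializer` described below):

```python
from datetime import datetime, timezone

def serialize_memberships(rows):
    # Hypothetical helper mirroring the documented payload shape; the
    # real implementation is a DRF serializer, not a plain function.
    return [
        {
            "environment": row["environment_id"],
            "count": row["count"],
            "last_synced_at": row["last_synced_at"].isoformat(),
        }
        for row in rows
    ]

payload = serialize_memberships([
    {"environment_id": 1, "count": 25,
     "last_synced_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
])
```

Absence of an entry for an environment is itself the signal on the read side, so the list can be empty.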
Pipeline shape:
- `backfill_identities_to_snowflake` is the daily recurring task
(`timeout=4h` to fit large environments). After backfilling each
project's environments it dispatches one
`refresh_project_segment_counts(project_id)` per project so the
count refresh always sees the freshly backfilled snapshot rather
than racing a separate schedule.
- `refresh_project_segment_counts` opens its own Snowpark session,
re-checks the FoF flag at execution time so a stale fan-out skips
orgs that have since been disabled, and bulk-upserts via Postgres
`ON CONFLICT` (single statement per project).
- `compute_segment_counts_for_project` returns a list of unsaved
`SegmentMembership` instances; the task stamps `last_synced_at`
consistently across the batch. Untranslatable segments emit a
structlog `compute.segment.skipped` error event so we hear about
predicate gaps rather than silently dropping rows.
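The fan-out ordering above can be sketched as follows (a simplified stand-in with injected callables; the real tasks run on the task processor and take no such arguments):

```python
def backfill_identities_to_snowflake(projects, backfill_env, dispatch):
    """Sketch of the daily task: backfill every environment of a
    project, then fan out exactly one refresh task for that project."""
    for project in projects:
        for env in project["environments"]:
            backfill_env(env)
        # Dispatch only after the project's backfill completes, so the
        # count refresh sees the fresh snapshot instead of racing a
        # separately scheduled job.
        dispatch("refresh_project_segment_counts", project["id"])

dispatched = []
backfill_identities_to_snowflake(
    [{"id": 1, "environments": ["env-a", "env-b"]},
     {"id": 2, "environments": ["env-c"]}],
    backfill_env=lambda env: None,
    dispatch=lambda task, project_id: dispatched.append((task, project_id)),
)
```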
Both tasks short-circuit when SNOWFLAKE_* env vars are unset and
skip per-organisation when the `segment_membership_inspection`
Flagsmith-on-Flagsmith flag is False, so SaaS rolls out gradually
and self-hosted is unaffected.
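A minimal sketch of the double gate, assuming hypothetical env-var names (the PR only says `SNOWFLAKE_*`) and a `flag_enabled` callable standing in for the Flagsmith-on-Flagsmith client:

```python
import os

# Assumed variable names for illustration only.
REQUIRED_VARS = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")

def is_snowflake_configured(environ=os.environ):
    return all(environ.get(var) for var in REQUIRED_VARS)

def should_refresh(organisation_id, environ, flag_enabled):
    """Both gates from the PR: env vars present AND the FoF flag on.
    The flag is re-checked at execution time so a stale fan-out for an
    organisation that has since been disabled is a no-op."""
    if not is_snowflake_configured(environ):
        return False
    return flag_enabled("segment_membership_inspection", organisation_id)
```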
DELETE-then-INSERT runs without an explicit transaction. Snowflake
holds micropartition locks for the lifetime of an open transaction,
and at 10M+ identities a BEGIN/COMMIT around the whole env partition
would keep that lock open for minutes. Per-statement implicit
commits leave a brief mid-refresh window where readers see an empty
partition; acceptable under the FoF flag's gradual rollout.
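The per-statement autocommit pattern can be illustrated with a recording stub (SQL shapes assumed for illustration; the real backfill writes through Snowpark, not raw statements):

```python
statements = []

def execute(sql, params=None):
    """Stand-in for a cursor where every statement commits implicitly."""
    statements.append((sql, params))

def refresh_env_partition(environment_id, identifiers):
    # No BEGIN/COMMIT wrapper: each statement commits on its own, so
    # Snowflake's micropartition lock is released between the DELETE
    # and the INSERT rather than held for the whole partition refresh.
    execute("DELETE FROM IDENTITIES WHERE ENVIRONMENT_ID = ?",
            [environment_id])
    # The real task batches the INSERT; one statement here for brevity.
    placeholders = ", ".join("(?, ?)" for _ in identifiers)
    params = [p for ident in identifiers for p in (environment_id, ident)]
    execute("INSERT INTO IDENTITIES (ENVIRONMENT_ID, IDENTIFIER) "
            f"VALUES {placeholders}", params)

refresh_env_partition(42, ["alice", "bob"])
```

Between the two statements a reader sees an empty partition for that environment, which is the brief window the paragraph above accepts.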
Backfill writes via Snowpark DataFrames against the canonical
IDENTITIES schema, with `DynamoIdentity` documents projected through
`segment_membership.mappers.map_identity_document_to_snowflake_row`.
Refresh issues a single batched UNION ALL using parameterised SQL —
env keys are bound, predicates from the engine are already escape-safe. Schema setup is a `RunPython` migration gated on
`is_snowflake_configured()`, so it no-ops on self-hosted and in the
test suite.
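The batched UNION ALL construction might look roughly like this (hypothetical `build_counts_query`; table and column names assumed from the IDENTITIES schema, predicate strings stand in for engine output):

```python
def build_counts_query(segment_predicates):
    """Sketch: one round trip for all segments, a UNION ALL of
    per-segment COUNT(*) ... GROUP BY ENVIRONMENT_ID blocks. Only
    scalar values are bound; the predicates arrive pre-escaped from
    flagsmith-sql-flag-engine."""
    blocks, params = [], []
    for segment_id, predicate in segment_predicates:
        blocks.append(
            "SELECT ? AS SEGMENT_ID, ENVIRONMENT_ID, COUNT(*) AS CNT "
            f"FROM IDENTITIES WHERE {predicate} GROUP BY ENVIRONMENT_ID"
        )
        params.append(segment_id)
    return " UNION ALL ".join(blocks), params

sql, params = build_counts_query(
    [(1, "TRAITS:plan = 'growth'"), (2, "TRAITS:plan = 'basic'")]
)
```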
The segment serializer surfaces cached counts via a new `memberships`
list field; absence of an entry is the read-side signal, no flag
check on the read path. `SegmentMembershipSerializer` gives
drf-spectacular a typed schema. Adds a generic `batched` helper to
`api/util/util.py` for the per-INSERT batching.
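A generic `batched` helper in that spirit (the actual signature in `api/util/util.py` may differ):

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def batched(iterable: Iterable[T], n: int) -> Iterator[list[T]]:
    """Yield successive lists of at most n items from any iterable,
    so callers can size each INSERT without materialising everything."""
    if n < 1:
        raise ValueError("n must be >= 1")
    it = iter(iterable)
    while batch := list(islice(it, n)):
        yield batch
```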
…ps prefetch
The new `prefetch_related("memberships")` adds one IN-clause query per
list response, even when no rows exist. Update the regression
expectations so the existing test suite reflects the new baseline.
Codecov report: all modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main    #7464      +/-   ##
==========================================
- Coverage   98.44%   98.39%   -0.06%
==========================================
  Files        1398     1411      +13
  Lines       52654    53111     +457
==========================================
+ Hits        51834    52256     +422
- Misses        820      855      +35
… pre-release

Switches the api dep from a private-repo git URL, which the Docker build can't clone in CI, to a versioned pin against Flagsmith's staging CodeArtifact PyPI (`flagsmith-pypi-staging`, account 302456015006, eu-west-2). Initial published release: 0.1.0a1.

The reusable docker-build workflow now unconditionally assumes the OIDC role `arn:aws:iam::302456015006:role/codeartifact-github-actions-staging` (trust policy allows any `repo:Flagsmith/*`), fetches an authorisation token, and exposes it to every build as the `codeartifact_token` BuildKit secret. Builds that don't mount the secret simply ignore it; the OIDC + token cost is a couple of seconds per build.

`Dockerfile`'s four `make install*` lines mount the `codeartifact_token` secret and export `POETRY_HTTP_BASIC_FLAGSMITH_PYPI_STAGING_*` so poetry resolves the dep from CodeArtifact. The header documents the `--secret="id=codeartifact_token,env=..."` incantation for local builds.
The unlabeled metrics added in this PR trip a bug in flagsmith-common's `assert_metric` test plugin (`MetricWrapperBase.clear()` raises AttributeError on unlabeled metrics). Fix is in flagsmith-common#224; pin to the branch until that lands and a new release is cut.
Wires the segment-membership pipeline against DynamoDB Local + a real Snowflake account: seeds a project, environment, and segment in core Postgres; creates the EdgeIdentities table; seeds 25 matching + 25 non-matching identities; runs the backfill + refresh tasks; asserts `SegmentMembership.count` equals the matching seed. Run with `make docker-up django-migrate` followed by `make smoke-test-segment-membership`. `SNOWFLAKE_*` env vars come from `.env` via Make's existing dotenv include; cleans the env's Snowflake rows on exit.
This reverts commit 9d31cb3.
Playwright Test Results (oss - depot-ubuntu-latest-16)
Playwright Test Results (oss - depot-ubuntu-latest-arm-16)
Playwright Test Results (private-cloud - depot-ubuntu-latest-arm-16)
Playwright Test Results (private-cloud - depot-ubuntu-latest-16)
Failed test: firefox › tests/environment-permission-test.pw.ts › Environment Permission Tests › Environment-level permissions control access to features, identities, and segments @enterprise
Mappers: drop the private-helper tests and replace them with parametrised cases exercising `map_identity_document_to_snowflake_row` directly; trust TypedDict-required fields rather than guarding against absent ones. Migration: assert the full DDL fed into `sess.sql(...)` and `spec` the Snowpark mock against the real Session class.
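The parametrised style described might look like this (the mapper body and column names here are hypothetical stand-ins; only the call shape, testing the public mapper directly, mirrors the PR):

```python
def map_identity_document_to_snowflake_row(doc):
    """Hypothetical stand-in: project a Dynamo identity document onto
    IDENTITIES columns, trusting required fields to be present."""
    return {
        "ENVIRONMENT_ID": doc["environment_api_key"],
        "IDENTIFIER": doc["identifier"],
        "TRAITS": doc.get("identity_traits", []),
    }

# Parametrised cases against the public mapper, no private helpers.
CASES = [
    ({"environment_api_key": "env1", "identifier": "alice",
      "identity_traits": [{"trait_key": "plan", "trait_value": "growth"}]},
     "alice"),
    ({"environment_api_key": "env1", "identifier": "bob"}, "bob"),
]
for doc, expected_identifier in CASES:
    row = map_identity_document_to_snowflake_row(doc)
    assert row["IDENTIFIER"] == expected_identifier
```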
Changes
Contributes to #5663.
Adds a daily pipeline that backfills Dynamo identities into Snowflake, materialises per-(segment, environment) match counts via `flagsmith-sql-flag-engine`, and exposes them on the segment endpoint as `memberships: [{environment, count, last_synced_at}]` for env-dropdown badges. Gated behind the org-scoped `segment_membership_inspection` FoF flag; no-ops when `SNOWFLAKE_*` env vars are unset.

Review complexity: 4/5
Review order recommendation:
`models.py` (cache table) → `services.py` (compile + count, parameterised SQL) → `tasks.py` (daily recurring backfill fans out per-project refresh) → `mappers.py` (Dynamo doc → IDENTITIES row) → `migrations/0002_*` (Snowflake DDL `RunPython`, no-op when unconfigured) → `segments/serializers.py` + `views.py` (read-side `memberships` field, prefetched).

How did you test this code?
Beyond the existing unit + integration tests:
Ran an end-to-end smoke test against DynamoDB Local + a real Snowflake account configured via `.env`. The flow:
- `make docker-up django-migrate`: migrations created the `SegmentMembership` table in core Postgres and the `IDENTITIES` schema in Snowflake.
- Seeded a segment with the rule `plan EQUAL "growth"`, then 25 identities with `traits.plan = "growth"` and 25 with `traits.plan = "basic"`.
- Ran `backfill_identities_to_snowflake()` directly: confirmed all 50 rows landed in Snowflake's `IDENTITIES` table with `QUERY_TAG` correctly attributing the spend per (org, project).
- Ran `refresh_project_segment_counts(project_id)`: confirmed exactly one `SegmentMembership` row materialised with `count = 25` and a fresh `last_synced_at`.

Extensive testing will be done on staging.