perf(tags): drop inherited_tags TagField, use _inherited_tag_names JSONField (Phase B Stage 3)#14833
Closed
valentijnscholten wants to merge 12 commits intoDefectDojo:devfrom
Closed
Conversation
Adds unittests/test_tag_inheritance_perf.py with assertNumQueries baselines on the six hottest tag inheritance paths (Product tag add/remove propagating to N findings, child create under inheritance, sticky enforcement on child tag edits). Numbers are pinned against current `dev` behavior so subsequent optimization work shows up as concrete query-count reductions instead of relying on manual benchmarking. The class is intentionally temporary: pins move down as the redesign work lands and the file can be deleted once the targets are met.
Extends the perf test class with two more pinned hot paths so all child models exercised by `propagate_tags_on_product_sync` are covered: product_tag_add -> 100 endpoints (V2) : 3958 product_tag_remove -> 100 endpoints (V2): 3740 product_tag_add -> 100 locations (V3) : 4532 product_tag_remove -> 100 locations (V3): 4307 Both V2 and V3 paths run regardless of the ambient `V3_FEATURE_LOCATIONS` setting via per-test `@override_settings(...)`. CI matrix runs the suite in both modes, so dynamic pin selection (`_pin(v2=..., v3=...)`) handles the small per-mode count differences on the existing finding tests.
…enario
Two additions:
1. New TagInheritanceImportPerfBaselines class pins query counts for the
importer hot path (production's heaviest tag-inheritance scenario).
Both first-import and no-change-reimport are covered, each with V2
and V3 method variants:
zap_import_v2 : 1461
zap_import_v3 : 1319
zap_reimport_no_change_v2 : 77
zap_reimport_no_change_v3 : 95
2. Restructures the existing baseline class so every scenario has both
a _v2 and _v3 method variant via per-test @override_settings. The
whole suite now runs both modes in a single invocation; no need to
run twice with different DD_V3_FEATURE_LOCATIONS env.
Phase A leaves the importer numbers ~unchanged (importer hot loop is
creation-driven, not the bulk-propagation path Phase A targets). Phase
B's tag_inheritance.batch() context manager is the lever that lowers
these numbers.
First V3 Location op in the class paid a one-time ContentType lookup, producing a matrix-dependent off-by-one (V3-default-on CI: 4531; V3-default-off CI + local: 4532). Match the warm-up pattern used in test_importers_performance and pin EXPECTED_PRODUCT_TAG_ADD_100_LOCATIONS to the post-warm value (4531).
…reate Replaces the per-row `.save()` loop in `propagate_tags_on_product_sync` with bulk SQL through the existing tag-utils helpers. For every child model (Engagement/Test/Finding/Endpoint/Location), reads current inherited_tags in one query, computes the per-child diff against the Product's tags, and applies adds/removes via `bulk_add_tag_mapping` and the new `bulk_remove_tags_from_instances` helper. Both `tags` and `inherited_tags` fields are kept in sync. Also gates the per-child `inherit_tags_on_instance` post_save handler on `created=True`. The previous behavior fired on every save (create OR update), repeatedly re-applying inherited tags to children whose tag state had not changed. Sticky enforcement on user-driven tag edits is unchanged (still handled by `make_inherited_tags_sticky` on m2m_changed). Pinned query-count baselines from PR DefectDojo#14811 drop accordingly: product_tag_add -> 100 findings : 4758 -> 91 (~52x fewer queries) product_tag_remove -> 100 findings : 4540 -> 53 (~85x fewer queries) Sticky and child-creation paths are unchanged in this PR. Phase B targets those (centralized inheritance module + drop the duplicate `inherited_tags` TagField).
…l batch context
Adds dojo/tag_inheritance.py with a thread-local batch() context manager
and is_in_batch() predicate. While the calling thread is inside a batch,
make_inherited_tags_sticky (m2m_changed) early-returns; the calling code
takes responsibility for applying inheritance in bulk.
Replaces the previous pattern in dojo/importers/location_manager.py:556:
disconnected = signals.m2m_changed.disconnect(
make_inherited_tags_sticky, sender=Location.tags.through,
)
try:
...
finally:
if disconnected:
signals.m2m_changed.connect(...)
Signal.disconnect mutates Django's process-global receiver list and is
not thread-safe. While disconnected, every thread/greenlet in the same
process loses sticky enforcement. Safe only under Celery --pool=prefork
and single-threaded gunicorn; broken under --pool=threads|gevent|eventlet,
gunicorn --threads >1, or ASGI threadpools. Also fragile on hard process
exit mid-import (handler stays disconnected for the rest of the process
lifetime).
The new batch context lives in threading.local(): each thread has its
own depth counter, the signal handler stays globally connected, and the
suppression decision is per-thread. No global mutation, no reconnect
hazard.
This is Phase B Stage 1. Subsequent stages will wrap the broader importer
orchestration in batch(), replace the duplicate inherited_tags TagField
with a JSON column, and drop _manage_inherited_tags / per-model
inherit_tags().
Pinned perf-test note: V3 zap_scan_import baseline rises 1243 -> 1263
(~1.6%). The previous process-global disconnect was narrower in scope
(Location.tags.through only); the batch context covers all child-tag
through-tables. Net trade is positive given the threading bug fix; full
Phase B reductions arrive in later stages.
…er exit Stage 2 of Phase B. Wraps the import / reimport orchestration in `process_scan` (default_importer + default_reimporter) inside `tag_inheritance.batch()`. Inside the batch: - `make_inherited_tags_sticky` (m2m_changed) early-returns - `inherit_tags_on_instance` (post_save, gated on created=True) early-returns - `inherit_tags_on_linked_instance` (post_save) early-returns After the batch exits, `tag_inheritance.flush_for_product(product)` runs once and bulk-applies inheritance to every child via the Phase A bulk diff path. The bulk diff in `_sync_inheritance_for_qs` (dojo/product/helpers.py) is extended with a re-merge step: it reads each child's current `tags` through-table and ensures every target inherited name is present. This is the bulk equivalent of `make_inherited_tags_sticky` and is needed because `tags.set([...])` inside the batch (e.g. `update_test_tags` during reimport) can wipe inherited names from `tags` while `inherited_tags` stays in sync. A current-tags read is added to gate the re-merge so it only writes rows that are actually missing. `tag_inheritance.flush_for_product(product)` is added as the public flush helper. Internally delegates to `propagate_tags_on_product_sync`. Pinned perf-test impact (this branch vs Stage 1): ZAP scan import V2 : 1385 -> 1006 (~27% drop) ZAP scan import V3 : 1263 -> 984 (~22% drop) ZAP reimport V2 : 69 -> 82 (+13: flush has fixed cost) ZAP reimport V3 : 87 -> 140 (+53: flush has fixed cost) product_tag_add -> 100 findings V2/V3 : 91 -> 94 (+3: tags read for re-merge) product_tag_remove -> 100 findings V2/V3 : 53 -> 56 (+3) product_tag_add -> 100 endpoints V2 : 91 -> 194 (eager Celery + explicit propagate both pay re-merge cost) product_tag_remove -> 100 endpoints V2 : 53 -> 56 product_tag_add -> 100 locations V3 : 316 -> 320 product_tag_remove -> 100 locations V3 : 266 -> 270 The reimport and product-toggle increases are eliminated in Stages 3+4+5 (drop the duplicate `inherited_tags` TagField and the re-merge step becomes free because `tags` is the single source of truth).
`tag_inheritance.flush_for_product` is called unconditionally from the importer's `process_scan` (and reimporter equivalent). When tag inheritance is disabled (neither the system-wide flag nor the per-product flag is set) the previous implementation still walked every child queryset to compute the (empty) diff, adding ~9 queries per scan. Tests in `unittests/test_importers_performance.py` pin importer query counts in scenarios where inheritance is off. The Stage 2 commit's flush call shifted those baselines up by 9 across 10 test cases. Add an early-exit so the importer perf tests stay green and no behavior change ships under the inheritance-off configuration.
`_location_target_names(location)` issued one
`Product.objects.filter(...).distinct()` per location plus N
`product.tags.all()` per related product, producing an N+1 across the
location queryset on every product tag toggle.
Replace the per-location callable with a precomputed
{location_id: set[tag_name]} map built in 3 bulk queries: the two
LocationProductReference / LocationFindingReference paths union together
into {location_id: {product_id}}, then a single Product_tags
through-table read fans out to {product_id: {tag_name}}.
product_tag_add (100 locations, V3): 320 -> 123 queries
product_tag_remove (100 locations, V3): 270 -> 73 queries
ZAP scan import (V3): 984 -> 947
ZAP scan reimport, no change (V3): 140 -> 103
… JSONField Phase B Stage 3 of the tag inheritance redesign. Replaces the duplicate `inherited_tags` TagField on Engagement / Test / Finding / Endpoint / Location with a single `_inherited_tag_names` JSONField (Postgres `jsonb`). Drops 5 auto-generated through tables, 5 Tagulous tag tables, and 8 pghistory proxy/event models tied to the through tables. The new column carries an inverted GIN index so "find children that inherited tag X" remains efficient. Code changes ------------ - Models: 5x TagField -> JSONField. Migration 0265 copies existing M2M data into the JSON column, drops the field (Django removes the through table), drops the orphan pghistory and Tagulous models, and adds the GIN indexes. - `_manage_inherited_tags` / `propagate_inheritance` rewritten to read and write the JSON column. The `tags` M2M is only mutated when its contents actually need to change. - `_sync_inheritance_for_qs` reads `_inherited_tag_names` directly from each child instead of joining the through table; performs a bulk UPDATE per (target name set) group instead of per (model, tag) pair. Bulk re-merge dedup avoids double-writing the same (child, name) pair. - `LocationManager._bulk_inherit_tags` reads JSON column rather than the inherited_tags through table. - `tag_inheritance.flush_for_product` short-circuits when tag inheritance is disabled so importers don't pay for it on every scan. Plumbing -------- - `Meta.exclude` lists in forms / filters / api serializers updated from `inherited_tags` -> `_inherited_tag_names`. - `DojoFilter.filter_for_field` skips JSONField auto-generation for `_inherited_tag_names` (django-filter raises on JSONField). - Auditlog `tag_through_models` and `backfill.py` drop the inherited_tags proxies; pghistory now captures changes via the parent table's event. - `dojo_testdata.json` fixture renames `"inherited_tags": []` -> `"_inherited_tag_names": []`. - Tests using `.inherited_tags.all()` switched to read `_inherited_tag_names`. Pinned perf-test impact (this branch vs Stage 2) ----------------------------------------------- ZAP scan import V2 : 1006 -> 700 (~30% drop) ZAP scan import V3 : 984 -> 698 (~29% drop) ZAP reimport V2 (no change) : 82 -> 78 (~5% drop) ZAP reimport V3 (no change) : 140 -> 136 (~3% drop) Product tag add -> 100 findings V2/V3 : 94 -> 58 (~38% drop) Product tag remove -> 100 findings V2/V3 : 56 -> 38 (~32% drop) Product tag add -> 100 endpoints V2 : 194 -> 58 (~70% drop) Product tag remove -> 100 endpoints V2 : 56 -> 38 (~32% drop) Product tag add -> 100 locations V3 : 320 -> 272 (~15% drop) Product tag remove -> 100 locations V3 : 270 -> 246 (~9% drop) Create 1 finding under inheritance : 64 -> 42 (~34% drop) Create 100 findings under inheritance : 4024 -> 2814 (~30% drop) Finding add user tag (sticky) : 17 -> 16 Finding remove inherited tag (sticky) : 44 -> 24 (~45% drop) `test_importers_performance.py` baselines are unaffected because the flush_for_product call short-circuits when inheritance is disabled (the fixture used by those tests has it off). Migration notes --------------- - Destructive: drops 5 inherited_tags through tables and the 5 Tagulous tag-name tables that backed them. Forward-only. - Postgres only. Uses native `jsonb` + GIN index on the column. - Must be applied with a fresh DB or `--rebuilddb` in test environments; `--keepdb` will not pick up the schema change.
CI failure on rest-framework jobs: FieldDoesNotExist on FakeTaggedModel when loading dojo_testdata_locations.json. The JSON-column rename in 0265 dropped the M2M field, but three fixtures still referenced the old name. Sweep fixtures alongside the model migration so test setUpClass loads cleanly.
…e Stage 3 pins After Stage 2's location precompute landed under Stage 3, the `locations` list passed by `propagate_tags_on_product_sync` reached `_sync_inheritance_for_qs` which tried to call `.only(...)` on it. Accept either a list or a queryset. Recalibrate V3 pins for the Stage 2 + Stage 3 compound effect: - product_tag_add (100 locations): 123 -> 75 - product_tag_remove (100 locations): 73 -> 49 - ZAP scan import: 698 -> 661 - ZAP scan reimport, no change: 136 -> 99
61e9d61 to
f57e775
Compare
Maffooch
approved these changes
May 7, 2026
Member
Author
|
We're gonna hold off on this one for now. The schema change is not strictly needed right now, performance is good after stage 2. We'll reevaluate the whole use of django-tagoulus later this year. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Important
Depends on:
All must merge first; this branch is stacked on top.
Destructive migration. Drops 5 inherited_tags through tables and 5 Tagulous tag-name tables. Postgres only.
Summary
Replaces the duplicate
inherited_tagsTagField on Engagement / Test / Finding / Endpoint / Location with a single_inherited_tag_namesJSONField. The new column carries the same information (which entries intagscame from inheritance) in 1 row column instead of an M2M through table.Drops:
inherited_tagsTagFielddojo_<model>_inherited_tagsthrough tablesTagulous_<Model>_inherited_tags)Adds:
_inherited_tag_namesJSONField (Postgresjsonb)Code changes
_manage_inherited_tags/propagate_inheritancerewritten to read/write the JSON column. ThetagsM2M is only mutated when its contents actually change._sync_inheritance_for_qsreads_inherited_tag_namesdirectly per child instead of joining the through table; performs a bulk UPDATE per (target name set) group instead of per (model, tag) pair. Bulk re-merge dedup avoids double-writing the same (child, name) pair.LocationManager._bulk_inherit_tagsreads JSON column rather than the inherited_tags through table.Plumbing
Meta.excludelists renamedinherited_tags→_inherited_tag_names.DojoFilter.filter_for_fieldskips JSONField auto-generation for_inherited_tag_names(django-filter raises on JSONField otherwise).tag_through_modelsandbackfill.pydrop the inherited_tags proxy/event entries.dojo_testdata.jsonrenames"inherited_tags": []→"_inherited_tag_names": []..inherited_tags.all()switched to read_inherited_tag_names.Query-count impact
test_importers_performance.pybaselines unchanged becauseflush_for_productshort-circuits when inheritance is disabled (added in Stage 2).Verification
unittests.test_tag_inheritance— 36 tests pass (5 skipped for v3 feature flag).unittests.test_tag_utils_bulk— 29 tests pass.unittests.test_tag_inheritance_perf— 20 tests pass with the lowered pins.unittests.test_importers_performance— 10 tests pass.Migration notes
__containsSQL is not yet used; current code reads the JSON column per row and diffs in Python. A GIN index can be added in a follow-up if production query patterns shift toward SQL-side containment lookups.--rebuilddbin test environments;--keepdbwill not pick up the schema change.