⚡ speed up migrate_endpoints_to_locations (~14× fewer queries)#14841
Merged
Conversation
Reduces per-endpoint cost in the Endpoint→Location data migration from ~613 queries / 240 ms to ~44 queries / 17 ms on a 50-endpoint / 1,000-finding local benchmark — a 13.9× query reduction and a 14.1× wall-clock improvement that should bring a 16-hour prod run under one hour. As a side-effect, fixes the latent associate_with_product short-circuit bug where 'Mitigated' could stick on a LocationProductReference even after later Active findings came in for the same product. Changes (kept inside this management command — no edits to shared Location model methods): - select_related/prefetch_related on the main Endpoint queryset so the per-endpoint loop has no hidden joins through tags, endpoint_meta, status_endpoint, or finding→test→engagement→product/mitigated_by. - tags.add(*names) splat instead of N round-trips per tag. - DojoMeta.bulk_create(ignore_conflicts=True) per endpoint instead of get_or_create per row (DojoMeta.unique_together = (location, name) makes ignore_conflicts semantically equivalent here). - LocationFindingReference and LocationProductReference are bulk_created per endpoint instead of going through Location.associate_with_finding / associate_with_product. This bypasses BaseModel.save's full_clean() validate_unique queries AND the inherit_tags_on_linked_instance post_save signal (which fires all_related_products through the finding→test→engagement→product chain on every save). Product status is derived in-memory across all of an endpoint's finding statuses. - _suspend_auto_now_add wraps the LocationFindingReference bulk write so the explicit 'created' value (= source Endpoint_Status.date) is honored. Django's SQLInsertCompiler.pre_save_val calls Field.pre_save(add=True) even from bulk_create; auto_now_add would otherwise overwrite our value with now(). - New CLI flags for ops visibility on long runs: --batch-size, --progress-every, --benchmark, --query-count. Default progress line: 'Migrated X/Y (z%) — N ep/sec — ETA …'. Per-step measurements (50 ep / 1,000 findings, V3_FEATURE_LOCATIONS=True, local docker postgres): step wall queries/ep verifier baseline (instrumented) 12.00s 613 14 LPR-status warnings (pre-existing bug) + prefetch_related 10.63s 528 same + tags splat 10.08s 507 same + DojoMeta bulk_create 10.24s 498 same + bulk LFR/LPR + fix 0.85s 44 all strict checks pass Idempotent re-runs validated. Verifier checks counts (URLs, Locations, LFR, LPR, location-DojoMeta), per-row LFR fields (status, created, audit_time, auditor), endpoint→location tag subset, and DojoMeta (location, name) parity. Intentional behavioral diffs vs. the previous code: 1. LocationProductReference.status now reflects 'Active iff any finding for this (location, product) is Active' — fixes the associate_with_product first-write-wins bug. Previously order- dependent; ~28% of product refs were mis-statused on the seeded distribution. 2. Tag inheritance via the inherit_tags_on_linked_instance post_save signal does NOT fire (bulk_create skips signals). For deployments with enable_product_tag_inheritance=True on products (or the system setting on), inherited product tags will not be propagated onto migrated Locations during this command. The seed used in benchmarking does not exercise this path. If your environment uses product tag inheritance, follow up with a one-time Location.inherit_tags pass after this command — or call out and we can bake _bulk_inherit_tags into the migration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
valentijnscholten
approved these changes
May 8, 2026
Member
valentijnscholten
left a comment
There was a problem hiding this comment.
Sounds good. But wouldn't it make sense to also add the step to inherit the tags as it's cheap / easy?
blakeaowens
approved these changes
May 8, 2026
dogboat
approved these changes
May 11, 2026
…r isolation `bulk_create` (introduced in the prior perf commit) skips the `inherit_tags_on_linked_instance` post_save signal, so deployments with `enable_product_tag_inheritance` enabled (per-product or system-wide) would not pick up inherited product tags on migrated Locations. Track (product, location) pairs during the main loop — covering both `endpoint.product` and `finding.test.engagement.product` — and run a post-pass that calls `LocationManager(product)._bulk_inherit_tags(locations)` once per contributing product. The helper rediscovers each location's full product set via LocationProductReference/LocationFindingReference and diff-checks before writing, so revisits of shared locations across product groups are idempotent. ~5 queries per product group vs ~3 per location for a per-location `inherit_tags()` loop. Also wrap the per-endpoint body in a `try`/`except Exception` so a single bad row doesn't abort a multi-hour migration. Failures get logged with full traceback and tracked in `self.failed_endpoints`; the final "Done." line reports `<successful>/<total>` and a yellow warning lists the first 10 failing IDs. `KeyboardInterrupt` / `SystemExit` are not swallowed. The post-pass uses the same pattern per product group. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a perf bugfix to an existing one-shot data migration; no new features or surface area, no new dependencies, single file changed. The migration is currently producing latently-incorrect data in addition to being unusably slow on real instances (16+ hours for ~650 endpoints / 22k findings — see Description), so I went ahead.
Description
dojo/management/commands/migrate_endpoints_to_locations.pyis a one-shot migration that maps eachEndpoint→URL/Location, eachEndpoint_Status→LocationFindingReference, and each (endpoint→product, finding→product) pair →LocationProductReference. On a real instance with 650 endpoints and 22k findings, it had produced 80 locations in 2 hours (~90 s/endpoint) and was projected at ~16 hours total. Per-endpoint queries on a 50-endpoint / 1,000-finding local benchmark came in at 613.This PR keeps changes inside the management command itself (no edits to shared
Locationmodel methods) and reduces per-endpoint queries to 44 and per-endpoint wall-clock to 17 ms on the same benchmark — a 13.9× query reduction / 14.1× wall-clock improvement. Idempotent re-runs validated; strict accuracy checks pass.The work was applied as five logically-distinct changes; each row of the table below was measured independently against a freshly-reset migration target so the gain attribution is honest:
select_related/prefetch_relatedtags,endpoint_meta,status_endpoint, andfinding → test → engagement → product/mitigated_bytags.add(*names)splattagsphase −61%DojoMeta.bulk_create(ignore_conflicts=True)metaphase −67% —DojoMeta.unique_together = (location, name)makes ignore_conflicts equivalent to the previousget_or_createbulk_createLFR/LPR + in-memory product statusBaseModel.save → full_clean()validate_unique queries AND theinherit_tags_on_linked_instancepost_save signal that firesall_related_products(a finding/test/engagement/product join) on every saveWhere the savings actually came from, surfaced by the
--query-countflag on a single endpoint with 20 statuses:validate_uniquequeries × 2 saves per status = ~120 redundant queries / endpoint just fromfull_cleanall_related_products+ 1system_settingsper save × ~40 saves per endpoint = ~80 redundant queries from the inherit-tags post_save signalfinding.test.engagement.product) × 20 statuses = 60 queries.exists() → .get() → update_or_createtriple inassociate_with_finding/associate_with_product— 2 redundant queries per statusbulk_create(ignore_conflicts=True)for the reference rows skips all of those at once.One Django gotcha worth flagging in review
LocationFindingReference.createdisauto_now_add=True. The original migration setcreatedto the sourceEndpoint_Status.datevia a post-save UPDATE, which works becauseauto_now_addonly fires on the initial INSERT (add=True). Withbulk_createwe don't get a post-save UPDATE — and Django'sSQLInsertCompiler.pre_save_valdoes callField.pre_save(add=True)even frombulk_create, soauto_now_addwould overwrite our explicit value withnow(). The PR uses a small_suspend_auto_now_addcontext manager that togglesauto_now_add=Falseon the field during the bulk write, then restores it. Single-process management command — no thread-safety concerns.Behavioral diffs you should know about
LocationProductReference.statusis now correct (semantic improvement). The previous code calledLocation.associate_with_product(...), which short-circuits if the (location, product) pair already exists — never updating the status when a later Active finding comes in for the same product. With the seed's 70/30 Active/non-Active distribution, ~28% of product refs landed onMitigatedwhen the logical truth wasActive. The new code derives product status in-memory across all of an endpoint's finding statuses (Activeiff any of them is Active, elseMitigated) andbulk_creates the row. The strict mode of the local accuracy verifier explicitly catches this — baseline fails strict, this PR passes strict.Inherited product tags via
post_savesignals do NOT fire.bulk_createskips signals, soinherit_tags_on_linked_instance(dojo/tags_signals.py:65-68) does not run during the migration. For deployments where any product hasenable_product_tag_inheritance=True(or the system-wide setting is on), inherited product tags will not be propagated onto migratedLocations by this command. Two reasonable mitigations, both deferred to a follow-up:inherit_tags()(cheap — one query per location, paid once instead of per-LFR)._bulk_inherit_tagshelper atdojo/importers/location_manager.py:486, which is designed for exactly this scenario.I'd be happy to land either as a follow-up PR if you confirm your prod has tag inheritance enabled. The benchmark seed does not exercise this path.
Things intentionally NOT changed
Location.associate_with_finding/associate_with_productare untouched. They still have the.exists() → .get() → update_or_createpattern and the first-write-wins behavior on product status. The migration just stops calling them on its hot path. Other callers (e.g., importers) are unaffected.--start-from <id>/--skip-validationflags. Considered during planning, dropped to keep blast radius small. Easy follow-ups if the prod run reveals a need.Migrated X/Y (z%) — N ep/sec — ETA …line is enough for ops;--benchmarkcovers diagnostic depth.Test results
Verified locally on the docker dev stack with
DD_V3_FEATURE_LOCATIONS=Trueagainst a representative seed (5 products × 10 endpoints × 20 findings, with the four non-Active status flag distributions exercised: 70% Active / 15% Mitigated / 5% RA / 5% FP / 5% OOS):status,created,audit_time,auditor), endpoint→location tag subset, and DojoMeta(location, name)parity all agree with source data.python manage.py checkclean.ruff checkclean.I have not exercised this on a 22k-finding instance directly. The 50/1,000 numbers above are linear-scaling from a per-endpoint perspective; if anyone has a prod-shape instance handy, validation there would be welcome.
Documentation
No user-facing docs to update — this is a one-shot internal data migration command. The new
--benchmark/--query-count/--batch-size/--progress-everyflags are documented viaargparsehelp=strings.