Skip to content

ENG-3517: Upload and promotion for user uploaded files - schema variant, storage util, hooks, config#8113

Merged
mikeGarifullin merged 16 commits into
mainfrom
ENG-3517-3
May 14, 2026
Merged

ENG-3517: Upload and promotion for user uploaded files - schema variant, storage util, hooks, config#8113
mikeGarifullin merged 16 commits into
mainfrom
ENG-3517-3

Conversation

@mikeGarifullin
Copy link
Copy Markdown
Contributor

@mikeGarifullin mikeGarifullin commented May 5, 2026

Ticket ENG-3517

Stack: depends on #8110 (data-layer foundation). Merge that first.

List of related changes

# Ticket Repo PR Description Approved
1 ENG-3517 fides #8110 Foundation — attachment_user_provided model + repository (lifecycle, locking, orphan-sweep query) [merged]
this PR ENG-3517 fides #8113 FileUploadCustomPrivacyRequestField schema variant, storage util (magic bytes, allowed types), promotion hook, config flags []
3 ENG-3517 fidesplus #3524 Attachment service + PrivacyRequestService promotion hook implementation []
4 ENG-3517 fidesplus #3534 POST /privacy-request/attachment HTTP endpoint (multipart upload, per-field constraints, rate limit) [x]
5 ENG-3518 fidesplus #3526 Hourly Celery orphan-sweep: deletes temp storage objects + uploaded rows older than cutoff [x]
6 ENG-3519 fidesplus #3543 Local-dev clamav-icap container (Ubuntu + c-icap + libclamav, profile-gated in compose) [x]
7 ENG-3520 fidesplus #3544 Wire ICAP scanner into upload path (RESPMOD call, X-Infection-Found handling) [x]

Description Of Changes

Schema + storage + service-extension half of the upload foundation. Pairs with the data-layer PR (#8110); the upload service and endpoint live in fidesplus and consume both.

Adds the Privacy Center config variant for file fields, a magic-byte sniff catalog so the upload route can verify a payload's claimed type at byte-zero (client-supplied Content-Type is not trusted), two config knobs for the global upload ceilings, and two PrivacyRequestService hook points so fidesplus can plug attachment-resolve and attachment-promote into the existing submission flow without overriding create_privacy_request wholesale.

Code Changes

  • schemas/privacy_center_config.py: FileUploadCustomPrivacyRequestField discriminated-union variant (field_type="file", max_size_bytes > 0, allowed_file_types non-empty + must be a subset of AllowedFileType).
  • schemas/attachment.py: AttachmentUploadResponse (the {id: "att_..."} payload returned by the fidesplus upload route).
  • service/storage/util.py:
    • FilesMagicBytes — known signatures (pdf, jpg/jpeg, png, gif, ...) and from_bytes(...) for byte-zero detection.
    • AllowedFileType.mime_types(), MIME_TO_EXTENSION, extension_for_mime(...) helpers.
    • FileUploadConstraints — runtime-validated config bag (per-field cap + allowed types) reused by the upload route and the schema validator.
  • service/privacy_request/privacy_request_service.py:
    • _resolve_attachment_state(submitted_data, *, db) and _promote_attachment_state(privacy_request, state) — overridable no-op hooks on the OSS service. Default impls do nothing so OSS behavior is unchanged.
    • create_privacy_request calls _resolve_attachment_state before persisting and _promote_attachment_state after caching/masking-secrets. On promotion failure the request row is deleted (clear_cached_values() runs as part of delete()) and the original error is rewrapped as PrivacyRequestError.
  • common/urn_registry.py: route constant for the future fidesplus upload endpoint.
  • config/security_settings.py: request_attachment_max_bytes global ceiling (caps the per-field max_size_bytes).
  • config/execution_settings.py: attachment_orphan_ttl_seconds for the orphan-cleanup sweep.
  • Tests for the schema variant, magic-byte catalog, and FileUploadConstraints.

Steps to Confirm

  1. Stack rebased onto ENG-3517: Foundation - attachment_user_provided model + repository #8110. From this branch run nox -s "pytest(ops-unit-non-api)" -- tests/ops/schemas/test_attachment.py tests/ops/schemas/test_privacy_center_config.py tests/ops/service/storage/test_util.py — all green.
  2. Construct a FileUploadCustomPrivacyRequestField with max_size_bytes=0 or allowed_file_types=[] — pydantic should reject. Construct one with an extension outside AllowedFileType — pydantic should reject.
  3. FilesMagicBytes.from_bytes(open("sample.pdf","rb").read(8)) returns the pdf member. The same call against random bytes returns None.
  4. From a Python shell: instantiate PrivacyRequestService and call create_privacy_request(...) with a non-file payload — flow runs unchanged (hooks are no-ops).
  5. (Cross-PR) The fidesplus upload route consumes FileUploadConstraints, FilesMagicBytes, and the new hook points — verify in the companion fidesplus PR.

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Add a db-migration This indicates that a change includes a database migration label to the entry if your change includes a DB migration
    • Add a high-risk This issue suggests changes that have a high-probability of breaking existing code label to the entry if your change includes a high-risk change (i.e. potential for performance impact or unexpected regression) that should be flagged
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • All UX related changes have been reviewed by a designer
    • No UX review needed
  • Followup issues:
    • Followup issues created
    • No followup issues
  • Database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!
    • No migrations
  • Documentation:
    • Documentation complete, PR opened in fidesdocs
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
    • No documentation updates required

@vercel
Copy link
Copy Markdown
Contributor

vercel Bot commented May 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
fides-plus-nightly Ready Ready Preview, Comment May 14, 2026 3:33pm
1 Skipped Deployment
Project Deployment Actions Updated (UTC)
fides-privacy-center Ignored Ignored May 14, 2026 3:33pm

Request Review

@codecov
Copy link
Copy Markdown

codecov Bot commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.59%. Comparing base (78fd8e0) to head (88538b2).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8113      +/-   ##
==========================================
+ Coverage   85.57%   85.59%   +0.01%     
==========================================
  Files         657      658       +1     
  Lines       42727    42847     +120     
  Branches     5012     5018       +6     
==========================================
+ Hits        36565    36675     +110     
- Misses       5055     5064       +9     
- Partials     1107     1108       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mikeGarifullin mikeGarifullin force-pushed the ENG-3517-3 branch 2 times, most recently from 9dd46c8 to 3294cad Compare May 6, 2026 09:15
@mikeGarifullin mikeGarifullin changed the title ENG-3517: Foundation - schema variant, storage util, hooks, config ENG-3517: Upload and promotion for user uploaded files - schema variant, storage util, hooks, config May 6, 2026
@mikeGarifullin mikeGarifullin force-pushed the ENG-3517-3 branch 3 times, most recently from 9d8d72e to 96eecd0 Compare May 6, 2026 11:15
@mikeGarifullin mikeGarifullin marked this pull request as ready for review May 6, 2026 14:42
@mikeGarifullin mikeGarifullin requested a review from a team as a code owner May 6, 2026 14:42
@mikeGarifullin mikeGarifullin requested review from galvana and removed request for a team May 6, 2026 14:42
@mikeGarifullin mikeGarifullin requested a review from a team as a code owner May 6, 2026 15:34
@mikeGarifullin mikeGarifullin requested review from speaker-ender and removed request for a team May 6, 2026 15:34
@mikeGarifullin mikeGarifullin force-pushed the ENG-3517-3 branch 3 times, most recently from e6f0a43 to f28be72 Compare May 7, 2026 10:56
@mikeGarifullin mikeGarifullin force-pushed the ENG-3517-3 branch 2 times, most recently from f321175 to 41ef133 Compare May 7, 2026 11:58
Comment thread src/fides/api/schemas/privacy_center_config.py Outdated
Comment thread tests/unit/test_create_privacy_request_attachment_path.py Outdated
Copy link
Copy Markdown
Contributor

@JadeCara JadeCara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments - thinking the bool change is the most important to verify.

Data layer for the data-subject file-upload feature: `attachment_user_provided`
table + repository primitives. Upload service + endpoint live in fidesplus
and consume this PR.

Lifecycle: uploaded → promoted | deleted (deleted rows kept for audit).
The (field_name, property_id, policy_key) triple binds an upload to the
privacy-center config it was issued under so re-validation at submission
can refuse mismatches. storage_key is intentionally non-FK so audit rows
survive storage-config churn.

* Model: AttachmentUserProvidedStatus enum + AttachmentUserProvided model
  with composite (status, created_at) index for the orphan-sweep predicate.
  field_name/property_id/policy_key are intentionally non-FK so audit rows
  survive policy/property/field rename or delete between upload and
  submission.
* Migration 9b449105864d, reversible.
* Repository:
  - create_uploaded: insert + flush + refresh, returns AttachmentUserProvidedRecord
  - lock_by_ids: SELECT … FOR UPDATE ordered by id (deadlock-safe under
    overlapping batches), returns dict[str, row] keyed by id; missing ids
    absent. session is required (no @with_optional_sync_session) since the
    lock is only meaningful inside a caller-owned transaction.
  - assert_all_uploaded: lifecycle precondition check
  - mark_promoted: pure mutation, caller owns the session boundary
  - mark_deleted: in-session only, no commit
  - list_uploaded_older_than: orphan-sweep query, exclusive cutoff
* Domain entities (AttachmentUserProvidedRecord) + exceptions
  (AttachmentsServiceError + InvalidAttachmentStateError).
* Test suite covers every public method incl. lifecycle invariants,
  empty-list short-circuit, missing-id detection, cutoff exclusivity.
* .fides/db_dataset.yml: dataset entries for the new table.
…fig, and tests

- Add domain exceptions for the attachments service
- Add schema variant (attachment.py), storage util, extension hooks in privacy_request_service
- Add config toggles (allow_custom_privacy_request_file_upload, security settings)
- Cover new code with unit tests (exceptions, security/execution settings, storage util)
- AttachmentNotFoundError takes an optional reason tag so callers can
  surface diagnostic detail (e.g. row status) without expanding the
  exception surface.
- FilesMagicBytes.max_prefix_length co-locates the header-peek length
  with the signatures it derives from; callers no longer recompute it
  across the repo boundary.
Cover the two diff lines codecov/patch flagged at 100% target:
- AttachmentNotFoundError(reason=...) appends the tag to the message.
- FilesMagicBytes.max_prefix_length tracks the longest signature.
Base automatically changed from ENG-3517-2 to main May 14, 2026 09:09
@mikeGarifullin mikeGarifullin requested a review from JadeCara May 14, 2026 14:34
Copy link
Copy Markdown
Contributor

@JadeCara JadeCara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the updates! I ran a quick security review on it too and it all came back clean!

@mikeGarifullin mikeGarifullin added this pull request to the merge queue May 14, 2026
Merged via the queue into main with commit b3374c0 May 14, 2026
68 of 69 checks passed
@mikeGarifullin mikeGarifullin deleted the ENG-3517-3 branch May 14, 2026 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants