Skip to content

Fix ZipArchive Update removing data descriptors#126447

Draft
bwinsley wants to merge 14 commits intodotnet:mainfrom
bwinsley:126344
Draft

Fix ZipArchive Update removing data descriptors#126447
bwinsley wants to merge 14 commits intodotnet:mainfrom
bwinsley:126344

Conversation

@bwinsley
Copy link
Copy Markdown

@bwinsley bwinsley commented Apr 2, 2026

#126344

Fix ZipArchive Update mode corruption with data descriptor entries

Problem

PR #102704 introduced a selective-rewrite optimization for ZipArchive Update mode that avoids rewriting unchanged entries. However, it did not account for data descriptors — the optional trailing structures (16 or 24 bytes) that follow compressed data when bit 3 of the general purpose bit flag is set.

Archives with data descriptors are produced by non-seekable streams (e.g., network streams, DeflateStream wrappers) and many external tools (Java's ZipOutputStream, Azure SDKs, etc.). When such an archive is opened in Update mode and new entries are added, the archive is silently corrupted:

  1. WriteFileCalculateOffsets miscalculates nextFileOffset — it computes the end of an entry as GetOffsetOfCompressedData() + CompressedLength, which excludes data descriptor bytes. New entries are appended on top of the last entry's data descriptor.

  2. Metadata-only rewrite path skips too few bytes — when only metadata (e.g., LastWriteTime) is changed on an entry without loading its data, the seek past the entry advances by _compressedSize but does not skip the data descriptor, misaligning all subsequent entries.

  3. Metadata-only rewrite clears bit 3 but leaves data descriptor in placeWriteLocalFileHeaderInitialize clears bit 3 for seekable streams, but in the metadata-only path the data descriptor bytes remain physically in the file. A sequential ZIP reader would not expect a data descriptor and would misinterpret those bytes as the next local file header.

  4. isEmptyFile branch doesn't clear bit 3 — if an existing entry with bit 3 is rewritten as empty, the flag is preserved but no data descriptor is written.

Fix

nextFileOffset computation (ZipArchive.cs, ZipArchive.Async.cs): Replace the per-entry GetOffsetOfCompressedData() + CompressedLength calculation with the offset of the next original entry's local file header (or _centralDirectoryStart for the last entry). This naturally includes any trailing data descriptor bytes. Entry end offsets are precomputed in a single O(n) reverse pass via ComputeEntryEndOffsets(), shared by both sync and async paths.

Metadata-only seek (ZipArchiveEntry.cs, ZipArchiveEntry.Async.cs): After seeking past _compressedSize, detect and skip the data descriptor via SkipDataDescriptor() / SkipDataDescriptorAsync(). These helpers read the first 4 bytes to detect the optional signature (0x08074B50) and seek past the remaining fields, using AreSizesTooLarge to determine 32-bit (16 bytes total) vs 64-bit (24 bytes total) layout. The skip-bytes logic is shared via GetDataDescriptorSkipBytes().

Preserve bit 3 in metadata-only path (ZipArchiveEntry.cs, ZipArchiveEntry.Async.cs): Capture whether bit 3 was set before WriteLocalFileHeader clears it. After writing the header, restore bit 3 and patch the already-written local file header via PatchLocalFileHeaderBitFlags(), keeping the on-disk header consistent with the physical data descriptor that remains in the file.

Clear bit 3 in isEmptyFile branch (ZipArchiveEntry.cs): Add _generalPurposeBitFlag &= ~BitFlagValues.DataDescriptor so empty entries don't advertise a data descriptor that will never be written.

Tests

Two regression tests added in zip_UpdateTests.cs:

  • Update_DataDescriptorSignature_IsCorrectlyWrittenAndPreserved — Creates a zip via non-seekable stream (forces bit 3 + data descriptors), reopens in Update mode, adds a new entry, validates the archive structure (bit flags, data descriptor signatures, entry count) and all entry data.

  • Update_DataDescriptorWithMetadataOnlyChange_PreservesArchive — Same setup, but also modifies LastWriteTime on a middle entry without reading its data (exercises the metadata-only rewrite path). Verifies all entries remain readable with correct data and the metadata change is preserved.

Both tests use CompressionLevel.NoCompression to ensure stored bytes are deterministic and signature-scanning validation is not susceptible to false positives from compressed data.

Changed files

File Changes
ZipArchive.cs ComputeEntryEndOffsets() helper; WriteFile uses precomputed end offsets
ZipArchive.Async.cs WriteFileAsync uses ComputeEntryEndOffsets()
ZipArchiveEntry.cs Metadata-only path: save/restore bit 3, skip data descriptor; PatchLocalFileHeaderBitFlags(), SkipDataDescriptor(), GetDataDescriptorSkipBytes() helpers; isEmptyFile clears bit 3
ZipArchiveEntry.Async.cs Async mirrors: save/restore bit 3, SkipDataDescriptorAsync()
zip_UpdateTests.cs Two new regression tests

Copilot AI review requested due to automatic review settings April 2, 2026 02:49
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 2, 2026
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @karelz, @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a regression test in System.IO.Compression to validate ZIP archives created in “streaming” mode (non-seekable stream, data-descriptor bit set) remain structurally valid after reopening in ZipArchiveMode.Update and adding an entry.

Changes:

  • Added System.Buffers.Binary usage to inspect ZIP bytes.
  • Added a new [Theory] test that creates a streaming ZIP, validates data-descriptor markers, updates the archive by adding an entry, then validates readability/structure again.

Copilot AI review requested due to automatic review settings April 2, 2026 05:54
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

Copilot AI review requested due to automatic review settings April 2, 2026 06:08
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

@bwinsley
Copy link
Copy Markdown
Author

bwinsley commented Apr 2, 2026

@dotnet-policy-service agree company="Displayr"

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment on lines +411 to +420
if (hadDataDescriptor)
{
_generalPurposeBitFlag |= BitFlagValues.DataDescriptor;

bool headerWritten = !(_originallyInArchive && Changes == ZipArchive.ChangeState.Unchanged && !forceWrite);
if (headerWritten)
{
PatchLocalFileHeaderBitFlags();
}
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WriteLocalFileHeaderAndDataIfNeededAsync calls PatchLocalFileHeaderBitFlags(), which performs a synchronous ArchiveStream.Write(...). Since this code path is in the async write pipeline, consider adding an async variant (e.g., PatchLocalFileHeaderBitFlagsAsync using WriteAsync) and awaiting it here to avoid blocking on synchronous I/O.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.IO.Compression community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants