Fix ZipArchive Update removing data descriptors#126447
Fix ZipArchive Update removing data descriptors#126447bwinsley wants to merge 14 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @karelz, @dotnet/area-system-io-compression |
There was a problem hiding this comment.
Pull request overview
Adds a regression test in System.IO.Compression to validate ZIP archives created in “streaming” mode (non-seekable stream, data-descriptor bit set) remain structurally valid after reopening in ZipArchiveMode.Update and adding an entry.
Changes:
- Added
System.Buffers.Binaryusage to inspect ZIP bytes. - Added a new
[Theory]test that creates a streaming ZIP, validates data-descriptor markers, updates the archive by adding an entry, then validates readability/structure again.
src/libraries/System.IO.Compression/tests/ZipArchive/zip_UpdateTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/tests/ZipArchive/zip_UpdateTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/tests/ZipArchive/zip_UpdateTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.Async.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchive.Async.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Show resolved
Hide resolved
… in compressed data
…ditions into functions
|
@dotnet-policy-service agree company="Displayr" |
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.Async.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/src/System/IO/Compression/ZipArchiveEntry.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.Compression/tests/ZipArchive/zip_UpdateTests.cs
Outdated
Show resolved
Hide resolved
| if (hadDataDescriptor) | ||
| { | ||
| _generalPurposeBitFlag |= BitFlagValues.DataDescriptor; | ||
|
|
||
| bool headerWritten = !(_originallyInArchive && Changes == ZipArchive.ChangeState.Unchanged && !forceWrite); | ||
| if (headerWritten) | ||
| { | ||
| PatchLocalFileHeaderBitFlags(); | ||
| } | ||
| } |
There was a problem hiding this comment.
WriteLocalFileHeaderAndDataIfNeededAsync calls PatchLocalFileHeaderBitFlags(), which performs a synchronous ArchiveStream.Write(...). Since this code path is in the async write pipeline, consider adding an async variant (e.g., PatchLocalFileHeaderBitFlagsAsync using WriteAsync) and awaiting it here to avoid blocking on synchronous I/O.
#126344
Fix ZipArchive Update mode corruption with data descriptor entries
Problem
PR #102704 introduced a selective-rewrite optimization for
ZipArchiveUpdate mode that avoids rewriting unchanged entries. However, it did not account for data descriptors — the optional trailing structures (16 or 24 bytes) that follow compressed data when bit 3 of the general purpose bit flag is set.Archives with data descriptors are produced by non-seekable streams (e.g., network streams,
DeflateStreamwrappers) and many external tools (Java'sZipOutputStream, Azure SDKs, etc.). When such an archive is opened in Update mode and new entries are added, the archive is silently corrupted:WriteFileCalculateOffsetsmiscalculatesnextFileOffset— it computes the end of an entry asGetOffsetOfCompressedData() + CompressedLength, which excludes data descriptor bytes. New entries are appended on top of the last entry's data descriptor.Metadata-only rewrite path skips too few bytes — when only metadata (e.g.,
LastWriteTime) is changed on an entry without loading its data, the seek past the entry advances by_compressedSizebut does not skip the data descriptor, misaligning all subsequent entries.Metadata-only rewrite clears bit 3 but leaves data descriptor in place —
WriteLocalFileHeaderInitializeclears bit 3 for seekable streams, but in the metadata-only path the data descriptor bytes remain physically in the file. A sequential ZIP reader would not expect a data descriptor and would misinterpret those bytes as the next local file header.isEmptyFilebranch doesn't clear bit 3 — if an existing entry with bit 3 is rewritten as empty, the flag is preserved but no data descriptor is written.Fix
nextFileOffsetcomputation (ZipArchive.cs, ZipArchive.Async.cs): Replace the per-entryGetOffsetOfCompressedData() + CompressedLengthcalculation with the offset of the next original entry's local file header (or_centralDirectoryStartfor the last entry). This naturally includes any trailing data descriptor bytes. Entry end offsets are precomputed in a single O(n) reverse pass viaComputeEntryEndOffsets(), shared by both sync and async paths.Metadata-only seek (ZipArchiveEntry.cs, ZipArchiveEntry.Async.cs): After seeking past
_compressedSize, detect and skip the data descriptor viaSkipDataDescriptor()/SkipDataDescriptorAsync(). These helpers read the first 4 bytes to detect the optional signature (0x08074B50) and seek past the remaining fields, usingAreSizesTooLargeto determine 32-bit (16 bytes total) vs 64-bit (24 bytes total) layout. The skip-bytes logic is shared viaGetDataDescriptorSkipBytes().Preserve bit 3 in metadata-only path (ZipArchiveEntry.cs, ZipArchiveEntry.Async.cs): Capture whether bit 3 was set before
WriteLocalFileHeaderclears it. After writing the header, restore bit 3 and patch the already-written local file header viaPatchLocalFileHeaderBitFlags(), keeping the on-disk header consistent with the physical data descriptor that remains in the file.Clear bit 3 in
isEmptyFilebranch (ZipArchiveEntry.cs): Add_generalPurposeBitFlag &= ~BitFlagValues.DataDescriptorso empty entries don't advertise a data descriptor that will never be written.Tests
Two regression tests added in
zip_UpdateTests.cs:Update_DataDescriptorSignature_IsCorrectlyWrittenAndPreserved— Creates a zip via non-seekable stream (forces bit 3 + data descriptors), reopens in Update mode, adds a new entry, validates the archive structure (bit flags, data descriptor signatures, entry count) and all entry data.Update_DataDescriptorWithMetadataOnlyChange_PreservesArchive— Same setup, but also modifiesLastWriteTimeon a middle entry without reading its data (exercises the metadata-only rewrite path). Verifies all entries remain readable with correct data and the metadata change is preserved.Both tests use
CompressionLevel.NoCompressionto ensure stored bytes are deterministic and signature-scanning validation is not susceptible to false positives from compressed data.Changed files
ZipArchive.csComputeEntryEndOffsets()helper;WriteFileuses precomputed end offsetsZipArchive.Async.csWriteFileAsyncusesComputeEntryEndOffsets()ZipArchiveEntry.csPatchLocalFileHeaderBitFlags(),SkipDataDescriptor(),GetDataDescriptorSkipBytes()helpers;isEmptyFileclears bit 3ZipArchiveEntry.Async.csSkipDataDescriptorAsync()zip_UpdateTests.cs