-
Notifications
You must be signed in to change notification settings - Fork 35
Detect and extract non-indexed ZIP entries hidden outside the central directory #195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: gfs <[email protected]>
… directory Add opt-in ExtractNonIndexedZipEntries option that uses a forward-only SharpCompress reader to walk local file headers and discover entries absent from the central directory. Such entries are flagged with the new NonIndexedEntry status on FileEntryStatus. Co-authored-by: gfs <[email protected]>
…onsistent with async Co-authored-by: gfs <[email protected]>
gfs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot please address this review feedback
| /// marked with <see cref="FileEntryStatus.NonIndexedEntry"/> and may indicate hidden or | ||
| /// steganographic content. Disabled by default for performance. | ||
| /// </summary> | ||
| public bool ExtractNonIndexedZipEntries { get; set; } = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to other comment, lets leave the specific archive implementation details out of the variable names - this should be ExtractNonIndexedEntries to support that conceptually this may expand to other archive types where similar techniques could make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to ExtractNonIndexedEntries and updated the doc comment to be generic in 297177e.
| /// <summary> | ||
| /// Tests that the ZIP extractor can discover and extract entries whose local file headers | ||
| /// exist in the stream but are absent from the central directory ("non-indexed" entries). | ||
| /// See: https://github.com/microsoft/RecursiveExtractor/issues/new (steganography in ZIP). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the correct issue number reference. this is the page to f ile a new issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the incorrect URL in 297177e.
| } | ||
|
|
||
| // Re-iterate through entries with the correct archive instance | ||
| var catalogedKeysSync = new HashSet<string>(StringComparer.Ordinal); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need for this to be named sync its in a different context than the async version so there's no veriable name collision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to catalogedKeys in 297177e.
| { | ||
| bool advanced; | ||
| try { advanced = forwardReader.MoveToNextEntry(); } | ||
| catch (Exception) { break; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth debug or trace logging these exceptions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added debug logging for the exception in 297177e.
|
|
||
| payload ??= new MemoryStream(); | ||
| var entryName = readerKey.Replace('/', Path.DirectorySeparatorChar); | ||
| var discovered = new FileEntry(entryName, payload, parentEntry, memoryStreamCutoff: options.MemoryStreamCutoff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use passthrough stream true argument to fileentry since we've already created the stream copy either on line 338 or as an empty memory stream on 345.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added passthroughStream: true in 297177e.
| { | ||
| bool advanced; | ||
| try { advanced = forwardReader.MoveToNextEntry(); } | ||
| catch (Exception) { break; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should maybe log these exceptions as debug or trace messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added debug logging for the exception in 297177e.
| payload ??= new MemoryStream(); | ||
|
|
||
| var entryName = readerKey.Replace('/', Path.DirectorySeparatorChar); | ||
| var discovered = new FileEntry(entryName, payload, parentEntry, memoryStreamCutoff: options.MemoryStreamCutoff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should use passthrough stream true because the stream copy has already been manually created on 417 or 424 (if the copy failed and we're just providing a placeholder empty stream).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added passthroughStream: true in 297177e.
…hrough streams - Rename ExtractNonIndexedZipEntries → ExtractNonIndexedEntries - Make NonIndexedEntry enum description generic (not ZIP-specific) - Make ExtractorOptions doc comment generic for future archive type support - Rename catalogedKeysSync → catalogedKeys (no collision) - Add debug logging for MoveToNextEntry exceptions in both async and sync - Use passthroughStream: true for non-indexed FileEntry construction - Remove incorrect issue URL from test doc comment Co-authored-by: gfs <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@copilot please run all the library tests and confirm any that have regressed? Am seeing a failure in the pr pipeline when executing the test suite |
Ran the full test suite (546 tests on net9.0 Linux). All 4 of my new The 68 failures are all pre-existing issues unrelated to this PR — none are regressions:
Since the CI pipeline runs on Windows ( |
|
@copilot System.Io.Compression doesn't have ZipArchive on NET 4.8 thus causing the errors when running the tests on windows - should we just exclude those tests unless we're runnig on a more recent framework version or is there a polyfill package we can include from nuget to resolve that? |
…MEWORK System.IO.Compression.ZipArchive is not reliably available on .NET Framework 4.8. Wrap the entire test class in #if !NETFRAMEWORK to skip these tests on net48 while keeping them active on net8.0, net9.0, and net10.0. Co-authored-by: gfs <[email protected]>
…st project Instead of excluding tests with #if !NETFRAMEWORK, add a conditional assembly reference for System.IO.Compression on net48 so ZipArchive is available and the tests compile on all target frameworks. Also removes accidentally committed out.xml. Co-authored-by: gfs <[email protected]>
|
Explicitly reference System.IO.Compression in the test project when targeting .NET Framework 4.8 to ensure compression functionality is available during test execution.
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
NonIndexedEntryenum description generic (not ZIP-specific)ExtractNonIndexedZipEntries→ExtractNonIndexedEntries(option + all references)ExtractorOptionsdoc comment generic for future archive type supportcatalogedKeysSync→catalogedKeysMoveToNextEntryexceptions (async + sync)passthroughStream: truefor non-indexed entry FileEntry construction (async + sync)System.IO.Compressionframework reference for test projectOriginal prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.