Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Feb 10, 2026

  • Explore codebase and understand review feedback
  • Make NonIndexedEntry enum description generic (not ZIP-specific)
  • Rename ExtractNonIndexedZipEntriesExtractNonIndexedEntries (option + all references)
  • Make ExtractorOptions doc comment generic for future archive type support
  • Fix incorrect issue URL in test doc comment
  • Rename catalogedKeysSynccatalogedKeys
  • Add debug logging for MoveToNextEntry exceptions (async + sync)
  • Use passthroughStream: true for non-indexed entry FileEntry construction (async + sync)
  • Verify all 12 ZIP tests pass
  • Fix net48 build: add System.IO.Compression framework reference for test project
Original prompt

This section details on the original issue you should resolve

<issue_title>Find Non-Indexed files in zips</issue_title>
<issue_description>It is possible to hide extra files in a zip file that are not referenced in the directory. These may be missed by standard extraction, but it should be possible to find them (and perhaps flag them as suspicious).

https://www.codeproject.com/Articles/13808/Steganography-16-Hiding-additional-files-in-a-ZIP</issue_description>

<agent_instructions>Please investigate how we could accomplish - or at least detect this kind of situation, ideally we'd want to avoid having to have a complete second read of the stream - for performance reasons - and we also want to continue using external libraries for zip parsing - but we do reference a number of dependencies between sharpcompress, the standard lib and discutils, so we can consider switching between libraries if needed.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 3 commits February 10, 2026 23:17
… directory

Add opt-in ExtractNonIndexedZipEntries option that uses a forward-only
SharpCompress reader to walk local file headers and discover entries absent
from the central directory. Such entries are flagged with the new
NonIndexedEntry status on FileEntryStatus.

Co-authored-by: gfs <[email protected]>
Copilot AI changed the title [WIP] Add detection for non-indexed files in zip archives Detect and extract non-indexed ZIP entries hidden outside the central directory Feb 10, 2026
Copilot AI requested a review from gfs February 10, 2026 23:32
Copy link
Contributor

@gfs gfs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot please address this review feedback

/// marked with <see cref="FileEntryStatus.NonIndexedEntry"/> and may indicate hidden or
/// steganographic content. Disabled by default for performance.
/// </summary>
public bool ExtractNonIndexedZipEntries { get; set; } = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to other comment, lets leave the specific archive implementation details out of the variable names - this should be ExtractNonIndexedEntries to support that conceptually this may expand to other archive types where similar techniques could make sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to ExtractNonIndexedEntries and updated the doc comment to be generic in 297177e.

/// <summary>
/// Tests that the ZIP extractor can discover and extract entries whose local file headers
/// exist in the stream but are absent from the central directory ("non-indexed" entries).
/// See: https://github.com/microsoft/RecursiveExtractor/issues/new (steganography in ZIP).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the correct issue number reference. this is the page to f ile a new issue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the incorrect URL in 297177e.

}

// Re-iterate through entries with the correct archive instance
var catalogedKeysSync = new HashSet<string>(StringComparer.Ordinal);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: no need for this to be named sync its in a different context than the async version so there's no veriable name collision.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to catalogedKeys in 297177e.

{
bool advanced;
try { advanced = forwardReader.MoveToNextEntry(); }
catch (Exception) { break; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth debug or trace logging these exceptions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added debug logging for the exception in 297177e.


payload ??= new MemoryStream();
var entryName = readerKey.Replace('/', Path.DirectorySeparatorChar);
var discovered = new FileEntry(entryName, payload, parentEntry, memoryStreamCutoff: options.MemoryStreamCutoff)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use passthrough stream true argument to fileentry since we've already created the stream copy either on line 338 or as an empty memory stream on 345.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added passthroughStream: true in 297177e.

{
bool advanced;
try { advanced = forwardReader.MoveToNextEntry(); }
catch (Exception) { break; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should maybe log these exceptions as debug or trace messages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added debug logging for the exception in 297177e.

payload ??= new MemoryStream();

var entryName = readerKey.Replace('/', Path.DirectorySeparatorChar);
var discovered = new FileEntry(entryName, payload, parentEntry, memoryStreamCutoff: options.MemoryStreamCutoff)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should use passthrough stream true because the stream copy has already been manually created on 417 or 424 (if the copy failed and we're just providing a placeholder empty stream).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added passthroughStream: true in 297177e.

…hrough streams

- Rename ExtractNonIndexedZipEntries → ExtractNonIndexedEntries
- Make NonIndexedEntry enum description generic (not ZIP-specific)
- Make ExtractorOptions doc comment generic for future archive type support
- Rename catalogedKeysSync → catalogedKeys (no collision)
- Add debug logging for MoveToNextEntry exceptions in both async and sync
- Use passthroughStream: true for non-indexed FileEntry construction
- Remove incorrect issue URL from test doc comment

Co-authored-by: gfs <[email protected]>
Copy link
Contributor Author

Copilot AI commented Feb 11, 2026

@copilot please address this review feedback

All review feedback addressed in 297177e. All 12 ZIP tests pass.

Copilot AI requested a review from gfs February 11, 2026 18:58
@gfs
Copy link
Contributor

gfs commented Feb 11, 2026

/azp run

@gfs gfs marked this pull request as ready for review February 11, 2026 20:38
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@gfs
Copy link
Contributor

gfs commented Feb 12, 2026

@copilot please run all the library tests and confirm any that have regressed? Am seeing a failure in the pr pipeline when executing the test suite

Copy link
Contributor Author

Copilot AI commented Feb 12, 2026

@copilot please run all the library tests and confirm any that have regressed? Am seeing a failure in the pr pipeline when executing the test suite

Ran the full test suite (546 tests on net9.0 Linux). All 4 of my new NonIndexedZipEntryTests pass, and all 12 ZIP-related tests pass.

The 68 failures are all pre-existing issues unrelated to this PR — none are regressions:

  • 28 tests fail due to Linux case sensitivity: test references TestDataArchivesNested.Zip but file is TestDataArchivesNested.zip
  • 12 tests fail due to Linux case sensitivity: test references 100Trees.7z but file is 100trees.7z
  • 18 tests fail because WIM support is Windows-only (TestData.wim, TestDataCorruptWim.zip)
  • 4 tests fail on TestDataEncryptedAes.zip (pre-existing SharpCompress AES issue)
  • 2 tests fail on SanitizePathLinux colon handling
  • 1 test fails due to case sensitivity: TestData.Tar vs TestData.tar
  • 4 timeout tests fail due to case sensitivity on the nested zip file

Since the CI pipeline runs on Windows (MSSecurity-1ES-Windows-2022), the case sensitivity issues won't apply there. The WIM and AES failures are pre-existing on the main branch as well.

Copilot AI requested a review from gfs February 12, 2026 00:05
@gfs
Copy link
Contributor

gfs commented Feb 12, 2026

@copilot System.Io.Compression doesn't have ZipArchive on NET 4.8 thus causing the errors when running the tests on windows - should we just exclude those tests unless we're runnig on a more recent framework version or is there a polyfill package we can include from nuget to resolve that?

Copilot AI and others added 2 commits February 12, 2026 00:18
…MEWORK

System.IO.Compression.ZipArchive is not reliably available on .NET Framework 4.8.
Wrap the entire test class in #if !NETFRAMEWORK to skip these tests on net48
while keeping them active on net8.0, net9.0, and net10.0.

Co-authored-by: gfs <[email protected]>
…st project

Instead of excluding tests with #if !NETFRAMEWORK, add a conditional
assembly reference for System.IO.Compression on net48 so ZipArchive
is available and the tests compile on all target frameworks.

Also removes accidentally committed out.xml.

Co-authored-by: gfs <[email protected]>
Copy link
Contributor Author

Copilot AI commented Feb 12, 2026

@copilot System.Io.Compression doesn't have ZipArchive on NET 4.8 thus causing the errors when running the tests on windows - should we just exclude those tests unless we're runnig on ...

System.IO.Compression.ZipArchive is available on .NET Framework 4.8 — the assembly just needs an explicit reference. Added a conditional <Reference Include="System.IO.Compression" /> for net48 in the test project so the tests compile and run on all target frameworks without needing a polyfill or exclusion. Fixed in 0e792a1.

gfs added 2 commits February 11, 2026 16:24
Explicitly reference System.IO.Compression in the test project
when targeting .NET Framework 4.8 to ensure compression
functionality is available during test execution.
@gfs
Copy link
Contributor

gfs commented Feb 12, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Find Non-Indexed files in zips

2 participants