Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Feb 6, 2026

  • Update agent instructions to note NBGV deep clone requirement
  • Bump SharpCompress 0.40.0 → 0.44.5 (supports ARJ, ARC, ACE)
  • Add ARJ, ARC, ACE to ArchiveFileType enum with explicit numeric values
  • Add magic byte detection for ARJ, ARC, ACE in MiniMagic.cs
  • Create ArjExtractor.cs, ArcExtractor.cs, AceExtractor.cs
  • Register new extractors in SetDefaultExtractors()
  • Create test data files (TestData.arj, TestData.arc, TestData.ace)
  • Fix DetectFileType regression: gate each signature check on bytesRead
  • Fix encrypted 7z stream position regression from SharpCompress upgrade
  • Add FailedArchive status assertions to count-only test methods
  • Fix using statement syntax and add stream position reset in all extractors
  • Move INVALID enum next to UNKNOWN for readability (value=18 preserved)
  • Add nested formats test archive (ZIP→ACE→ARC→ARJ→TAR with 2 text files)
  • Add content validation tests (sync + async) verifying extracted file content
  • Add entries in MiniMagicTests and ExpectedNumFilesTests
  • Add *.arj, *.arc, *.ace to .gitattributes as binary

Fixes #180 #192.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI changed the title [WIP] Reimplement arj and arc changes with cleaner commit history Add ARJ, ARC, and ACE archive decompression support Feb 6, 2026
Copilot AI requested a review from gfs February 6, 2026 18:37
@gfs
Copy link
Contributor

gfs commented Feb 6, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds native extraction support for ARJ, ARC, and ACE archives (via SharpCompress) to RecursiveExtractor, including MiniMagic type detection and test coverage to validate detection + extraction output counts.

Changes:

  • Upgraded SharpCompress to enable ARJ/ARC/ACE reader support.
  • Added MiniMagic signatures and new ArchiveFileType enum values for ARJ/ARC/ACE, plus registered new extractors.
  • Added small fixture archives and expanded unit tests to cover detection and extracted-file counts.

Reviewed changes

Copilot reviewed 10 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
RecursiveExtractor/RecursiveExtractor.csproj Bumps SharpCompress dependency to a version that supports ARJ/ARC/ACE.
RecursiveExtractor/MiniMagic.cs Adds ARJ/ARC/ACE types and signature detection; increases header read size.
RecursiveExtractor/Extractors/ArjExtractor.cs New ARJ extractor implementation (sync + async).
RecursiveExtractor/Extractors/ArcExtractor.cs New ARC extractor implementation (sync + async), with post-decompression size check.
RecursiveExtractor/Extractors/AceExtractor.cs New ACE extractor implementation (sync + async).
RecursiveExtractor/Extractor.cs Registers ARJ/ARC/ACE extractors in SetDefaultExtractors().
RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.arj Adds ARJ test fixture archive.
RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.arc Adds ARC test fixture archive.
RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.ace Adds ACE test fixture archive.
RecursiveExtractor.Tests/RecursiveExtractor.Tests.csproj Ensures new fixture archives are copied to test output.
RecursiveExtractor.Tests/ExtractorTests/MiniMagicTests.cs Adds MiniMagic detection tests for ARJ/ARC/ACE fixtures.
RecursiveExtractor.Tests/ExtractorTests/ExpectedNumFilesTests.cs Adds extraction count expectations for ARJ/ARC/ACE fixtures.
.github/copilot-instructions.md Documents deep-clone requirement for Nerdbank.GitVersioning builds.

@gfs
Copy link
Contributor

gfs commented Feb 6, 2026

@copilot it doesn't intuitive eem like it should be related - but with this change i'm seeing the tests for TestDataEncrypted.7z fail with 0 results found on both the lib and cli tests. Is there perhaps an issue with the changes to minimagic that are causing an incorrect header detection on that file is one theory i have.

…e each signature check on bytes actually read

Co-authored-by: gfs <[email protected]>
SharpCompress 0.44.5 now throws CryptographicException at Open() time
for encrypted 7z archives, leaving the stream at a non-zero position.
Subsequent password retry calls to Open() then fail with
EndOfStreamException. Reset fileEntry.Content.Position = 0 before each
retry, matching the pattern already used by RarExtractor and ZipExtractor.

Co-authored-by: gfs <[email protected]>
Copy link
Contributor Author

Copilot AI commented Feb 6, 2026

@copilot it doesn't intuitive eem like it should be related - but with this change i'm seeing the tests for TestDataEncrypted.7z fail with 0 results found on both the lib and cli tes...

Good catch — it's not MiniMagic but a side effect of the SharpCompress 0.44.5 upgrade. In the old version, SevenZipArchive.Open() on encrypted archives succeeded silently and detection happened later via .IsEncrypted. In 0.44.5, Open() now throws CryptographicException immediately, which leaves the stream at a non-zero position (e.g. 1775 bytes in). The subsequent password-retry Open() calls then fail with EndOfStreamException because the stream isn't rewound.

Fixed in 9444f82 by adding fileEntry.Content.Position = 0 before each retry — matching the pattern already used by RarExtractor and ZipExtractor.

@gfs
Copy link
Contributor

gfs commented Feb 6, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@gfs
Copy link
Contributor

gfs commented Feb 6, 2026

@copilot something I wonder about these new test samples for these formats - the fallback case of extact self on fail also would result in 1 file being extracted - I think maybe our expected num files tests should check to make sure that the status isn't set to failed archive. What do you think?

Three test methods (ExtractArchiveParallel, ExtractArchiveFromStreamAsync,
ExtractArchiveFromStream) only verified result count, which could mask
extraction failures for single-file archives (ARJ, ARC, ACE) since
ExtractSelfOnFail returns 1 result with FailedArchive status. Now all
three also assert that no result has FailedArchive status.

Co-authored-by: gfs <[email protected]>
Copy link
Contributor Author

Copilot AI commented Feb 6, 2026

@copilot something I wonder about these new test samples for these formats - the fallback case of extact self on fail also would result in 1 file being extracted - I think maybe our ex...

Great point — with ExtractSelfOnFail=true and 1-file archives, a broken extractor still returns 1 result, so the count assertion alone can't distinguish success from failure. Fixed in ac4537f by adding FailedArchive status assertions to the three test methods that previously only checked count: ExtractArchiveParallel, ExtractArchiveFromStreamAsync, and ExtractArchiveFromStream. The other methods (ExtractArchive, ExtractArchiveAsync, ExtractArchiveNoRecursion, ExtractArchiveParallelNoRecursion, ExtractArchiveSmallBatchSize) already had status verification.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 14 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

RecursiveExtractor/MiniMagic.cs:191

  • These 'if' statements can be combined.
            if (bytesRead >= 4)
            {
                if (buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
                {
                    return ArchiveFileType.ZIP;
                }
            }

RecursiveExtractor/MiniMagic.cs:211

  • These 'if' statements can be combined.
            if (bytesRead >= 7)
            {
                if (buffer[0] == 0x52 && buffer[1] == 0x61 && buffer[2] == 0x72 && buffer[3] == 0x21 && buffer[4] == 0x1A && buffer[5] == 0x07 && buffer[6] == 0x00)
                {
                    return ArchiveFileType.RAR;
                }
            }

Prevents breaking changes when adding new members. All pre-existing
members retain their original implicit values (UNKNOWN=0 through
INVALID=18). New members ARJ/ARC/ACE get values 19-21.

Co-authored-by: gfs <[email protected]>
@gfs gfs requested a review from Copilot February 10, 2026 20:27
@gfs
Copy link
Contributor

gfs commented Feb 10, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 14 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

RecursiveExtractor/MiniMagic.cs:191

  • These 'if' statements can be combined.
            if (bytesRead >= 4)
            {
                if (buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
                {
                    return ArchiveFileType.ZIP;
                }
            }

RecursiveExtractor/MiniMagic.cs:211

  • These 'if' statements can be combined.
            if (bytesRead >= 7)
            {
                if (buffer[0] == 0x52 && buffer[1] == 0x61 && buffer[2] == 0x72 && buffer[3] == 0x21 && buffer[4] == 0x1A && buffer[5] == 0x07 && buffer[6] == 0x00)
                {
                    return ArchiveFileType.RAR;
                }
            }

Copy link
Contributor

@gfs gfs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these implementations are all pretty straightforward and seem to follow the normal pattern of the wrappers around sharp compress library extractors.

I think the nits I have are around the new archive samples, which were also generated by copilot to build the test cases. We also have no nested sample with them - thus not testing the recurse functionality for these extractors, and we have no samples with more than one file, the samples only contain a single file.

@copilot can you add a sample archive of two text files in a TAR archive in an ARJ archive in an ARC archive in an ACE archive in a ZIP archive? Test it with the test number of files test, and maybe also add a test case validating that the actual content of the two files are actually properly extracted, this would give us a single test that exercises all three of the newly added extractors without having to modify the deeply nested sample we have for the extant formats. And begin down the path of building slightly more robust samples that aren't just looking for the number of files. Add the test code for both async and sync methods (using same sample file).

{
try
{
fileEntry.Content.Position = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation: Though it would look unrelated to the other changes, In the migration to the newer sharpcompress this is required to reset the stream position to properly handle multiple password attempts correctly on sevenzip archives.

Create NestedFormatsTest.zip: ZIP → ACE → ARC → ARJ → TAR containing
two text files. Exercises all three new extractors in a single deeply
nested sample with multiple files.

Add NestedFormatsContentTests with sync and async tests that verify
both file count and actual extracted content (file1.txt + file2.txt).

Add entries to ArchiveData (2 files) and NoRecursionData (1 file)
datasets in ExpectedNumFilesTests.

Move INVALID enum member next to UNKNOWN for readability (value=18
preserved).

Co-authored-by: gfs <[email protected]>
Copy link
Contributor Author

Copilot AI commented Feb 12, 2026

I think these implementations are all pretty straightforward and seem to follow the normal pattern of the wrappers around sharp compress library extractors.

I think the nits I have are around the ne...

Added in 753c05f. Created NestedFormatsTest.zip with this nesting structure: ZIP → ACE → ARC → ARJ → TAR → file1.txt + file2.txt. This exercises all three new extractors recursively in a single sample with multiple files.

New NestedFormatsContentTests class validates actual extracted content (both sync and async):

  • Asserts 2 files extracted
  • Asserts no FailedArchive status
  • Asserts file1.txt and file2.txt contents match expected strings exactly

Also added entries to ArchiveData (expects 2 with recursion) and NoRecursionData (expects 1 without recursion) in ExpectedNumFilesTests.

Copilot AI requested a review from gfs February 12, 2026 05:21
@gfs gfs marked this pull request as ready for review February 12, 2026 05:22
@gfs gfs requested a review from Copilot February 12, 2026 05:23
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 16 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (2)

RecursiveExtractor/MiniMagic.cs:191

  • These 'if' statements can be combined.
            if (bytesRead >= 4)
            {
                if (buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
                {
                    return ArchiveFileType.ZIP;
                }
            }

RecursiveExtractor/MiniMagic.cs:211

  • These 'if' statements can be combined.
            if (bytesRead >= 7)
            {
                if (buffer[0] == 0x52 && buffer[1] == 0x61 && buffer[2] == 0x72 && buffer[3] == 0x21 && buffer[4] == 0x1A && buffer[5] == 0x07 && buffer[6] == 0x00)
                {
                    return ArchiveFileType.RAR;
                }
            }

@gfs
Copy link
Contributor

gfs commented Feb 12, 2026

@copilot add the new binary test archives formats to .gitattributes as binary

Copy link
Contributor Author

Copilot AI commented Feb 12, 2026

@copilot add the new binary test archives formats to .gitattributes as binary

Added in 03c5e33. *.arj, *.arc, and *.ace are now marked as binary in .gitattributes to prevent EOL normalization corruption.

@gfs
Copy link
Contributor

gfs commented Feb 12, 2026

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

@gfs gfs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally seems pretty straightforward to me on the implementation side. The new test case for checking test content is I think good to expand later with data rows, but this PR is already getting big so I think its better left for later.

INFO: The new test file fixtures were created by agent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ARJ decompression, provided by SharpCompress

2 participants