-
Notifications
You must be signed in to change notification settings - Fork 35
Add ARJ, ARC, and ACE archive decompression support #194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… clone requirement Co-authored-by: gfs <[email protected]>
Co-authored-by: gfs <[email protected]>
Co-authored-by: gfs <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Adds native extraction support for ARJ, ARC, and ACE archives (via SharpCompress) to RecursiveExtractor, including MiniMagic type detection and test coverage to validate detection + extraction output counts.
Changes:
- Upgraded SharpCompress to enable ARJ/ARC/ACE reader support.
- Added MiniMagic signatures and new
ArchiveFileTypeenum values for ARJ/ARC/ACE, plus registered new extractors. - Added small fixture archives and expanded unit tests to cover detection and extracted-file counts.
Reviewed changes
Copilot reviewed 10 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| RecursiveExtractor/RecursiveExtractor.csproj | Bumps SharpCompress dependency to a version that supports ARJ/ARC/ACE. |
| RecursiveExtractor/MiniMagic.cs | Adds ARJ/ARC/ACE types and signature detection; increases header read size. |
| RecursiveExtractor/Extractors/ArjExtractor.cs | New ARJ extractor implementation (sync + async). |
| RecursiveExtractor/Extractors/ArcExtractor.cs | New ARC extractor implementation (sync + async), with post-decompression size check. |
| RecursiveExtractor/Extractors/AceExtractor.cs | New ACE extractor implementation (sync + async). |
| RecursiveExtractor/Extractor.cs | Registers ARJ/ARC/ACE extractors in SetDefaultExtractors(). |
| RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.arj | Adds ARJ test fixture archive. |
| RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.arc | Adds ARC test fixture archive. |
| RecursiveExtractor.Tests/TestData/TestDataArchives/TestData.ace | Adds ACE test fixture archive. |
| RecursiveExtractor.Tests/RecursiveExtractor.Tests.csproj | Ensures new fixture archives are copied to test output. |
| RecursiveExtractor.Tests/ExtractorTests/MiniMagicTests.cs | Adds MiniMagic detection tests for ARJ/ARC/ACE fixtures. |
| RecursiveExtractor.Tests/ExtractorTests/ExpectedNumFilesTests.cs | Adds extraction count expectations for ARJ/ARC/ACE fixtures. |
| .github/copilot-instructions.md | Documents deep-clone requirement for Nerdbank.GitVersioning builds. |
|
@copilot it doesn't intuitive eem like it should be related - but with this change i'm seeing the tests for |
…e each signature check on bytes actually read Co-authored-by: gfs <[email protected]>
SharpCompress 0.44.5 now throws CryptographicException at Open() time for encrypted 7z archives, leaving the stream at a non-zero position. Subsequent password retry calls to Open() then fail with EndOfStreamException. Reset fileEntry.Content.Position = 0 before each retry, matching the pattern already used by RarExtractor and ZipExtractor. Co-authored-by: gfs <[email protected]>
Good catch — it's not MiniMagic but a side effect of the SharpCompress 0.44.5 upgrade. In the old version, Fixed in 9444f82 by adding |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@copilot something I wonder about these new test samples for these formats - the fallback case of extact self on fail also would result in 1 file being extracted - I think maybe our expected num files tests should check to make sure that the status isn't set to failed archive. What do you think? |
Three test methods (ExtractArchiveParallel, ExtractArchiveFromStreamAsync, ExtractArchiveFromStream) only verified result count, which could mask extraction failures for single-file archives (ARJ, ARC, ACE) since ExtractSelfOnFail returns 1 result with FailedArchive status. Now all three also assert that no result has FailedArchive status. Co-authored-by: gfs <[email protected]>
Great point — with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 14 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
RecursiveExtractor/MiniMagic.cs:191
- These 'if' statements can be combined.
if (bytesRead >= 4)
{
if (buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
{
return ArchiveFileType.ZIP;
}
}
RecursiveExtractor/MiniMagic.cs:211
- These 'if' statements can be combined.
if (bytesRead >= 7)
{
if (buffer[0] == 0x52 && buffer[1] == 0x61 && buffer[2] == 0x72 && buffer[3] == 0x21 && buffer[4] == 0x1A && buffer[5] == 0x07 && buffer[6] == 0x00)
{
return ArchiveFileType.RAR;
}
}
Prevents breaking changes when adding new members. All pre-existing members retain their original implicit values (UNKNOWN=0 through INVALID=18). New members ARJ/ARC/ACE get values 19-21. Co-authored-by: gfs <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 11 out of 14 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (2)
RecursiveExtractor/MiniMagic.cs:191
- These 'if' statements can be combined.
if (bytesRead >= 4)
{
if (buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
{
return ArchiveFileType.ZIP;
}
}
RecursiveExtractor/MiniMagic.cs:211
- These 'if' statements can be combined.
if (bytesRead >= 7)
{
if (buffer[0] == 0x52 && buffer[1] == 0x61 && buffer[2] == 0x72 && buffer[3] == 0x21 && buffer[4] == 0x1A && buffer[5] == 0x07 && buffer[6] == 0x00)
{
return ArchiveFileType.RAR;
}
}
gfs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these implementations are all pretty straightforward and seem to follow the normal pattern of the wrappers around sharp compress library extractors.
I think the nits I have are around the new archive samples, which were also generated by copilot to build the test cases. We also have no nested sample with them - thus not testing the recurse functionality for these extractors, and we have no samples with more than one file, the samples only contain a single file.
@copilot can you add a sample archive of two text files in a TAR archive in an ARJ archive in an ARC archive in an ACE archive in a ZIP archive? Test it with the test number of files test, and maybe also add a test case validating that the actual content of the two files are actually properly extracted, this would give us a single test that exercises all three of the newly added extractors without having to modify the deeply nested sample we have for the extant formats. And begin down the path of building slightly more robust samples that aren't just looking for the number of files. Add the test code for both async and sync methods (using same sample file).
| { | ||
| try | ||
| { | ||
| fileEntry.Content.Position = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explanation: Though it would look unrelated to the other changes, In the migration to the newer sharpcompress this is required to reset the stream position to properly handle multiple password attempts correctly on sevenzip archives.
Create NestedFormatsTest.zip: ZIP → ACE → ARC → ARJ → TAR containing two text files. Exercises all three new extractors in a single deeply nested sample with multiple files. Add NestedFormatsContentTests with sync and async tests that verify both file count and actual extracted content (file1.txt + file2.txt). Add entries to ArchiveData (2 files) and NoRecursionData (1 file) datasets in ExpectedNumFilesTests. Move INVALID enum member next to UNKNOWN for readability (value=18 preserved). Co-authored-by: gfs <[email protected]>
Added in 753c05f. Created New
Also added entries to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 12 out of 16 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (2)
RecursiveExtractor/MiniMagic.cs:191
- These 'if' statements can be combined.
if (bytesRead >= 4)
{
if (buffer[0] == 0x50 && buffer[1] == 0x4B && buffer[2] == 0x03 && buffer[3] == 0x04)
{
return ArchiveFileType.ZIP;
}
}
RecursiveExtractor/MiniMagic.cs:211
- These 'if' statements can be combined.
if (bytesRead >= 7)
{
if (buffer[0] == 0x52 && buffer[1] == 0x61 && buffer[2] == 0x72 && buffer[3] == 0x21 && buffer[4] == 0x1A && buffer[5] == 0x07 && buffer[6] == 0x00)
{
return ArchiveFileType.RAR;
}
}
|
@copilot add the new binary test archives formats to .gitattributes as binary |
Co-authored-by: gfs <[email protected]>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
gfs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally seems pretty straightforward to me on the implementation side. The new test case for checking test content is I think good to expand later with data rows, but this PR is already getting big so I think its better left for later.
INFO: The new test file fixtures were created by agent.
Fixes #180 #192.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.