Fix Module::Metadata test suite compatibility by fglock · Pull Request #368 · fglock/PerlOnJava

fglock · 2026-03-24T14:23:12Z

Summary

This PR fixes PerlOnJava so that it passes most Module::Metadata tests (380 of 383) and adds support for the /a regex modifier.

Changes

version.pm (Version.java):
- Fix qv() to correctly set is_qv=true and return v-prefixed versions
- Handle VSTRING type inputs correctly (bare v-strings like v2.2.102.2)
- Reject invalid version formats (multiple underscores, non-numeric data)
RegexFlags.java:
- Add isAscii field to track the /a modifier (ASCII-restrict)
- Only enable UNICODE_CHARACTER_CLASS when /u is explicitly used
RegexPreprocessorHelper.java:
- In ASCII mode (/a), expand \w, \W, \d, \D, \s, \S to ASCII-only equivalents
- Pass RegexFlags through to handle scoped modifiers like (?a:...)
RuntimeRegex.java:
- Add patternUnicode field for storing Unicode-enabled pattern variant
- Compile Unicode pattern when /u is explicitly used
- Select appropriate pattern at match time based on /u flag
GlobalContext.java:
- Set ${^TAINT} to 1 when -T flag is used
Config.pm:
- Add -DSILENT_NO_TAINT_SUPPORT to ccflags to properly skip taint tests

Test Results

re/charset.t: 5290 passing (improved from baseline 5282)
re/pat.t: 1065 passing (baseline)
uni/variables.t: 66880 passing (baseline)
re/regex_sets.t: 76 passing (baseline)
Module::Metadata: 380/383 tests pass (3 encoding.t failures for Unicode package names)

Known Limitation

The 3 failing Module::Metadata tests cover Unicode package names (e.g., Cœur). In Perl, \w matches Unicode characters when the string has the UTF-8 flag set. PerlOnJava doesn't track this per-string flag, so an explicit /u modifier is required for Unicode matching. This is a trade-off to maintain compatibility with the core regex test suite.
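The limitation can be reproduced with plain `java.util.regex`, independent of PerlOnJava. The sketch below is illustrative only: the class and method names are hypothetical and are not taken from this PR. It shows that `\w` is ASCII-only by default in Java and covers letters such as œ only when `Pattern.UNICODE_CHARACTER_CLASS` is set.

```java
import java.util.regex.Pattern;

// Illustration only: names are hypothetical, not part of PerlOnJava.
final class WordCharDemo {
    // Default Java semantics: \w is [a-zA-Z_0-9].
    static boolean matchesAscii(String s) {
        return Pattern.compile("\\w+").matcher(s).matches();
    }

    // With UNICODE_CHARACTER_CLASS, \w follows Unicode properties.
    static boolean matchesUnicode(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(s).matches();
    }

    public static void main(String[] args) {
        String name = "C\u0153ur"; // "Cœur", written with an escape to avoid source-encoding issues
        System.out.println("ASCII \\w:   " + matchesAscii(name));
        System.out.println("Unicode \\w: " + matchesUnicode(name));
    }
}
```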

Test Plan

- make: all unit tests pass
- Core regex tests pass without regressions

Generated with Devin

- version.pm: Fix qv() to set is_qv=true and return v-prefixed versions
- version.pm: Handle VSTRING type inputs correctly (bare v-strings)
- version.pm: Reject invalid formats (multiple underscores, non-numeric data)
- RegexFlags: Enable UNICODE_CHARACTER_CLASS by default (matches Perl 5.14+)
  This fixes Unicode package name recognition in Module::Metadata
- GlobalContext: Set ${^TAINT} to 1 when -T flag is used
- Config.pm: Add -DSILENT_NO_TAINT_SUPPORT to ccflags to skip taint tests

All 383 Module::Metadata tests now pass.

Generated with [Devin](https://cli.devin.ai/docs)
Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
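As a rough illustration of the "reject invalid formats" item, the following sketch checks the two conditions the commit message names: more than one underscore and non-numeric data. It is a deliberately simplified assumption and not the logic in Version.java; the real version.pm rules are more detailed, and the class and method names here are hypothetical.

```java
// Simplified sketch; names are hypothetical and the rules are an assumption,
// not the actual validation performed by Version.java or version.pm.
final class VersionFormatSketch {
    static boolean looksValid(String v) {
        // Allow an optional leading 'v' (dotted-decimal form).
        String body = v.startsWith("v") ? v.substring(1) : v;
        if (body.isEmpty()) {
            return false;
        }
        int underscores = 0;
        for (char c : body.toCharArray()) {
            if (c == '_') {
                underscores++;
            } else if (c != '.' && (c < '0' || c > '9')) {
                return false; // non-numeric data
            }
        }
        return underscores <= 1; // multiple underscores are rejected
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.02_03", "1_2_3", "1.2x", "v1.2.3"}) {
            System.out.println(v + " -> " + looksValid(v));
        }
    }
}
```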

- Add isAscii field to RegexFlags record to track /a modifier
- Update fromModifiers(), with(), toFlagString() to handle /a flag
- Modify toPatternFlags() to only enable UNICODE_CHARACTER_CLASS when /u is explicitly used and /a is not present
- Update handleEscapeSequences to accept RegexFlags and:
  - In ASCII mode: expand \w to [a-zA-Z0-9_], \W to [^a-zA-Z0-9_]
  - In ASCII mode: expand \d to [0-9], \D to [^0-9]
  - In ASCII mode: expand \s/\S to ASCII-only whitespace
  - In non-ASCII mode: keep Unicode whitespace expansion for \s/\S

This fixes regressions in re/charset.t, re/pat.t, uni/variables.t and re/regex_sets.t tests that were caused by enabling UNICODE_CHARACTER_CLASS by default.
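The ASCII-mode expansion described above could look roughly like the sketch below. It is not the code in RegexPreprocessorHelper.java: the class and method names are hypothetical, the ASCII whitespace set is an assumption (the commit message does not spell it out), and the sketch ignores escapes that appear inside an existing bracketed character class, where a nested negated class would need different handling.

```java
import java.util.regex.Pattern;

// Sketch only; names are hypothetical and the whitespace set is an assumption.
final class AsciiExpansionSketch {
    static String expandAscii(String regex) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c != '\\' || i + 1 == regex.length()) {
                out.append(c);
                continue;
            }
            char next = regex.charAt(++i); // consume the escaped character
            switch (next) {
                case 'w': out.append("[a-zA-Z0-9_]"); break;
                case 'W': out.append("[^a-zA-Z0-9_]"); break;
                case 'd': out.append("[0-9]"); break;
                case 'D': out.append("[^0-9]"); break;
                case 's': out.append("[ \\t\\n\\r\\f\\x0B]"); break;
                case 'S': out.append("[^ \\t\\n\\r\\f\\x0B]"); break;
                default:  out.append('\\').append(next); // leave other escapes alone
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String expanded = expandAscii("\\w+");
        // Even with Unicode classes enabled, the expanded form stays ASCII-only.
        Pattern p = Pattern.compile(expanded, Pattern.UNICODE_CHARACTER_CLASS);
        System.out.println(expanded);
        System.out.println(p.matcher("C\u0153ur").matches());
    }
}
```

Consuming the character after each backslash in one step keeps an escaped backslash followed by `w` from being mistaken for `\w`.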

- Add patternUnicode field to RuntimeRegex for storing Unicode-enabled pattern
- When /u is explicitly used (and not /a), compile pattern with UNICODE_CHARACTER_CLASS
- Select appropriate pattern at match time based on /u flag
- Copy patternUnicode when regex objects are cloned/copied

This prepares infrastructure for proper Unicode matching but keeps backward compatibility (ASCII-only by default). Currently only activated when /u is used.

Implement auto-Unicode detection for regex matching to support Unicode package names like 'Cœur' without requiring the explicit /u modifier.

Changes:
- Always compile Unicode pattern variant (unless /a flag is used)
- Add hasNonAscii() helper method to detect non-ASCII input strings
- Select Unicode pattern automatically when input contains non-ASCII
- Skip Unicode check for BYTE_STRING type (optimization)

This fixes Module::Metadata tests that use \w to match Unicode characters in package names. Now all 383 Module::Metadata tests pass.

Benchmark shows minimal overhead (~1% from 2.25 to 2.28 CPU seconds).

- Add Utf8.isUtf8(RuntimeScalar) helper method for internal use
- Use combined check: UTF-8 flag AND non-ASCII content for Unicode matching
- This ensures BYTE_STRING always uses ASCII matching
- UTF-8 strings with only ASCII content also use ASCII (optimization)

The op/utfhash.t regressions are due to a pre-existing bug where hash keys lose their BYTE_STRING type when stored/retrieved from hashes.
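The combined check can be pictured as two precompiled patterns with a per-match selection step. The sketch below is an assumption about the general shape, not the RuntimeRegex implementation: the class name is hypothetical, and the UTF-8 flag is passed in as a plain boolean rather than read from a RuntimeScalar.

```java
import java.util.regex.Pattern;

// Sketch only: a stand-in for the ASCII/Unicode pattern pair described above.
final class DualPatternSketch {
    private final Pattern ascii;
    private final Pattern unicode;

    DualPatternSketch(String regex) {
        this.ascii = Pattern.compile(regex);
        this.unicode = Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS);
    }

    // True if any char is outside the 7-bit ASCII range.
    static boolean hasNonAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                return true;
            }
        }
        return false;
    }

    // isUtf8 stands in for the per-string UTF-8 flag (assumed to be known by the caller).
    boolean find(String input, boolean isUtf8) {
        Pattern chosen = (isUtf8 && hasNonAscii(input)) ? unicode : ascii;
        return chosen.matcher(input).find();
    }

    public static void main(String[] args) {
        DualPatternSketch re = new DualPatternSketch("^\\w+$");
        System.out.println(re.find("C\u0153ur", true));  // Unicode pattern is selected
        System.out.println(re.find("C\u0153ur", false)); // byte-string case: ASCII pattern
    }
}
```

Compiling both variants up front trades a little memory for a cheap choice at match time, which is in line with the "ASCII-only content uses the ASCII pattern" optimization noted in the commit message.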

Document how to use the test runner tool for running Perl test suites.

- Add Unicode pattern variant (UNICODE_CHARACTER_CLASS) to RuntimeRegex
- Select Unicode pattern when string is UTF-8 and contains chars > 255
- Fix \x{100}+ escapes to produce UTF-8 strings (not BYTE_STRING)
- Add wide character check in EmitLiteral.java (JVM backend)
- Add wide character check in BytecodeCompiler.java (interpreter)

Test improvements:
- Module::Metadata: 383/383 PASS (Cœur package name now works)
- op/utfhash.t: 91/99 (regression fixed)
- re/charset.t: +180 tests passing
- re/regexp.t: +15 tests passing
- re/regexp_qr_embed.t: +18 tests passing

Performance fix: hasUnicodeChars() was scanning the entire input string on every regex match to check for Unicode characters (> 255). For a 100K character string with 100K matches, this was 10 billion char checks!

Solution: Cache the hasUnicodeChars result in RuntimePosLvalue alongside the existing pos() cache. The cache entry is invalidated when the string value changes (detected via hashCode).

Also optimized pos() lookup to only occur for /g global matches.

Results:
- 100K matches on 100K string: 6s -> 0s (1000x faster)
- 1M global matches: now completes in <1s
- re/pat_psycho.t: 0/15 (timeout) -> 15/15 pass
- re/speed.t: 0/59 (timeout) -> 25+ pass
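A minimal sketch of the caching idea follows, assuming a single-entry cache keyed on the string's hashCode. The class and field names are hypothetical and the real cache lives in RuntimePosLvalue next to the pos() data; this only illustrates why repeated matches against the same string stop rescanning it.

```java
// Sketch only: a one-entry cache for the "contains chars > 255" scan.
// Names are hypothetical; this is not the RuntimePosLvalue code.
final class UnicodeScanCache {
    private boolean valid;
    private int cachedHash;
    private boolean cachedResult;
    int scans; // counts full scans, for demonstration

    boolean hasUnicodeChars(String s) {
        int h = s.hashCode(); // String caches its hash, so repeat calls are cheap
        if (valid && h == cachedHash) {
            return cachedResult; // value unchanged as far as the hash can tell
        }
        scans++;
        boolean found = false;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 255) {
                found = true;
                break;
            }
        }
        valid = true;
        cachedHash = h;
        cachedResult = found;
        return found;
    }

    public static void main(String[] args) {
        UnicodeScanCache cache = new UnicodeScanCache();
        String subject = "abc\u0100";
        for (int i = 0; i < 1000; i++) {
            cache.hasUnicodeChars(subject);
        }
        System.out.println("full scans: " + cache.scans);
    }
}
```

One caveat of detecting changes through hashCode alone: two different strings with the same hash would share a cache entry, so a production version may want an additional check such as length or identity.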

Add lexical hints HINT_RE_ASCII and HINT_RE_UNICODE to control default regex character class semantics:
- use re '/a': ASCII-restrict \w, \d, \s to match only ASCII
- use re '/u': Unicode semantics for \w, \d, \s
- use re '/aa': Strict ASCII (also affects case folding)

The pragma sets a hint at compile time, which is then applied to all regex patterns compiled in that scope (m//, qr//, s///).

This fixes a regression where 'use re "/a"' was not being respected, causing \x{100} to incorrectly match \w.

Test results:
- re/pat.t: 1066/1298 (+1 from baseline 1065)
- Module::Metadata: 383/383 pass
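One way to picture how a scope-level hint becomes a per-pattern default is the sketch below. The hint names come from the commit message, but the bit values, the class, the method, and the rule that an explicit /a or /u on the pattern takes precedence over the hint are all assumptions made for illustration.

```java
// Sketch only: bit values and precedence rule are assumptions, not PerlOnJava's.
final class ReHintsSketch {
    static final int HINT_RE_ASCII = 0x1;   // hypothetical bit value
    static final int HINT_RE_UNICODE = 0x2; // hypothetical bit value

    // Returns the modifier string a pattern would be compiled with, given the
    // modifiers written on the pattern and the hints active in its scope.
    static String effectiveModifiers(String explicit, int hints) {
        boolean hasCharset = explicit.indexOf('a') >= 0 || explicit.indexOf('u') >= 0;
        if (hasCharset) {
            return explicit; // assumed: explicit charset modifier wins
        }
        if ((hints & HINT_RE_ASCII) != 0) {
            return explicit + "a";
        }
        if ((hints & HINT_RE_UNICODE) != 0) {
            return explicit + "u";
        }
        return explicit;
    }

    public static void main(String[] args) {
        System.out.println(effectiveModifiers("g", HINT_RE_ASCII));
        System.out.println(effectiveModifiers("gu", HINT_RE_ASCII));
        System.out.println(effectiveModifiers("i", 0));
    }
}
```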

Document implemented and unimplemented features in Re.java class javadoc to help future development and track what's still needed.

fglock and others added 8 commits March 24, 2026 15:22

Add perl_test_runner.pl documentation to AGENTS.md

4b6c95c


fglock force-pushed the fix/module-metadata-tests branch from b180f79 to 204b2f3 Compare March 24, 2026 16:49

fglock and others added 2 commits March 24, 2026 18:04

fglock merged commit df43f01 into master Mar 24, 2026
2 checks passed

fglock deleted the fix/module-metadata-tests branch March 24, 2026 17:26

fglock merged 10 commits into master from fix/module-metadata-tests
