Issue #19548: Fix IllegalTokenText in google_checks.xml to detect 2-d… by vivek-0509 · Pull Request #19549 · checkstyle/checkstyle

vivek-0509 · 2026-03-31T07:26:02Z

Issue #19548:

The regex only detected 3-digit octal forms (e.g. \012) but missed equivalent 2-digit forms (e.g. \12) for characters with special escape sequences per Google Java Style Guide §2.3.2, and JLS §3.10.7

Fix: changed 0(10|11|...) to 0?(10|11|...) making the leading zero optional. Added integration test coverage for all 8 two-digit octal escapes.

Diff Regression config: https://gist.githubusercontent.com/vivek-0509/26f7dc0977fa0ecaf66e127eb4245194/raw/171fc84ff7037cad652da44dbd135f994a03e163/base_config.xml

Diff Regression patch config: https://gist.githubusercontent.com/vivek-0509/4076379193a16479693fc871ec0f1cd8/raw/e0cc1597cde728943ee64faec406b5310e512628/patch_config.xml

…detect 2-digit octal escapes

vivek-0509 · 2026-03-31T07:31:35Z

Github, generate report

github-actions · 2026-03-31T08:43:13Z

https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/b2b6a47_2026084240/reports/diff/index.html

vivek-0509 · 2026-04-05T21:46:44Z

@romani ping

vivek-0509 · 2026-04-10T18:38:49Z

@romani ping

stoyanK7

Issue and PR look solid 👍

One small comment:

stoyanK7 · 2026-04-12T17:13:27Z

+    String r5 = "\\15"; // violation 'Consider using special escape sequence'
+    String r6 = "\\40"; // violation 'Consider using special escape sequence'
+    String r7 = "\\42"; // violation 'Consider using special escape sequence'
+    String r8 = "\\47"; // violation 'Consider using special escape sequence'


Probably a separate issue, but should sequences starting with \\ actually be flagged?

For example, "\\47" is just a literal backslash \ followed by 47, not the octal escape \47, since the backslash itself is escaped.

So I’d expect cases like "\\47" (and "\\11") not to be a violation.

Please let me know if I’m misunderstanding how this escaping thing works.

romani · 2026-04-12T17:22:35Z

This is false positive
https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/b2b6a47_2026084240/reports/diff/guava/index.html#A1
It is windows so path. Please add it to our inputs.

please add to inputs:
https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/b2b6a47_2026084240/reports/diff/lombok-ast/index.html#A2

it might be different issue , but in way we do violation I do not understand what is wrong here
https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/b2b6a47_2026084240/reports/diff/openjdk25/index.html#A1 , please add it inputs and make copy of such string with fixed values.

add to our inputs this https://checkstyle-diff-reports.s3.us-east-2.amazonaws.com/b2b6a47_2026084240/reports/diff/openjdk25/index.html#A126 , I am not sure what is expected form of code here, please add fixed copy of this line, to clearly point on part that need fix.

vivek-0509 · 2026-04-12T17:59:20Z

@stoyanK7 @romani
i think this is a fundamental limitation of how IllegalTokenText works. The check uses find() to match the regex against raw token text as a substring, without any understanding of escape sequence boundaries. It simply scans the text for any substring that matches the pattern, regardless of whether that substring is the actual complete escape sequence in the source code.

In the openjdk case, \100 is a single 3-digit octal escape, the lexer consumes all three digits greedily. But find() sees the characters \, 1, 0, 0 and matches \10 as a substring, treating it as a 2-digit octal for backspace. It has no concept that \100 is one indivisible escape sequence. In the Guava case, \\11 in source is an escaped backslash followed by 11 as part of a Windows path date, the lexer consumes \\ as a pair, leaving 11 as literal characters. But find() matches \11 starting at the second backslash, not knowing the two backslashes were already paired by the lexer.

The \100 type of false positive can be fixed with a negative lookahead in the regex, if the 2-digit match is followed by another octal digit, it is part of a 3-digit octal and should be skipped, this would correctly allows \108 to still be flagged, since 8 is not an octal digit and the lexer would parse it as \10 (a genuine 2-digit octal) followed by a literal 8. However, the \\11 case cannot be cleanly fixed through regex alone. A lookbehind (?<!\\) would help simple cases like \\11, but regex fundamentally cannot count backslash pairs to determine whether a backslash starts a new escape sequence or is itself escaped.

I can add the lookahead to fix the \100-type false positives, and will add the patterns from the diff reports to the test inputs, and we can open a separate issue for the \\ escape boundary limitation since it affects both 2-digit and 3-digit patterns. What is your take on this

stoyanK7 · 2026-04-16T14:34:41Z

Thanks for the detailed response @vivek-0509

The \100 type of false positive can be fixed with a negative lookahead in the regex, if the 2-digit match is followed by another octal digit, it is part of a 3-digit octal and should be skipped, this would correctly allows \108 to still be flagged, since 8 is not an octal digit and the lexer would parse it as \10 (a genuine 2-digit octal) followed by a literal 8.

Is it too much of a change to fix this scenario in this PR? False positives can be problem for users so we should avoid introducing them. I'm in favor of fixing that here.

However, the \11 case cannot be cleanly fixed through regex alone. A lookbehind (?<!\) would help simple cases like \11, but regex fundamentally cannot count backslash pairs to determine whether a backslash starts a new escape sequence or is itself escaped.

Even if it wouldn't eliminate all false positives, it should eliminate most, right? I'm in favor of adding a lookbehind. But in a separate issue/PR because this false positive has been present in the check before this PR. We are not introducing it.

vivek-0509 · 2026-04-16T18:59:08Z

@stoyanK7

Sounds good, I agree with both points.
I'll update this PR to add the negative lookahead to prevent false positives like \100. I'll also add the real world patterns from the diff reports to the test inputs as roman requested earlier. Yes it will eliminate most of the cases, I'll open a separate issue for the lookbehind.

Issue checkstyle#19548: Fix IllegalTokenText in google_checks.xml to …

b2b6a47

…detect 2-digit octal escapes

stoyanK7 approved these changes Apr 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Issue #19548: Fix IllegalTokenText in google_checks.xml to detect 2-d…#19549

Issue #19548: Fix IllegalTokenText in google_checks.xml to detect 2-d…#19549
vivek-0509 wants to merge 1 commit intocheckstyle:masterfrom
vivek-0509:fix-google-checks-2digit-octal-escape-fn

vivek-0509 commented Mar 31, 2026 •

edited

Loading

Uh oh!

vivek-0509 commented Mar 31, 2026

Uh oh!

github-actions bot commented Mar 31, 2026

Uh oh!

vivek-0509 commented Apr 5, 2026

Uh oh!

vivek-0509 commented Apr 10, 2026

Uh oh!

stoyanK7 left a comment

Uh oh!

stoyanK7 Apr 12, 2026

Uh oh!

romani commented Apr 12, 2026 •

edited

Loading

Uh oh!

vivek-0509 commented Apr 12, 2026 •

edited

Loading

Uh oh!

stoyanK7 commented Apr 16, 2026

Uh oh!

vivek-0509 commented Apr 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

vivek-0509 commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vivek-0509 commented Mar 31, 2026

Uh oh!

github-actions bot commented Mar 31, 2026

Uh oh!

vivek-0509 commented Apr 5, 2026

Uh oh!

vivek-0509 commented Apr 10, 2026

Uh oh!

stoyanK7 left a comment

Choose a reason for hiding this comment

Uh oh!

stoyanK7 Apr 12, 2026

Choose a reason for hiding this comment

Uh oh!

romani commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vivek-0509 commented Apr 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stoyanK7 commented Apr 16, 2026

Uh oh!

vivek-0509 commented Apr 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vivek-0509 commented Mar 31, 2026 •

edited

Loading

romani commented Apr 12, 2026 •

edited

Loading

vivek-0509 commented Apr 12, 2026 •

edited

Loading

vivek-0509 commented Apr 16, 2026 •

edited

Loading