Reduce false positives in DatadogToken detector#4632
Reduce false positives in DatadogToken detector#4632rootranjan wants to merge 1 commit intotrufflesecurity:mainfrom
Conversation
Add filters to exclude legitimate code patterns: - Letters-only matches (service names/identifiers) - Repeated characters (test/placeholder values) - NPM integrity hashes (sha512-...== patterns) - Go module checksums (h1:...= patterns) - URL-encoded paths (%3A patterns) - SOPS-encrypted data (ENC[AES256_GCM,data:...] patterns) - Base64-encoded certificates (caBundle patterns) This reduces false positives while still detecting real Datadog API and Application keys.
nabeelalam
left a comment
There was a problem hiding this comment.
Hey @rootranjan! Thanks for proposing a solution for the false positive issues in this detector.
I'm all for testing for entropy and whether the tokens contain all required characters, but I feel the rest of the checks may be adding some complexity and performance hits without a strong guarantee of detecting false positives.
Introducing 4 more regular expressions along with the slicing/matching definitely will definitely impact the performance on this detector, we generally like to keep detectors light since inputs can be large.
Plus, several filters seem a little to broad to me (e.g., %3A anywhere in the range of 200 chars) and could possibly suppress actual secrets and mark them false and it's better to be safe than sorry in this case. Plus with the added complexity it would be harder to debug those cases too.
I would suggest keeping changes minimal, like the lighter weight constraints (required chars, entropy threshold) that you have added and add a couple of high-confidence exclusions only.
| // isRepeatedCharacter checks if a string consists of the same character repeated. | ||
| // This filters out test/placeholder values like "11111111111111111111111111111111" | ||
| func isRepeatedCharacter(s string) bool { | ||
| if len(s) == 0 { | ||
| return false | ||
| } | ||
| firstChar := s[0] | ||
| for i := 1; i < len(s); i++ { | ||
| if s[i] != firstChar { | ||
| return false | ||
| } | ||
| } | ||
| return true | ||
| } |
There was a problem hiding this comment.
We can use an entropy test for this
| func hasDigit(s string) bool { | ||
| for _, r := range s { | ||
| if unicode.IsDigit(r) { | ||
| return true | ||
| } | ||
| } | ||
| return false | ||
| } |
There was a problem hiding this comment.
I suppose we should check whether the token contains at least one lower-case and upper-case letter as well. I'm not entirely sure but the entropy check might also solve this (I doubt it).
Fixes #4631
Description:
Reduce false positives in DatadogToken detector by filtering out legitimate code identifiers, checksums, encrypted data, and test values that match the detector pattern.
Changes:
sha512-...==patterns)h1:...=patterns)%3Apatterns)ENC[AES256_GCM,data:...]patterns)caBundlepatterns)res.Body.Close()errorsThis reduces false positives from legitimate code identifiers, checksums, encrypted data, and test values while still detecting real Datadog API and Application keys that contain digits and have higher entropy.
Problem:
The DatadogToken detector was flagging any 32-character or 40-character alphanumeric string near the keywords "datadog" or "dd" as a potential secret, including:
service%3Amy-app-service-name)sha512-...==patterns)h1:...=patterns)ENC[AES256_GCM,data:...]patterns)11111111111111111111111111111111)caBundlefields)Solution:
Added
isLikelyFalsePositive()helper function with multiple filters:11111111111111111111111111111111sha512-...==patterns in package.json filesh1:...=patterns in go.sum/go.mod files%3Apatterns and URL structuresENC[AES256_GCM,data:...]patternscaBundleand certificate-related fieldsImplementation Details:
FromData()to useFindAllStringSubmatchIndex()to get match positions for context extractionsha512-,h1:,%3A,ENC[,caBundle)Checklist:
make test-community)?make lintthis requires golangci-lint)?