Review and improve regex rules #159

Open
domanchi opened this issue Apr 10, 2019 · 3 comments
Labels
enhancement (the issue is related to improving a certain aspect of the project), false positives, triaged (the issue has been reviewed but has not been solved yet)

Comments

@domanchi
Contributor

There was a recent white paper released (summary, source).

The most interesting part is page 15, where they list a variety of explicit regexes that we may be able to incorporate into our scanning. I think we already cover about 80% of them (mostly with the high entropy scanner), but there are some interesting ones we could extract from that list, e.g.:

  • finance related tokens
  • Facebook access tokens

We should go through this list and create new plugins for the ones that we're missing.

@killuazhu
Contributor

I love the idea. Being able to identify the type of the token more deterministically would also support #153.

@domanchi
Contributor Author

A couple of notes from this paper worth mentioning (for posterity):

  • Section III, Part E: discusses some interesting ideas on how to better filter out junk keys (e.g. keys made of XXXX, or keys with EXAMPLE in the text).

  • Section V, Part D: notes that for multi-factor secrets (e.g. username and password), there is an 80% chance that both parts can be found within 5 lines of context, before and after a secret.

  • Section VII, Part D: entropy checks still catch more than regex rules alone. This is good to know, and allows users to decide how conservative they want to be (accuracy vs. recall trade-off).
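
As a rough illustration of the first two notes, here is a minimal sketch of a junk-key filter and a 5-line context window. The helper names (`looks_like_junk`, `context_window`) and the placeholder list are assumptions made up for this example; they are not the paper's exact heuristics and not part of the existing plugin API.

```python
# Hypothetical placeholder markers, based only on the two examples mentioned
# above (XXXX and EXAMPLE). The paper's actual filter may differ.
PLACEHOLDER_WORDS = ("EXAMPLE", "XXXX")


def looks_like_junk(candidate):
    """Return True if the candidate looks like a placeholder, not a real secret."""
    upper = candidate.upper()
    if any(word in upper for word in PLACEHOLDER_WORDS):
        return True
    # A string made of a single repeated character (e.g. "0000000000")
    # is also treated as junk in this sketch.
    return len(set(candidate)) <= 1


def context_window(lines, index, radius=5):
    """Return the lines within `radius` lines before and after lines[index]."""
    start = max(0, index - radius)
    return lines[start:index + radius + 1]
```

A second scan for the other half of a multi-factor secret could then be limited to `context_window(file_lines, secret_line_index)` instead of the whole file.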

@KevinHock
Collaborator

I thought this part was another cool thing to experiment with:

Section III, Part D:

Note that each regular expression was prefixed with negative lookbehind (?<![\w]) and suffixed with negative lookahead (?![\w]) to ensure that no word characters appeared before or after the regular expression match and improve accuracy.
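
A minimal sketch of what that wrapping could look like with Python's `re` module. The helper name `with_word_boundaries` is hypothetical, and the `AKIA...` pattern is only an illustrative stand-in for one of the rules, not a regex copied from the paper.

```python
import re


def with_word_boundaries(pattern):
    """Wrap a pattern so no word character may directly precede or follow a match."""
    # The non-capturing group keeps any alternation inside `pattern` from
    # escaping the lookbehind/lookahead.
    return re.compile(r'(?<![\w])(?:' + pattern + r')(?![\w])')


# Illustrative pattern only: "AKIA" followed by 16 uppercase letters or digits.
KEY_LIKE = with_word_boundaries(r'AKIA[0-9A-Z]{16}')

standalone = KEY_LIKE.search('key = AKIAABCDEFGHIJKLMNOP')   # matches
prefixed = KEY_LIKE.search('xAKIAABCDEFGHIJKLMNOP')          # None: word char before
suffixed = KEY_LIKE.search('AKIAABCDEFGHIJKLMNOPQ')          # None: word char after
```

Without the wrapping, the last two strings would both match, which is the kind of false positive the paper is trying to avoid.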

@lorenzodb1 lorenzodb1 added pending The issue still needs to be reviewed by one of the maintainers. and removed enhancement labels Jun 13, 2022
@lorenzodb1 lorenzodb1 added false positives enhancement The issue is related to improving a certain aspect of the project. triaged The issue has been reviewed but has not been solved yet. and removed pending The issue still needs to be reviewed by one of the maintainers. labels May 9, 2024