Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multiple file support #188

Merged
merged 1 commit into from
Jun 15, 2019

Conversation

dgzlopes
Copy link
Contributor

Signed-off-by: Daniel González Lopes [email protected]

Small fix for #180. Multiple file support might be interesting for the new benchmark script.

Copy link
Contributor

@domanchi domanchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please write test cases for this new feature.

@dgzlopes
Copy link
Contributor Author

dgzlopes commented May 29, 2019

@domanchi Added multiple valid files and multiple invalid files tests on the last update! Should I add test for some more edge case?

And thank you for your patience and taking some time to review the PR!

@dgzlopes
Copy link
Contributor Author

dgzlopes commented Jun 9, 2019

Learned a lot from the review! Both about Python and the problem. So I tried to fix the missing logic and all the small bits. Now for each element on the path, I check if It's a directory or a file:

  • If It's a directory:
    - If we have the scan_all_files flag active It runs _get_files_recursively(element).
    - If not It runs _get_git_tracked_files(element).
    - In both cases, It extends the returned list to the files_to_scan list.
  • If it's a file It's appended to the files_to_scan list.
  • If the element doesn't exist the error is logged!

Seems to work with only files, only dirs, a mix of files and dirs, mix of valid and invalid files/dirs and with scan_all_files flag too!

A little bit offtopic... but I would love to learn more about Pythonic code style and the Python way of doing things. I have read this resource [0] (Well, in reality, all the guide!) but felt a little short... So, do you know any other great resources (Talks, Books...Whatever) for learning more about this topic? (As the Types talk from my last PR! Was really helpful)

[0] https://docs.python-guide.org/writing/style/

Copy link
Contributor

@domanchi domanchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking a lot better!

@dgzlopes, my favorite resource (actually recommended by @KevinHock) for making your code more Pythonic is: https://www.youtube.com/watch?v=OSGv2VnC0go

@@ -120,7 +120,7 @@ def add_arguments(self):
def _add_initialize_baseline_argument(self):
self.parser.add_argument(
'path',
nargs='?',
nargs='*',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but would it be possible to use '+'? So you wouldn't have to do

:type path: str|list

and the isinstance check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, as '+' generates an error message if there isn't at least one command-line argument present... I thought that as we are using default's, '*' would be a better fit. Running the test suite with '+' breaks with detect-secrets scan: error: too few arguments (Well, as intended!).

Maybe I'm missing something... But If you think it's an improvement I can take a closer look at it :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re completely right :)

From reading https://stackoverflow.com/questions/23490152/list-of-arguments-with-argparse#23490179 it seems it should be always be a list with *, so maybe that’s true?

Copy link
Contributor Author

@dgzlopes dgzlopes Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Looks like all is passed as a list, even the default argument '.'

So the isInstance() might be redundant as when we pass (on initialize()) path='.' it's never used... Running scan without path(s) will trigger argparse's default.

This was closely related to #188 (comment) so maybe using the list as a default isn't that great... But I think '+' passes all as a list too! What's the best way to handle this?

Copy link
Collaborator

@KevinHock KevinHock Jun 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand, the

    if not isinstance(path, list):
        path = [path]

is always dead-code, so we can remove that.

From reading the code, it seems a path is always passed in to initialize, so there doesn't seem to be a need for a default argument I think, right?

If so we can change

def initialize(
    plugins,
    exclude_files_regex=None,
    exclude_lines_regex=None,
    path='.',
    scan_all_files=False,
):

to

def initialize(
    path,
    plugins,
    exclude_files_regex=None,
    exclude_lines_regex=None,
    scan_all_files=False,
):

Copy link
Contributor Author

@dgzlopes dgzlopes Jun 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I thought :) Thanks for the writeup. Updated the PR and fixed the failing tests. I hope It looks better now!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome 👏

@MVrachev
Copy link

I believe it will be useful to add information about this feature in the README.
Maybe just a small example calling detect-secret on multiple files.

@dgzlopes
Copy link
Contributor Author

dgzlopes commented Jun 12, 2019

@MVrachev I thought that multiple file support was an obvious feature for this type of tool! But I added a little Unix-like doc to the readme explaining scan mode inputs as this might not be that obvious :)

And thank you both @domanchi and @KevinHock for the link to the talk!

@dgzlopes dgzlopes force-pushed the fix-180-multiple-file-support branch from aeed00e to 8fd3289 Compare June 12, 2019 20:28
Signed-off-by: Daniel González Lopes <[email protected]>

Improve multiple file support

Signed-off-by: Daniel González Lopes <[email protected]>

Fixing multiple files logic

Signed-off-by: dgzlopes <[email protected]>

Fixing baseline tests

Signed-off-by: dgzlopes <[email protected]>

Fix initialize and tests

Signed-off-by: dgzlopes <[email protected]>
@dgzlopes dgzlopes force-pushed the fix-180-multiple-file-support branch from 13c72c6 to 75cb635 Compare June 14, 2019 08:11
Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚢✅

@KevinHock KevinHock merged commit 2e06c6d into Yelp:master Jun 15, 2019
KevinHock added a commit that referenced this pull request Jun 15, 2019
@dgzlopes dgzlopes deleted the fix-180-multiple-file-support branch June 15, 2019 06:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants