Skip to content

fix include functionality #2395

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 24, 2021
Merged

Conversation

tardyp
Copy link
Contributor

@tardyp tardyp commented Feb 11, 2021

When using --include on files, the directories get excluded.
removing those ressource end up in removing also the
nested files

Not sure exactly which test suite I have to fix.
Would be glad to get some help on this.

Fixes #2059

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in uniquely-named feature branch and has no merge conflicts 📁

@pombredanne
Copy link
Member

@tardyp Thank you ++ for this! A test should be something to add in https://github.com/nexB/scancode-toolkit/blob/develop/tests/scancode/test_plugin_ignore.py

@pombredanne
Copy link
Member

@tardyp you can also see some of the failing tests here:

FAILED tests/scancode/test_plugin_ignore.py::TestPluginIgnoreFiles::test_ProcessIgnore_with_glob_for_path
FAILED tests/scancode/test_plugin_ignore.py::TestScanPluginIgnoreFiles::test_scancode_multiple_ignores
FAILED tests/scancode/test_plugin_ignore.py::TestScanPluginIgnoreFiles::test_scancode_codebase_attempt_to_access_an_ignored_resourced_cached_to_disk

And the trace details:

 ================================== FAILURES ===================================
 _________ TestPluginIgnoreFiles.test_ProcessIgnore_with_glob_for_path _________
 [gw0] win32 -- Python 3.6.8 d:\a\1\s\scripts\python.exe
 
 self = <test_plugin_ignore.TestPluginIgnoreFiles object at 0x000002636118E588>
 
     def test_ProcessIgnore_with_glob_for_path(self):
         test_dir = self.extract_test_tar('plugin_ignore/user.tgz')
         ignore = ('*/src/test',)
         expected = [
             'user',
             'user/ignore.doc',
             'user/src',
             'user/src/ignore.doc'
         ]
 >       self.check_ProcessIgnore(test_dir, expected, ignore)
 
 tests\scancode\test_plugin_ignore.py:101: 
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 
 self = <test_plugin_ignore.TestPluginIgnoreFiles object at 0x000002636118E588>
 test_dir = 'C:\\Users\\VSSADM~1\\AppData\\Local\\Temp\\scancode-tk-tests -0scjpe81\\ti66ji0i\\user.tgz'
 expected = ['user', 'user/ignore.doc', 'user/src', 'user/src/ignore.doc']
 ignore = ('*/src/test',)
 
     def check_ProcessIgnore(self, test_dir, expected, ignore):
         codebase = Codebase(test_dir, strip_root=True)
         test_plugin = ProcessIgnore()
         test_plugin.process_codebase(codebase, ignore=ignore)
         resources = [res.path for res in codebase.walk(skip_root=True)]
 >       assert expected == sorted(resources)
 E       AssertionError: assert ['user', 'user/ignore.doc', 'user/src', 'user/src/ignore.doc'] == ['user',\n 'user/ignore.doc',\n 'user/src',\n 'user/src/ignore.doc',\n 'user/sr
 E         Right contains 3 more items, first extra item: 'user/src/test'
 E         Full diff:
 E           [
 E            'user',
 E            'user/ignore.doc',
 E            'user/src',
 E            'user/src/ignore.doc',
 E         -  'user/src/test',
 E         -  'user/src/test/sample.doc',
 E         -  'user/src/test/sample.txt',
 E           ]
 
 tests\scancode\test_plugin_ignore.py:55: AssertionError
 __________ TestScanPluginIgnoreFiles.test_scancode_multiple_ignores ___________
 [gw0] win32 -- Python 3.6.8 d:\a\1\s\scripts\python.exe
 
 self = <test_plugin_ignore.TestScanPluginIgnoreFiles object at 0x0000026361B23B70>
 
     def test_scancode_multiple_ignores(self):
         test_dir = self.extract_test_tar('plugin_ignore/user.tgz')
         result_file = self.get_temp_file('json')
         args = ['--copyright', '--strip-root', '--ignore', '*/src/test', '--ignore', '*.doc', test_dir, '--json', result_file]
         run_scan_click(args)
         scan_result = load_json_result(result_file)
 >       assert 0 == scan_result['headers'][0]['extra_data']['files_count']
 E       assert 0 == 1
 E         +0
 E         -1
 
 tests\scancode\test_plugin_ignore.py:223: AssertionError
 _ TestScanPluginIgnoreFiles.test_scancode_codebase_attempt_to_access_an_ignored_resourced_cached_to_disk _
 [gw0] win32 -- Python 3.6.8 d:\a\1\s\scripts\python.exe
 
 self = <test_plugin_ignore.TestScanPluginIgnoreFiles object at 0x00000263619E4F60>
 
     def test_scancode_codebase_attempt_to_access_an_ignored_resourced_cached_to_disk(self):
         test_dir = self.extract_test_tar('plugin_ignore/user.tgz')
         result_file = self.get_temp_file('json')
         args = ['--copyright', '--strip-root', '--ignore', 'test', test_dir, '--max-in-memory', '1', '--json', result_file]
         run_scan_click(args)
         scan_result = load_json_result(result_file)
         assert 2 == scan_result['headers'][0]['extra_data']['files_count']
         scan_locs = [x['path'] for x in scan_result['files']]
         expected = [
             u'user',
             u'user/ignore.doc',
             u'user/src',
             u'user/src/ignore.doc',
         ]
 >       assert expected == scan_locs
 E       AssertionError: assert ['user', 'user/ignore.doc', 'user/src', 'user/src/ignore.doc'] == ['user', 'user/ignore.doc', 'user/src', 'user/src/ignore.doc', 'user/src/test']
 E         Right contains one more item: 'user/src/test'
 E         Full diff:
 E         - ['user', 'user/ignore.doc', 'user/src', 'user/src/ignore.doc', 'user/src/test']
 E         ?                                                              -----------------
 E         + ['user', 'user/ignore.doc', 'user/src', 'user/src/ignore.doc']
 
 tests\scancode\test_plugin_ignore.py:241: AssertionError

So this should give a good idea of where to look.
For this change, we likely would need also some doc as this is a breaking API change:

  1. a CHANGELOG entry
  2. some documenation at least in the --help doc and likely a bit more

When using --include on files, the directories get excluded.
removing those ressource end up in removing also the
nested files

The include exclude algorithm makes it quite difficult to support
ignoring a whole directory by just saying ignore=path/to/dir

In this version, we support that usecase with ignore=path/to/dir/*

Signed-off-by: Pierre Tardy <[email protected]>
@tardyp
Copy link
Contributor Author

tardyp commented Mar 19, 2021

hi @pombredanne Sorry it took so long to update this.

I updated the patch and fixed the tests.

I am not completely happy with the results as it this algorithm is not supporting the same features.

The include exclude algorithm makes it quite difficult to support ignoring a whole directory by just saying ignore=path/to/dir

In this version, we support that usecase with ignore=path/to/dir/*

Let me know if you find this acceptable, I'll add a changelog describing this api change.

@tardyp
Copy link
Contributor Author

tardyp commented Mar 22, 2021

@pombredanne kindly ping

Copy link
Member

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks you++
Looking great!

@pombredanne
Copy link
Member

@tardyp FYI #2454 this is a possible follow up.

@pombredanne pombredanne merged commit 59851db into aboutcode-org:develop Mar 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

--include option not working
2 participants