Add ecosystem specific inclusions or exclusions #1550
base: main
Also ignore specific file paths containing metadata in Ruby gems.
Reference: #1438
Reference: #1476
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
99db098 to 88bc201
Looks fine for now, though the name "config.py" is too generic.
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
d9afb6c to 528be96
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
528be96 to 0530bbe
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
20e0ac6 to 32e1543
@tdruez I fixed the tests; this is ready for your review. 😄
See my suggestions for refining how we declare the configs.
Now, I'm struggling to understand the way we load and then use the config values in the pipeline. The implementation is probably a bit too complex; we should look for ways to simplify it so it does not become hard to maintain in the future.
    @optional_step("Ruby")
    def load_ecosystem_config_ruby(self):
        """Load Ruby specific configurations for d2d steps."""
        pass
Is this a leftover or are we planning to duplicate those methods?
ECOSYSTEM_CONFIGS = [
    d2d_config.DefaultEcosystemConfig,
    d2d_config.JavaEcosystemConfig,
    d2d_config.JavaScriptEcosystemConfig,
    d2d_config.RubyEcosystemConfig,
    d2d_config.RustEcosystemConfig,
    d2d_config.GoEcosystemConfig,
]
Should this exist directly in the d2d_config module instead?
# Visit https://github.com/aboutcode-org/scancode.io for support and download.


class EcosystemConfig:
Could we benefit from using a `@dataclass` here for `EcosystemConfig`?
For example:
from dataclasses import dataclass, field


@dataclass
class EcosystemConfig:
    """
    Base class for ecosystem-specific configurations to be defined
    for each ecosystem.
    """

    # This should be defined for each ecosystem that is
    # an option in the pipelines
    ecosystem_option: str = "Default"

    # Extensions for packages of this ecosystem which
    # need to be matched from purldb
    purldb_package_extensions: list = field(default_factory=list)

    # Extensions for resources of this ecosystem which
    # need to be matched from purldb
    purldb_resource_extensions: list = field(default_factory=list)

    # Extensions for document files which do not require review
    doc_extensions: list = field(default_factory=list)

    # Paths in the deployed binaries/archives (on the to/ side) which
    # do not need review even if they are not matched to the source side
    deployed_resource_path_exclusions: list = field(default_factory=list)

    # Paths in the development/source archive (on the from/ side) which
    # should not be considered, even if unmapped to the deployed side, when
    # assessing what to review on the deployed side
    devel_resource_path_exclusions: list = field(default_factory=list)

    # Symbols found in ecosystem-specific standard libraries,
    # which are not very useful for mapping
    standard_symbols_to_exclude: list = field(default_factory=list)


# Dictionary of ecosystem configurations
ECOSYSTEM_CONFIGS = {
    "Default": EcosystemConfig(
        purldb_package_extensions=[".zip", ".tar.gz", ".tar.xz"],
        devel_resource_path_exclusions=["*/tests/*"],
        doc_extensions=[
            ".pdf",
            ".doc",
            ".docx",
            ".ppt",
            ".pptx",
            ".tex",
            ".odt",
            ".odp",
        ],
    ),
    "Java": EcosystemConfig(
        ecosystem_option="Java",
        purldb_package_extensions=[".jar", ".war"],
        purldb_resource_extensions=[".class"],
    ),
    "JavaScript": EcosystemConfig(
        ecosystem_option="JavaScript",
        purldb_resource_extensions=[
            ".map",
            ".js",
            ".mjs",
            ".ts",
            ".d.ts",
            ".jsx",
            ".tsx",
            ".css",
            ".scss",
            ".less",
            ".sass",
            ".soy",
        ],
    ),
    "Go": EcosystemConfig(
        ecosystem_option="Go",
        purldb_resource_extensions=[".go"],
    ),
    "Rust": EcosystemConfig(
        ecosystem_option="Rust",
        purldb_resource_extensions=[".rs"],
    ),
    "Ruby": EcosystemConfig(
        ecosystem_option="Ruby",
        purldb_package_extensions=[".gem"],
        purldb_resource_extensions=[".rb"],
        deployed_resource_path_exclusions=["*checksums.yaml.gz*", "*metadata.gz*"],
    ),
}


def get_ecosystem_config(ecosystem):
    """Return the ``ecosystem`` config."""
    return ECOSYSTEM_CONFIGS.get(ecosystem, ECOSYSTEM_CONFIGS["Default"])
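One detail worth spelling out about this suggestion: the list fields use `field(default_factory=list)` rather than `= []`. A quick sketch of why: `@dataclass` rejects a bare mutable default with a `ValueError`, and `default_factory` gives every config instance its own list. `ExampleConfig` and `BrokenConfig` below are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExampleConfig:
    # field(default_factory=list) calls list() for every new instance,
    # so each config object owns an independent list.
    doc_extensions: list = field(default_factory=list)


def bare_mutable_default_is_rejected():
    """Return True if @dataclass refuses a plain ``= []`` default."""
    try:
        @dataclass
        class BrokenConfig:
            doc_extensions: list = []  # rejected: would be shared by all instances
    except ValueError:
        return True
    return False


first = ExampleConfig()
second = ExampleConfig()
first.doc_extensions.append(".pdf")

print(bare_mutable_default_is_rejected())  # True
print(first.doc_extensions)  # ['.pdf']
print(second.doc_extensions)  # []
```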
- class DeployToDevelop(Pipeline):
+ class DeployToDevelop(Pipeline, DefaultEcosystemConfig):
I think we should make this work without the need for the extra mixin. See the further suggestions.
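A minimal sketch of the mixin-free direction, assuming the pipeline can simply hold the resolved config object (composition) instead of inheriting config attributes. `DeployToDevelop` here is a hypothetical stand-in, not the real ScanCode.io `Pipeline` subclass, and its constructor signature is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EcosystemConfig:
    ecosystem_option: str = "Default"
    purldb_resource_extensions: list = field(default_factory=list)


ECOSYSTEM_CONFIGS = {
    "Default": EcosystemConfig(),
    "Ruby": EcosystemConfig(
        ecosystem_option="Ruby",
        purldb_resource_extensions=[".rb"],
    ),
}


class DeployToDevelop:
    """Hypothetical stand-in for the pipeline: no config mixin in its bases."""

    def __init__(self, selected_option="Default"):
        # Composition: the pipeline *holds* a config object instead of
        # inheriting config attributes from a mixin class.
        self.ecosystem_config = ECOSYSTEM_CONFIGS.get(
            selected_option, ECOSYSTEM_CONFIGS["Default"]
        )


pipeline = DeployToDevelop("Ruby")
print(pipeline.ecosystem_config.purldb_resource_extensions)  # ['.rb']
```

Steps would then read `self.ecosystem_config.<field>` explicitly, which keeps it obvious where each value comes from.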
configs_by_ecosystem = {
    ecosystem.ecosystem_option: ecosystem for ecosystem in ECOSYSTEM_CONFIGS
}
This could be replaced by a `get_ecosystem_config(ecosystem)` function imported from the `d2d_config` module. See the implementation suggestion on `EcosystemConfig`.
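A sketch of that idea, assuming the registry lives at module level in `d2d_config`: the mapping is built once at import time from the list of configs, so the pipeline no longer needs its own `configs_by_ecosystem` attribute. `_CONFIGS` is a hypothetical name; only two ecosystems are shown for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class EcosystemConfig:
    ecosystem_option: str = "Default"
    purldb_package_extensions: list = field(default_factory=list)


# Assumed module-level layout in d2d_config: the lookup table is built once,
# keyed by each config's ``ecosystem_option``.
_CONFIGS = [
    EcosystemConfig(purldb_package_extensions=[".zip", ".tar.gz", ".tar.xz"]),
    EcosystemConfig(ecosystem_option="Java", purldb_package_extensions=[".jar", ".war"]),
]
ECOSYSTEM_CONFIGS = {config.ecosystem_option: config for config in _CONFIGS}


def get_ecosystem_config(ecosystem):
    """Return the config for ``ecosystem``, falling back to "Default"."""
    return ECOSYSTEM_CONFIGS.get(ecosystem, ECOSYSTEM_CONFIGS["Default"])


print(get_ecosystem_config("Java").purldb_package_extensions)  # ['.jar', '.war']
print(get_ecosystem_config("Python").ecosystem_option)  # Default
```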
Add ecosystem-specific configurations for each ecosystem selected
as `options` to the `pipeline`.
We need to provide more details about what's actually happening when "adding" a config.
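For example, "adding" could be documented as a merge. The sketch below assumes one possible meaning: each list field of the result is the concatenation, in order, of that field across the Default config and every selected ecosystem config, with duplicates dropped and the inputs left untouched. `merge_ecosystem_configs` is a hypothetical helper, and joining the option names with `+` is an arbitrary choice for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class EcosystemConfig:
    ecosystem_option: str = "Default"
    purldb_resource_extensions: list = field(default_factory=list)
    deployed_resource_path_exclusions: list = field(default_factory=list)


def merge_ecosystem_configs(configs):
    """
    Return a new EcosystemConfig combining ``configs`` in order.

    Each list field of the result holds the values of that field from every
    config, in order, with duplicates dropped. The inputs are not mutated.
    """
    merged = {}
    for config_field in fields(EcosystemConfig):
        if config_field.name == "ecosystem_option":
            continue  # not a list; handled separately below
        values = []
        for config in configs:
            for value in getattr(config, config_field.name):
                if value not in values:
                    values.append(value)
        merged[config_field.name] = values
    option = "+".join(config.ecosystem_option for config in configs)
    return EcosystemConfig(ecosystem_option=option, **merged)


default = EcosystemConfig()
java = EcosystemConfig(ecosystem_option="Java", purldb_resource_extensions=[".class"])
ruby = EcosystemConfig(
    ecosystem_option="Ruby",
    purldb_resource_extensions=[".rb"],
    deployed_resource_path_exclusions=["*checksums.yaml.gz*", "*metadata.gz*"],
)

merged = merge_ecosystem_configs([default, java, ruby])
print(merged.ecosystem_option)  # Default+Java+Ruby
print(merged.purldb_resource_extensions)  # ['.class', '.rb']
```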
)


def add_ecosystem_config(pipeline, configs_by_ecosystem, selected_option):
Missing a detailed docstring.
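else:
    # list.extend() mutates in place and returns None; concatenate instead
    # so the new value is an actual list and the original is not mutated.
    new_config_value = pipeline_config_value + config_value

setattr(pipeline, pipeline_config, new_config_value)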
It's not ideal to set values all the way down here. Shouldn't we return them to a higher level that explicitly sets the values?
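A minimal sketch of that shape, with hypothetical names throughout: a pure helper computes and returns the new values, and a single top-level function is the only place that calls `setattr`. Since `list.extend()` mutates in place and returns `None`, the helper builds new lists by concatenation instead of assigning the result of `extend()`.

```python
def merge_config_values(current_values, extra_values):
    """Return a new list: ``current_values`` followed by ``extra_values``."""
    # Concatenation leaves both input lists untouched.
    return current_values + extra_values


class PipelineStub:
    """Hypothetical stand-in holding one config list attribute."""

    def __init__(self):
        self.doc_extensions = [".pdf"]


def apply_ecosystem_values(pipeline, values_by_attribute):
    """Explicitly set every computed value on ``pipeline`` in one place."""
    for attribute, value in values_by_attribute.items():
        setattr(pipeline, attribute, value)


pipeline = PipelineStub()
new_values = {
    "doc_extensions": merge_config_values(pipeline.doc_extensions, [".odt"]),
}
apply_ecosystem_values(pipeline, new_values)
print(pipeline.doc_extensions)  # ['.pdf', '.odt']
print([].extend([1]))  # None
```

Keeping the computation pure makes it easy to unit test the merge without a pipeline instance.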