Add ecosystem specific inclusions or exclusions #1550

Open: wants to merge 4 commits into base: main

AyanSinhaMahapatra
Member

Also ignore specific file paths containing metadata in Ruby gems.

Reference: #1438
Reference: #1476

@AyanSinhaMahapatra AyanSinhaMahapatra marked this pull request as draft January 20, 2025 12:43
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
Member

@pombredanne pombredanne left a comment

Looking fine, though the name "config.py" is too generic for now.

Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
@AyanSinhaMahapatra AyanSinhaMahapatra marked this pull request as ready for review February 14, 2025 21:28
@AyanSinhaMahapatra AyanSinhaMahapatra marked this pull request as draft February 14, 2025 21:39
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
@AyanSinhaMahapatra AyanSinhaMahapatra force-pushed the exclusion-framework-ruby branch from 20e0ac6 to 32e1543 Compare March 17, 2025 07:22
@AyanSinhaMahapatra AyanSinhaMahapatra marked this pull request as ready for review March 17, 2025 07:26
@AyanSinhaMahapatra
Member Author

@tdruez fixed the tests, this is ready for your review. 😄

Contributor

@tdruez tdruez left a comment

See some suggestions about refining how we declare the configs.
That said, I'm struggling to understand how we load and then use the config values in the pipeline. The implementation is probably a bit too complex, and we should look into ways to simplify it so it isn't too hard to maintain in the future.

Comment on lines +135 to +138
@optional_step("Ruby")
def load_ecosystem_config_ruby(self):
    """Load Ruby specific configurations for d2d steps."""
    pass

Is this a leftover or are we planning to duplicate those methods?

Comment on lines +70 to +77
ECOSYSTEM_CONFIGS = [
    d2d_config.DefaultEcosystemConfig,
    d2d_config.JavaEcosystemConfig,
    d2d_config.JavaScriptEcosystemConfig,
    d2d_config.RubyEcosystemConfig,
    d2d_config.RustEcosystemConfig,
    d2d_config.GoEcosystemConfig,
]

Should this exist directly in the d2d_config module instead?

# Visit https://github.com/aboutcode-org/scancode.io for support and download.


class EcosystemConfig:

Could we benefit from using a @dataclass here for EcosystemConfig?

For example:

from dataclasses import dataclass, field

@dataclass
class EcosystemConfig:
    """
    Base class for ecosystem-specific configurations to be defined
    for each ecosystem.
    """

    # This should be defined for each ecosystem which
    # are options in the pipelines
    ecosystem_option: str = "Default"

    # These are extensions for packages of this ecosystem which
    # need to be matched from purldb
    purldb_package_extensions: list = field(default_factory=list)

    # These are extensions for resources of this ecosystem which
    # need to be matched from purldb
    purldb_resource_extensions: list = field(default_factory=list)

    # Extensions for document files which do not require review
    doc_extensions: list = field(default_factory=list)

    # Paths in the deployed binaries/archives (on the to/ side) which
    # do not need review even if they are not matched to the source side
    deployed_resource_path_exclusions: list = field(default_factory=list)

    # Paths in the development/source archive (on the from/ side) which
    # should not be considered even if unmapped to the deployed side when
    # assessing what to review on the deployed side
    devel_resource_path_exclusions: list = field(default_factory=list)

    # Symbols which are found in ecosystem-specific standard libraries
    # which are not so useful in mapping
    standard_symbols_to_exclude: list = field(default_factory=list)


# Dictionary of ecosystem configurations
ECOSYSTEM_CONFIGS = {
    "Default": EcosystemConfig(
        purldb_package_extensions=[".zip", ".tar.gz", ".tar.xz"],
        devel_resource_path_exclusions=["*/tests/*"],
        doc_extensions=[
            ".pdf",
            ".doc",
            ".docx",
            ".ppt",
            ".pptx",
            ".tex",
            ".odt",
            ".odp",
        ],
    ),
    "Java": EcosystemConfig(
        ecosystem_option="Java",
        purldb_package_extensions=[".jar", ".war"],
        purldb_resource_extensions=[".class"],
    ),
    "JavaScript": EcosystemConfig(
        ecosystem_option="JavaScript",
        purldb_resource_extensions=[
            ".map",
            ".js",
            ".mjs",
            ".ts",
            ".d.ts",
            ".jsx",
            ".tsx",
            ".css",
            ".scss",
            ".less",
            ".sass",
            ".soy",
        ],
    ),
    "Go": EcosystemConfig(
        ecosystem_option="Go",
        purldb_resource_extensions=[".go"],
    ),
    "Rust": EcosystemConfig(
        ecosystem_option="Rust",
        purldb_resource_extensions=[".rs"],
    ),
    "Ruby": EcosystemConfig(
        ecosystem_option="Ruby",
        purldb_package_extensions=[".gem"],
        purldb_resource_extensions=[".rb"],
        deployed_resource_path_exclusions=["*checksums.yaml.gz*", "*metadata.gz*"],
    ),
}

def get_ecosystem_config(ecosystem):
    """Return the ``ecosystem`` config."""
    return ECOSYSTEM_CONFIGS.get(ecosystem, ECOSYSTEM_CONFIGS["Default"])
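
A minimal usage sketch of the lookup suggested above. It assumes a trimmed-down `EcosystemConfig` with only two of the suggested fields and a two-entry registry, both shortened here for illustration. The "Haskell" name is just an example of an ecosystem that is not registered, so the lookup falls back to the "Default" entry:

```python
from dataclasses import dataclass, field


@dataclass
class EcosystemConfig:
    # Trimmed to two fields for illustration only.
    ecosystem_option: str = "Default"
    purldb_package_extensions: list = field(default_factory=list)


# Shortened registry; the full suggestion lists more ecosystems.
ECOSYSTEM_CONFIGS = {
    "Default": EcosystemConfig(purldb_package_extensions=[".zip", ".tar.gz"]),
    "Ruby": EcosystemConfig(
        ecosystem_option="Ruby",
        purldb_package_extensions=[".gem"],
    ),
}


def get_ecosystem_config(ecosystem):
    """Return the ``ecosystem`` config, or the "Default" one if unknown."""
    return ECOSYSTEM_CONFIGS.get(ecosystem, ECOSYSTEM_CONFIGS["Default"])


ruby_config = get_ecosystem_config("Ruby")
fallback_config = get_ecosystem_config("Haskell")  # not registered
```

With a dictionary keyed by option name, the pipeline would not need to build its own mapping before looking up a selected option.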



- class DeployToDevelop(Pipeline):
+ class DeployToDevelop(Pipeline, DefaultEcosystemConfig):

I think we should make this work without the need for the extra mixin. See further suggestions.

Comment on lines +133 to +135
configs_by_ecosystem = {
    ecosystem.ecosystem_option: ecosystem for ecosystem in ECOSYSTEM_CONFIGS
}

This could be replaced by a def get_ecosystem_config(ecosystem) imported from the d2d_config module. See the implementation suggestion below.

Comment on lines +130 to +131
Add ecosystem specific configurations for each ecosystem selected
as `options` to the `pipeline`.

We need to provide more details about what's actually happening when "adding" a config.

)


def add_ecosystem_config(pipeline, configs_by_ecosystem, selected_option):

Missing a detailed docstring.
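
As an illustration of the level of detail such a docstring could carry, here is a sketch of a merge helper. The name `merge_ecosystem_configs`, its signature, and the "skip values already in the base" rule are assumptions made for this example, not the PR's actual behavior. The `EcosystemConfig` class is trimmed to three fields:

```python
from dataclasses import dataclass, field, fields


@dataclass
class EcosystemConfig:
    # Trimmed to three fields for illustration only.
    ecosystem_option: str = "Default"
    purldb_package_extensions: list = field(default_factory=list)
    deployed_resource_path_exclusions: list = field(default_factory=list)


def merge_ecosystem_configs(base, extra):
    """
    Return a new ``EcosystemConfig`` combining ``base`` and ``extra``.

    For every list-valued field, the result holds the ``base`` values
    followed by the ``extra`` values that are not already in ``base``.
    Non-list fields (such as ``ecosystem_option``) keep the ``base``
    value. Neither input is modified.
    """
    merged = {}
    for config_field in fields(base):
        base_value = getattr(base, config_field.name)
        extra_value = getattr(extra, config_field.name)
        if isinstance(base_value, list):
            additions = [item for item in extra_value if item not in base_value]
            merged[config_field.name] = base_value + additions
        else:
            merged[config_field.name] = base_value
    return EcosystemConfig(**merged)


default_config = EcosystemConfig(purldb_package_extensions=[".zip"])
ruby_config = EcosystemConfig(
    ecosystem_option="Ruby",
    purldb_package_extensions=[".gem", ".zip"],
    deployed_resource_path_exclusions=["*metadata.gz*"],
)
merged_config = merge_ecosystem_configs(default_config, ruby_config)
```

Stating in the docstring whether inputs are mutated and how non-list fields are handled answers the question of what "adding" a config does.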

else:
    new_config_value = pipeline_config_value.extend(config_value)

setattr(pipeline, pipeline_config, new_config_value)

Not ideal to set values from all the way down here. Shouldn't we return those to a higher level that will explicitly set the values?
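
One way this could look, as a rough sketch: the helper only computes and returns the new values, and the pipeline sets them in one visible place. `FakePipeline`, `extended_values`, and `load_ecosystem_config` are hypothetical names used for illustration. The sketch also sidesteps a pitfall in the quoted lines: `list.extend()` mutates the list in place and returns `None`, so assigning its result would store `None`.

```python
def extended_values(current_values, extra_values):
    """Return a new list with ``extra_values`` appended to ``current_values``."""
    return [*current_values, *extra_values]


class FakePipeline:
    """Hypothetical stand-in for the pipeline object."""

    def __init__(self):
        self.purldb_package_extensions = [".zip"]

    def load_ecosystem_config(self, config_values):
        # Values are set here, explicitly, at the pipeline level.
        for name, extra_values in config_values.items():
            current_values = getattr(self, name)
            setattr(self, name, extended_values(current_values, extra_values))


pipeline = FakePipeline()
pipeline.load_ecosystem_config({"purldb_package_extensions": [".gem"]})

# The pitfall: extend() works in place and returns None.
result_of_extend = [".zip"].extend([".gem"])
```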
