Add new field `extracted_to` to `CodebaseResource` #510

JonoYang · 2022-08-23T19:04:36Z

In #485, we have an issue where we get two DiscoveredPackages for the same package when we scan a PyPI wheel with the scan_codebase pipeline. This happens because we report a Package detected from the wheel itself, and then create another Package from the METADATA file extracted from the wheel. One way to avoid this would be for ScanCode.io to know where archives were extracted to. That way, if we detect that an archive is a Package, we can easily tag its extracted contents as being part of that package. Conversely, if we detect that the extracted contents of an archive are a package, we can easily tag the archive as part of that package.
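To illustrate the idea, here is a minimal sketch in plain Python, not the actual ScanCode.io model. The `Resource` class, its `package` attribute, and the `tag_extracted_contents` helper are hypothetical names made up for this example; only `extracted_to` comes from the proposal. The `-extract` directory suffix is assumed here to mirror how extractcode names extraction directories.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical stand-in for a CodebaseResource row."""

    path: str
    is_archive: bool = False
    # Proposed field: directory the archive was extracted to, if any.
    extracted_to: Optional[str] = None
    # Hypothetical: identifier of the package this resource belongs to.
    package: Optional[str] = None


def tag_extracted_contents(resources, archive, package):
    """Assign `package` to the archive and to everything under its extraction directory."""
    archive.package = package
    if not archive.extracted_to:
        return
    # Match on a trailing slash so sibling paths sharing a prefix are not tagged.
    prefix = archive.extracted_to.rstrip("/") + "/"
    for resource in resources:
        if resource.path.startswith(prefix):
            resource.package = package


resources = [
    Resource(
        "foo-1.0-py3-none-any.whl",
        is_archive=True,
        extracted_to="foo-1.0-py3-none-any.whl-extract",
    ),
    Resource("foo-1.0-py3-none-any.whl-extract/foo-1.0.dist-info/METADATA"),
    Resource("README.md"),
]

# The wheel was detected as a package, so its extracted METADATA file is
# attached to the same package instead of producing a second one.
tag_extracted_contents(resources, resources[0], "pkg:pypi/foo@1.0")
```

With this link in place, the METADATA file under the extraction directory ends up on the same package as the wheel, while unrelated files such as `README.md` are left untouched.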


Signed-off-by: Jono Yang <[email protected]>

uzaxirr · 2023-01-19T05:10:53Z

Hey, can I work on this?
Also, can you please describe what the extracted_to field would look like? Will it be a ForeignKey or something else?

uzaxirr · 2023-01-28T04:39:44Z

@TG1999 ^^

pombredanne · 2023-02-02T21:18:10Z

You sure can work on this! You would need to get familiar with how extractcode works and how extraction works in ScanCode.io. It extracts files to a directory. The extracted_to field would be about keeping track of which directory an archive is extracted to.

pombredanne · 2024-06-28T08:41:47Z

Closed in favor of #827

JonoYang added a commit that referenced this issue Aug 24, 2022

Add extracted_from field to CodebaseResource #510

d517cd6

Signed-off-by: Jono Yang <[email protected]>

JonoYang added a commit that referenced this issue Aug 24, 2022

Relate extracted Resources #510

3814772

Signed-off-by: Jono Yang <[email protected]>

pombredanne closed this as completed Jun 28, 2024

pombredanne mentioned this issue Jun 28, 2024

Map archives when their extracted directory mapped/processed #827

Open
