Detecting npm dependencies licenses, fetching remote data from the registry #2591

Open
IanMoroney opened this issue Jul 12, 2021 · 13 comments
Labels
dependencies live-online-scan Anything that requires live, online network access (and would not work in an isolated network) new feature

Comments

@IanMoroney

Description

Given the dependencies and devDependencies below, I expected scancode to tell me the licenses of these dependencies, but it's not reporting them.

  "dependencies": {
    "@azure/cosmos": "^3.11.0",
    "dotenv": "^9.0.0",
    "fastify": "^3.15.1",
    "fastify-swagger": "^4.7.0",
    "nodemon": "^2.0.7",
    "uuid": "^8.3.2"
  },
  "devDependencies": {
    "@types/jest": "^26.0.23",
    "@types/node": "^15.0.2",
    "@typescript-eslint/eslint-plugin": "^4.22.1",
    "@typescript-eslint/parser": "^4.22.1",
    "eslint": "^7.25.0",
    "eslint-config-airbnb-base": "^14.2.1",
    "eslint-config-prettier": "^6.15.0",
    "eslint-import-resolver-typescript": "^2.4.0",
    "eslint-plugin-import": "^2.22.1",
    "eslint-plugin-node": "^11.1.0",
    "eslint-plugin-prettier": "^3.4.0",
    "husky": "^6.0.0",
    "jest": "^26.6.3",
    "prettier": "^2.2.1",
    "supertest": "^6.1.3",
    "ts-jest": "^26.5.6",
    "ts-node": "^9.1.1",
    "typescript": "^4.2.4"
  },

In the example above, these are the actual licenses for the dependencies:

"@azure/cosmos": MIT,
"dotenv": BSD-2-Clause,
"fastify": MIT,
"fastify-swagger": MIT,
"nodemon": MIT,
"uuid": MIT

The package.json scan (or the scan of the project), when shown in scancode-workbench, doesn't report these licenses.

In the JSON results themselves, packages.dependencies.fastify-swagger, as an example, has the output below:

        {
          "type": "npm",
          "namespace": null,
          "name": "fastify-swagger",
          "version": "4.7.0",
          "qualifiers": {},
          "subpath": null,
          "primary_language": "JavaScript",
          "description": null,
          "release_date": null,
          "parties": [],
          "keywords": [],
          "homepage_url": null,
          "download_url": "https://registry.npmjs.org/fastify-swagger/-/fastify-swagger-4.7.0.tgz",
          "size": null,
          "sha1": null,
          "md5": null,
          "sha256": null,
          "sha512": null,
          "bug_tracking_url": null,
          "code_view_url": null,
          "vcs_url": null,
          "copyright": null,
          "license_expression": null,
          "declared_license": null,
          "notice_text": null,
          "root_path": "project",
          "dependencies": [
            {
              "purl": "pkg:npm/fastify-plugin@%5E3.0.0",
              "requirement": "^3.0.0",
              "scope": "requires",
              "is_runtime": true,
              "is_optional": false,
              "is_resolved": true
            }
          ],
          "contains_source_code": null,
          "source_packages": [],
          "extra_data": {},
          "purl": "pkg:npm/fastify-swagger@4.7.0",
          "repository_homepage_url": "https://www.npmjs.com/package/fastify-swagger",
          "repository_download_url": "https://registry.npmjs.org/fastify-swagger/-/fastify-swagger-4.7.0.tgz",
          "api_data_url": "https://registry.npmjs.org/fastify-swagger/4.7.0"
        },

But the npm page for fastify-swagger reports the license as MIT:
https://www.npmjs.com/package/fastify-swagger
Additionally, the source repository says the same:
https://github.com/fastify/fastify-swagger/blob/master/LICENSE
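
To see which packages in a result file are affected, a small script can walk the JSON and list every package entry whose license fields are empty. This is only a sketch: the helper name `packages_missing_license` is made up for illustration, and it assumes nothing about the overall layout of the results beyond the `purl`, `license_expression`, `declared_license` and `api_data_url` keys visible in the output above.

```python
def packages_missing_license(node):
    """Yield (purl, api_data_url) for each package-like dict with no license data.

    The walk is recursive so it does not depend on where package entries sit
    in the results file (an assumption made to keep the sketch layout-agnostic).
    """
    if isinstance(node, dict):
        # Only dicts carrying both keys are treated as package entries;
        # plain dependency stubs (purl + requirement) are skipped.
        if "purl" in node and "license_expression" in node:
            if node["license_expression"] is None and node.get("declared_license") is None:
                yield node["purl"], node.get("api_data_url")
        for value in node.values():
            yield from packages_missing_license(value)
    elif isinstance(node, list):
        for item in node:
            yield from packages_missing_license(item)


# Trimmed-down copy of the entry shown above, wrapped in a hypothetical container.
sample = {
    "packages": [
        {
            "type": "npm",
            "name": "fastify-swagger",
            "version": "4.7.0",
            "license_expression": None,
            "declared_license": None,
            "dependencies": [
                {"purl": "pkg:npm/fastify-plugin@%5E3.0.0", "requirement": "^3.0.0"}
            ],
            "purl": "pkg:npm/fastify-swagger@4.7.0",
            "api_data_url": "https://registry.npmjs.org/fastify-swagger/4.7.0",
        }
    ]
}

for purl, url in packages_missing_license(sample):
    print(purl, url)
```

For a real run, load the file with `json.load` and pass the resulting object to the same function.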

How To Reproduce

Tell us how to reproduce the issue.

git clone https://github.com/nexB/scancode-toolkit.git
cd scancode-toolkit 
docker build -t scancode-toolkit .
docker run -v $PWD/:/project scancode-toolkit -clpeui --json-pp /project/result.json /project
Setup plugins...
Collect file inventory...
Scan files for: info, licenses, copyrights, packages, emails, urls with 1 process(es)...
Scanning done.
Summary:        info, licenses, copyrights, packages, emails, urls with 1 process(es)
Errors count:   0
Scan Speed:     3.69 files/sec. 38.46 KB/sec.
Initial counts: 92 resource(s): 76 file(s) and 16 directorie(s) 
Final counts:   92 resource(s): 76 file(s) and 16 directorie(s) for 792.56 KB
Timings:
  scan_start: 2021-07-12T172409.314478
  scan_end:   2021-07-12T172431.737338
  setup_scan:licenses: 1.65s
  setup: 1.65s
  inventory: 0.13s
  scan: 20.61s
  output:json-pp: 0.57s
  output: 0.57s
  total: 23.00s
Removing temporary files...done.

System configuration

For bug reports, it really helps us to know:

  • What OS are you running on? (MacOS)
  • What version of scancode-toolkit was used to generate the scan file? 21.3.31
  • What installation method was used to install/run scancode? (pip/source download/other) docker
@IanMoroney IanMoroney added the bug label Jul 12, 2021
@pombredanne
Member

@IanMoroney We should document what is performed by ScanCode... it neither fetches nor resolves your dependencies. It scans what you point it to.
To get all your deps and scan them, run something like an npm install in your package directory first to fetch things.
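
As a rough illustration of why the install step matters: once the packages are on disk, each one carries its own package.json with a declared license that a scan (or a quick script) can read. The sketch below is not ScanCode code; the function name `installed_licenses` is hypothetical, and it only looks at top-level and scoped packages directly under node_modules, ignoring nested installs. It reports the declared `license` value as-is and does no license detection on the files themselves.

```python
import json
from pathlib import Path


def installed_licenses(node_modules):
    """Map package name -> declared license for packages directly under node_modules."""
    root = Path(node_modules)
    # Plain packages live at node_modules/<name>, scoped ones at node_modules/@scope/<name>.
    manifests = sorted(root.glob("*/package.json")) + sorted(root.glob("@*/*/package.json"))
    results = {}
    for manifest in manifests:
        data = json.loads(manifest.read_text(encoding="utf-8"))
        name = data.get("name")
        if name:
            # The value is reported unchanged; it may be missing (None).
            results[name] = data.get("license")
    return results
```

Calling `installed_licenses("node_modules")` after the install gives a quick name-to-license overview; pointing ScanCode at the same directory is what produces actual detected licenses.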

That said, it could be mightily useful to fetch and resolve packages alright too!
Would you expect ScanCode to actually fetch all the deps and then scan them?

@IanMoroney
Author

The only thing I'm actually interested in is understanding what licenses our dependencies have, for compliance purposes.
It seems to already have the fields for it, it's just not populating them (assuming the code doesn't yet do this):

license_expression
declared_license

For example, our own code might have an MIT license, but our first-party dependencies might have both MIT and GPL, so it would be really good to know that.

@mjherzog
Member

If the license data is not present in the target codebase for the Scan, then this is not a job for ScanCode Toolkit (SCTK).

We have a newer project called ScanCode.io where you run a Pipeline that can include any steps that you want to run pre- or post-Scan - see https://scancodeio.readthedocs.io/en/latest/. We have some sandbox tools separate from SCTK to fetch and scan package dependencies that we will release soon and they should fit nicely into a ScanCode.io Pipeline.

@pombredanne
Member

As a good example of what is achievable in ScanCode.io, see this work in progress https://github.com/nexB/scancode.io/issues/191 by @aalexanderr and @quepop, which is for Alpine packages.

@pombredanne
Member

@IanMoroney Are you OK if we move this issue to ScanCode.io ?

@pombredanne
Member

Actually we will also add some support here.

@pombredanne pombredanne added dependencies live-online-scan Anything that requires live, online network access (and would not work in an isolated network) new feature and removed bug labels Feb 2, 2022
@pombredanne pombredanne changed the title scancode-toolkit not detecting npm dependency licenses Detecting npm dependencies licenses, fetching remote data from the registry Feb 2, 2022
@emmahsax

@pombredanne

I am running into exactly the same issue as described here:

docker run -v $PWD:/project scancode-toolkit -clpui --processes 5 --verbose --json-pp /project/result.json /project/yarn.lock

results in empty license data. Why does this not work as expected? What is the fix?

I would rather not use scancode.io.... I just want a parseable json file.

@pombredanne
Member

pombredanne commented Mar 29, 2022

@emmahsax Thank you for the feedback!

For now, scancode has been working strictly offline, i.e., without any network connection and without running a build. Offline means that we do not have or fetch details for things that do not exist in the codebase.

Getting licenses for packages that do not exist locally can be resolved in two ways:

Short of these, most dependencies are just name/version constraint pairs (with one notable exception: the PHP composer.lock lockfile, which contains the whole package details of every dependency, including the declared license).

So we already have some code that does a good part of the solution... there is still quite a bit of work to complete it, though we could focus on the simple case of npm and yarn lockfiles for a start: help is mucho welcomed to speed up the process!
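
For readers who need the declared license today, the `api_data_url` that already appears in the scan output can be queried directly. The following is a hedged sketch and not part of ScanCode: the function names are hypothetical, it assumes the registry's per-version document mirrors the published package.json (so the license sits under `license`, or under the older `licenses` list on some legacy packages), and it needs network access, which is exactly what the offline scan avoids.

```python
import json
import urllib.request


def declared_license(metadata):
    """Extract the declared license from one version's registry metadata dict."""
    lic = metadata.get("license")
    # Some older packages use an object such as {"type": "MIT", "url": "..."}.
    if isinstance(lic, dict):
        lic = lic.get("type")
    if lic:
        return str(lic)
    # Legacy form: a list of {"type": ..., "url": ...} entries.
    legacy = metadata.get("licenses") or []
    types = [e["type"] for e in legacy if isinstance(e, dict) and e.get("type")]
    # Joined with commas on purpose: the sketch does not guess AND/OR semantics.
    return ", ".join(types) or None


def fetch_declared_license(api_data_url, timeout=10):
    """Download the version document at api_data_url and return its declared license."""
    with urllib.request.urlopen(api_data_url, timeout=timeout) as response:
        return declared_license(json.load(response))
```

With this, `fetch_declared_license("https://registry.npmjs.org/fastify-swagger/4.7.0")` would perform one request per package. What comes back is a declared value from package metadata, not a detected license, so it still deserves a check against the package's actual LICENSE file.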

@emmahsax

emmahsax commented Mar 29, 2022

@pombredanne In order to get some results, I've just been passing in yarn.lock. Whenever I try to run scancode offline on an entire repository, it takes forever to get through the node_modules directory. It sometimes takes hours to scan the whole node_modules directory, and then it just times out at the filtering stage. I'm not sure if there's some sort of flag I'm supposed to use to make it not scan all the way down through all the node modules. One offline run lasted for over 48 hours before I finally killed the process.

@pombredanne
Member

@emmahsax sorry for missing your report above... do you mind creating a separate issue for this?

@emmahsax

@pombredanne My team actually switched to using a different tool, so I haven't had a chance to really see these issues again. So if I'm 100% honest, I probably won't come across this issue again, and therefore won't have the information I'd need to make a proper issue. But I'd encourage anybody else that sees time out issues or incomplete answers or long runs (over 10 minutes) to make an issue. The repository I was running this on had a massive yarn.lock file... think monolith of a company massive.

@pombredanne
Member

@emmahsax thanks! can you tell which tool you use?

@emmahsax

We've been using https://github.com/pivotal/LicenseFinder, and so far it's been a fantastic tool to use, both for our bundler gems and our node modules installed via yarn. Also seems to work with our go modules.


4 participants