GSoC 2020 Project Idea: Improve CVE Binary Tool Output #267

terriko opened this issue Jan 8, 2020 · 15 comments
Labels
gsoc Tasks related to our participation in Google Summer of Code

Comments

@terriko
Contributor

terriko commented Jan 8, 2020

The CVE Binary tool team is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged 'gsoc' are not generally available bugs, but related to project ideas for GSoC.

Project Idea : Improve CVE Binary Tool Output

Project description: The CVE Binary Tool has a couple of issues related to output that could be combined into a single project:

#262 - Machine readable output. Currently the CVE Binary Tool prints information about CVEs found to the console. We'd like it to be easier for machines to parse. That issue talks about doing it in a CSV (comma-separated value) format. Once that works, you might also want to get it working with JSON or even provide prettier HTML reports with colours and additional data.

#332 - Generate full reports with CVE descriptions, etc. We don't currently store these in the database and probably don't want to for speed/space reasons, so you'd have to grab from the json. This was a feature that used to exist in cve-bin-tool before it was open sourced. The idea, I believe, is that you'd have something you could easily attach to an email or send in a meeting agenda so decisions could be made prioritizing fixes. In practice it wasn't getting used much which is why it wound up dropped before release, but it could still be useful for folk who need more info to send to their colleagues.

#413 - The csv2cve utility currently outputs a bare list of CVE numbers, while the main cve-bin-tool outputs product, version, cve_number, severity. I think it would be nice if csv2cve did the same or, perhaps even better, used vendor, product, version, cve_number, severity, since the vendor information is easily available.
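
As a rough illustration of what #262 and #413 are asking for, here is a minimal sketch that renders the same rows as CSV and as JSON using only the standard library. The helper names (`to_csv`, `to_json`), the placeholder vendor, and the sample row are made up for this example and are not part of cve-bin-tool; the column order follows the vendor, product, version, cve_number, severity suggestion above.

```python
import csv
import io
import json

# Column order suggested in #413.
FIELDS = ["vendor", "product", "version", "cve_number", "severity"]


def to_csv(rows):
    """Return the rows as CSV text with a header line (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Return the rows as a JSON array of objects (hypothetical helper)."""
    return json.dumps(rows, indent=2)


# Illustrative data only; the vendor is a placeholder, not a real NVD lookup.
sample_rows = [
    {
        "vendor": "example_vendor",
        "product": "systemd",
        "version": "219",
        "cve_number": "CVE-2017-9445",
        "severity": "HIGH",
    }
]

print(to_csv(sample_rows), end="")
print(to_json(sample_rows))
```

Keeping one list of dictionaries as the intermediate form means a later HTML report could reuse the same rows without touching the scanner.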

Some older output issues that are now resolved (but might be interesting reading for the type of output fixes we want):
#182 - Unify logging vs verbose/quiet flags. Currently the CVE binary tool has both print and log statements. We'd like to switch everything to use the log system. (Solved in #276 -- thanks @PrajwalM2212)

#197 - Improve NVD output so error messages go to stderr instead of stdout. Solving #182 will probably solve this one, but as an easier bug, you could start by switching all the print statements containing errors to print to stderr. (Solved in #276 -- thanks @PrajwalM2212)

#286 - Bring back the --quiet flag (or make the equivalent --log command work like the --quiet flag did) (Solved in #290 -- thanks @PrajwalM2212)
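
For anyone reading those resolved issues, the general shape of the fix can be sketched with the standard `logging` module: send everything through one logger that writes to stderr, and let a quiet option raise the threshold. This is only an illustration, not the code from #276 or #290; the helper name `make_logger`, the logger name, and the sample messages are assumptions.

```python
import logging
import sys


def make_logger(quiet=False):
    """Build a logger that writes to stderr (hypothetical helper).

    With quiet=True only ERROR and above are shown; otherwise INFO and above.
    """
    logger = logging.getLogger("cve_output_sketch")  # name is an assumption
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.ERROR if quiet else logging.INFO)
    logger.propagate = False  # keep messages out of the root logger
    return logger


log = make_logger(quiet=True)
log.info("Known CVEs in version 219")  # suppressed in quiet mode
log.error("Example error message")     # still shown, on stderr
```

Because nothing is printed to stdout, a machine-readable report can later be written there without being mixed up with diagnostics.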

Skills: Python, git, experience with common output formats like json and csv a bonus

Difficulty level: Intermediate

Related Readings/Links: How to add new checkers

Potential mentors: @terriko @pdxjohnny

Getting Started: Python requires that all students submit a code sample as part of your application.

One possible good first pull request for this project: Fixing the "critical" output to be "warning" output in #306

If the bugs above are already resolved, try adding a test! There are two types of easy tests you might want to try first: CVE mapping test and CVE file test. Note: the way we add tests has changed recently, so please make sure to read the instructions!

Here are the file test instructions, cut and pasted:

To make the basic test suite run quickly, we use "faked" binary files to test the CVE mappings. However, we also want to check that the signatures work on real-world data.
We have a function that takes a URL, a package name, and a version, downloads the file, runs the scanner against it, and makes sure the scanner detects the package and version that you've specified. But we need more tests!

  • Existing tests are in test/
  • You can see the scanner tests in 'test/test_scanner.py'
  • To add a new test, find an appropriate publicly available file (Linux distribution packages and public releases of the package itself are ideal). Add the details of the new test case to the @pytest.mark.parametrize decorator of the test_files test
  • Make sure to hide it behind the LONG_TESTS flag so we aren't doing a huge number of downloads for every test suite run
    @pytest.mark.parametrize(
        "url, filename, package, version",
        list(
            itertools.chain(
                [
                    (
                        "https://archives.fedoraproject.org/pub/archive/fedora/linux"
                        "/releases/20/Everything/x86_64/os/Packages/c/",
                        "curl-7.32.0-3.fc20.x86_64.rpm",
                        "curl",
                        "7.32.0",
                    ),
                    (
                        "http://mirror.centos.org/centos/7/os/x86_64/Packages/",
                        "expat-2.1.0-10.el7_3.i686.rpm",
                        "expat",
                        "2.1.0",
                    ),
                    (
                        "http://http.us.debian.org/debian/pool/main/e/expat/",
                        "libexpat1_2.2.0-2+deb9u3_amd64.deb",
                        "expat",
                        "2.2.0",
                    ),
                    (
                        "http://archive.ubuntu.com/ubuntu/pool/universe/f/ffmpeg/",
                        "ffmpeg_4.1.1-1_amd64.deb",
                        "ffmpeg",
                        "4.1.1",
                    ),
                    .....
                    .....
                    .....
    @unittest.skipUnless(os.getenv("LONG_TESTS") == "1", "Skipping long tests")
    def test_files(self, url, filename, package, version):
        self._file_test(url, filename, package, version) 

Ideally, we should have at least one such test for each checker, and it would be nice to have some different sources for each as well. For example, for packages available in common Linux distributions, we might want to have one from fedora, one from debian, and one direct from upstream to show that we detect all those versions.

Extra credit: Got your test working and want to try something more? You can also try adding a checker before the project starts. See the related readings above for instructions.

@terriko terriko added the gsoc Tasks related to our participation in Google Summer of Code label Jan 8, 2020
@terriko
Contributor Author

terriko commented Jan 8, 2020

Some places where we are missing tests that you might want to try first: #274 #237.
(Edit: formerly this listed #273 #271 #270 but they have initial tests now. There's no reason you couldn't add one with a different package, though!)

@terriko
Contributor Author

terriko commented Jan 10, 2020

Another output-related issue that might be of interest: #261 Handled!

@terriko
Contributor Author

terriko commented Jan 25, 2020

Two more output related bugs, and the first one should be pretty easy to fix for beginners looking for their first issue: #306 (second now handled: #307)

@k-udupa2000
Contributor

k-udupa2000 commented Feb 5, 2020

I tried writing the output to a CSV file by setting a CSV=1 environment variable.

Command : LONG_TESTS=1 CSV=1 python -m unittest test.test_scanner.TestScanner.test_systemd_rpm_219

Code (cli.py):

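# needs "import csv" and "import os" at the top of cli.py
if found_cves.keys():
    # CSV=1 in the environment switches from log output to a CSV file
    if os.getenv("CSV") == "1":
        # append so rows from successive scans accumulate; newline="" is what the csv module expects
        with open("sample_output.csv", mode="a", newline="") as sample_output:
            writer = csv.writer(sample_output, delimiter=",", quotechar='"')
            # one row per CVE: version, CVE number, severity
            for cve_number in found_cves:
                writer.writerow([str(version), str(cve_number), str(found_cves[cve_number])])
    else:
        self.logger.info("Known CVEs in version " + str(version))
        self.logger.info(", ".join(found_cves.keys()))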

This writes the following to the CSV file:
219,CVE-2019-3844,HIGH
219,CVE-2017-1000082,CRITICAL
219,CVE-2017-18078,HIGH
219,CVE-2017-9217,HIGH
219,CVE-2017-9445,HIGH
219,CVE-2018-1049,MEDIUM....

Is this a step in the right direction for improving the CVE Binary Tool output?

@terriko
Contributor Author

terriko commented Feb 10, 2020

I think we want an actual command line flag for this rather than an environment variable (as a rule of thumb: environment variables are for developers and test integration, command line flags are for anything a user is likely to want to do) but other than that, yes this is the right direction!

I notice that the first column there is just the version -- you're going to want the product name as well. I guess for ideal machine readability it might be best to use the same vendor, product pair that you'd expect in the nvd database. (and we'll have to do some finessing to make that work with checkers that handle more than one vendor/product pair?)
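
To make the flag-versus-environment-variable point concrete, here is a minimal `argparse` sketch. The option name `--csv-output`, the program name, and the helper `make_row` are hypothetical and only illustrate the idea; they are not existing cve-bin-tool options. The row layout puts the vendor, product pair first, as suggested above.

```python
import argparse


def build_parser():
    """Parser with a hypothetical --csv-output option (not an existing flag)."""
    parser = argparse.ArgumentParser(prog="scanner-sketch")
    parser.add_argument("directory", help="directory to scan")
    parser.add_argument(
        "--csv-output",
        metavar="FILE",
        default=None,
        help="write found CVEs to FILE as CSV instead of logging them",
    )
    return parser


def make_row(vendor, product, version, cve_number, severity):
    """One CSV row keyed on the vendor, product pair used by the NVD."""
    return [vendor, product, str(version), cve_number, severity]


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--csv-output", "cves.csv", "some_directory"])
print(args.csv_output, args.directory)
```

A checker that covers more than one vendor, product pair would need to emit one row per pair, which is the finessing mentioned above.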

@k-udupa2000
Contributor

Thanks for the explanation!
I will work on it!

@SinghHrmn
Contributor

@terriko, apart from #262 and #332, what other issues are related to this idea?

@terriko
Contributor Author

terriko commented Feb 18, 2020

@SinghHrmn Those are the two big ones since a lot of the smaller issues have been closed. But there's almost certainly other things to be done in this space and this is a great issue to do some brainstorming in.

Here's a few off the top of my head:

  • Unifying the ways in which output strings are prepared. Right now we've got a mix of .format() and + and other options -- we could definitely do some reading on best practices, choose one, and make this more consistent, then document in some sort of style guide so we stay consistent. (that's probably only a few days work, though)
  • Preparing cve-bin-tool to accept translations/internationalization. https://www.mattlayman.com/blog/2015/i18n/ looks like it might be a good place to start if you've never read up on how that works in python. Or if you prefer videos, I'm pretty sure there've been several talks on this at PyCon US and other Python conferences.
  • Improving test output. I have no idea what we'd want to do here, but maybe if you look at the output you'd get some ideas? One thing that I can think of off the top of my head is that a lot of tests print "false is not true" when they fail. You can see where I added more informative messages to one test in test_cli.py here
  • Making a --console-colour option to print pretty colours on the console. Maybe have it use colours to group the CVEs so you can tell at a glance which ones are all from the same library? Maybe add other special colours for the severity? (Note, this probably needs different code for Linux and Windows)
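
For the console colour idea in the last bullet, a minimal sketch using raw ANSI escape codes might look like the following. The colour choices and the helper name `colourise` are assumptions made for this example, and ANSI codes are only one possible approach; Windows consoles may need different handling, as noted in the bullet.

```python
# ANSI escape codes; these work on most Linux and macOS terminals.
SEVERITY_COLOURS = {
    "CRITICAL": "\033[1;31m",  # bold red
    "HIGH": "\033[31m",        # red
    "MEDIUM": "\033[33m",      # yellow
    "LOW": "\033[32m",         # green
}
RESET = "\033[0m"


def colourise(text, severity, enabled=True):
    """Wrap text in the colour for severity (hypothetical helper).

    The text is returned unchanged when colour is disabled or the severity
    is not in the table, so plain output stays machine readable.
    """
    colour = SEVERITY_COLOURS.get(severity.upper())
    if not enabled or colour is None:
        return text
    return f"{colour}{text}{RESET}"


print(colourise("CVE-2017-1000082", "CRITICAL"))
print(colourise("CVE-2018-1049", "MEDIUM"))
```

The `enabled` switch is there so that colour can be turned off when output is redirected to a file.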

@SinghHrmn
Contributor

Oh! those were some really great ideas @terriko. I'll try to implement the easy ones before the GSoC period and rest can be added as the stretch goals for the GSoC period. Thank You!

@SinghHrmn
Contributor

Unifying the ways in which output strings are prepared. Right now we've got a mix of .format() and + and other options -- we could definitely do some reading on best practices, choose one, and make this more consistent, then document in some sort of style guide so we stay consistent.
(that's probably only a few days work, though)

@terriko I researched this topic and found that string interpolation with f-strings (Python 3.6+) is considered the best practice if we are on Python 3.6+ (as we are), so we can start using it.

More Information: https://realpython.com/python-string-formatting/
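
A small side-by-side comparison, using a message similar to the one the scanner logs, shows why f-strings read more easily. The variable names and sample values are illustrative only.

```python
version = "219"
cves = ["CVE-2017-9217", "CVE-2017-9445"]

# Three spellings of the same message.
with_plus = "Known CVEs in version " + str(version) + ": " + ", ".join(cves)
with_format = "Known CVEs in version {}: {}".format(version, ", ".join(cves))
with_fstring = f"Known CVEs in version {version}: {', '.join(cves)}"

print(with_fstring)
```

All three produce the same string; the f-string keeps each value next to the text it belongs to and does not need an explicit `str()` call.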

@terriko
Contributor Author

terriko commented Feb 21, 2020

I did some quick double-checking to make sure centos 7 handled python 3.6, and it does. I checked Poky and it's on 3.8 (although the yocto build tools still support 3.4+, I think that's not enough reason for us to support the same).

So I think we're good with the fstrings plan, now with some relevant data and not just my gut feeling that fstrings would be awesome. Hurrah!

@terriko
Contributor Author

terriko commented Feb 21, 2020

Went through this issue and marked the completed bugs. So that leaves us with 3 output-related issues still open:
#262 - machine readable output
#332 - more extensive CVE reports
#306 - improve the nvd output when sha256 mismatch (i.e. files need updating)

(Plus the new one that @SinghHrmn made and is already working on #374, which will likely be merged today)
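
For newcomers looking at #306, the general idea of reporting a checksum mismatch as a warning rather than a critical error can be sketched as follows. This is not the cve-bin-tool code: the function name `sha256_matches`, the logger name, and the message text are all assumptions for illustration.

```python
import hashlib
import logging

logger = logging.getLogger("nvd_sketch")  # hypothetical logger name


def sha256_matches(data, expected_hex):
    """Return True if the SHA-256 of data (bytes) equals expected_hex.

    On mismatch, log at WARNING level rather than CRITICAL, on the assumption
    that a mismatch just means the cached file is stale and needs updating.
    """
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex.lower():
        logger.warning("SHA-256 mismatch: cached data looks out of date and should be refreshed")
        return False
    return True


expected = hashlib.sha256(b"cached nvd data").hexdigest()
print(sha256_matches(b"cached nvd data", expected))
print(sha256_matches(b"newer nvd data", expected))
```

Returning a boolean lets the caller decide whether to re-download, while the log level tells the user this is routine rather than fatal.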

@terriko
Contributor Author

terriko commented Feb 26, 2020

Added a new issue #413 (csv2cve does not output CVE severity) that could potentially be part of an output-related project.

@terriko
Contributor Author

terriko commented Mar 11, 2020

Added a new thought in #475

@terriko
Contributor Author

terriko commented Sep 8, 2020

I think @Niraj-Kamdar and @SinghHrmn safely covered this in gsoc 2020, so I'm going to close the old project idea.

@terriko terriko closed this as completed Sep 8, 2020
3 participants