GSoC 2020 Project Idea: Improve CVE Binary Tool Output #267

terriko · 2020-01-08T01:56:33Z

The CVE Binary Tool team is hoping to participate in Google Summer of Code (GSoC) under the Python Software Foundation umbrella. You can read all about what this means at http://python-gsoc.org/. This issue, and any others tagged 'gsoc', are not generally available bugs, but are related to project ideas for GSoC.

Project Idea : Improve CVE Binary Tool Output

Project description: The CVE Binary Tool has a couple of issues related to output that could be combined into a single project:

#262 - Machine readable output. Currently the CVE Binary Tool prints information about CVEs found to the console. We'd like it to be easier for machines to parse. That issue talks about doing it in a CSV (comma-separated value) format. Once that works, you might also want to get it working with JSON or even provide prettier HTML reports with colours and additional data.

#332 - Generate full reports with CVE descriptions, etc. We don't currently store these in the database and probably don't want to for speed/space reasons, so you'd have to grab from the json. This was a feature that used to exist in cve-bin-tool before it was open sourced. The idea, I believe, is that you'd have something you could easily attach to an email or send in a meeting agenda so decisions could be made prioritizing fixes. In practice it wasn't getting used much which is why it wound up dropped before release, but it could still be useful for folk who need more info to send to their colleagues.

#413 - The csv2cve utility currently outputs a bare list of CVE numbers, while the main cve-bin-tool outputs product, version, cve_number, severity. I think it would be nice if csv2cve did the same or, perhaps even better, used vendor, product, version, cve_number, severity, since the vendor information is easily available.
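
As a rough illustration of the machine-readable output discussed in #262 and #413, the sketch below serializes findings to CSV and JSON using the column order suggested above. The `Finding` record, the `to_csv` and `to_json` helpers, and the sample row are hypothetical and are not taken from the cve-bin-tool code base.

```python
import csv
import io
import json
from typing import Dict, Iterable, List, NamedTuple

# Column order suggested in #413; the record type itself is hypothetical.
FIELDS = ["vendor", "product", "version", "cve_number", "severity"]


class Finding(NamedTuple):
    vendor: str
    product: str
    version: str
    cve_number: str
    severity: str


def to_csv(findings: Iterable[Finding]) -> str:
    """Return the findings as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    writer.writerows(findings)
    return buffer.getvalue()


def to_json(findings: Iterable[Finding]) -> str:
    """Return the findings as a JSON array of objects."""
    rows: List[Dict[str, str]] = [dict(f._asdict()) for f in findings]
    return json.dumps(rows, indent=2)


# Made-up sample row for illustration only.
sample = [Finding("examplevendor", "exampleproduct", "1.0", "CVE-0000-0000", "HIGH")]
print(to_csv(sample))
print(to_json(sample))
```

Keeping a single list of field names means the CSV header and the JSON keys cannot drift apart, which matters if csv2cve and the main tool are meant to emit the same columns.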

Some older output issues that are now resolved (but might be interesting reading for the type of output fixes we want):
~~#182 - Unify logging vs verbose/quiet flags. Currently the CVE binary tool has both print and log statements. We'd like to switch everything to use the log system.~~ (Solved in #276 -- thanks @PrajwalM2212)

~~#197 - Improve NVD output so error messages go to stderr instead of stdout. Solving #182 will probably solve this one, but as an easier bug, you could start by switching all the print statements containing errors to print to stderr.~~ (Solved in #276 -- thanks @PrajwalM2212)

~~#286 - Bring back the --quiet flag (or make the equivalent --log command work like the --quiet flag did)~~ (Solved in #290 -- thanks @PrajwalM2212)
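
The logging and stderr changes described in these resolved issues can be sketched roughly as follows. This is a minimal illustration using only the standard `logging` module, not the code that was merged in #276 or #290; the logger name, the `make_logger` helper, and its `quiet` and `stream` parameters are all hypothetical.

```python
import logging
import sys


def make_logger(quiet: bool = False, stream=None) -> logging.Logger:
    """Build a logger that writes to stderr (or to a given stream).

    The logger name and the parameters are assumptions for illustration.
    """
    logger = logging.getLogger("cve_output_sketch")
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False
    # Diagnostics go to stderr so that stdout stays free for real output.
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    # In quiet mode only errors and above are shown.
    logger.setLevel(logging.ERROR if quiet else logging.INFO)
    return logger


log = make_logger(quiet=True)
log.info("Scanning files")            # suppressed in quiet mode
log.error("Could not read database")  # still written to stderr
```

Routing every message through one logger is what lets a single quiet option control all output, which is harder when print and log statements are mixed.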

Skills: Python, git; experience with common output formats like JSON and CSV is a bonus

Difficulty level: Intermediate

Related Readings/Links: How to add new checkers

Potential mentors: @terriko @pdxjohnny

Getting Started: The Python Software Foundation requires that all students submit a code sample as part of their application.

One possible good first pull request for this project: Fixing the "critical" output to be "warning" output in #306

If the bugs above are already resolved, try adding a test! There are two types of easy tests you might want to try first: CVE mapping test and CVE file test. Note: the way we add tests has changed recently, so please make sure to read the instructions!

Here are the file mapping test instructions, cut and pasted:

To make the basic test suite run quickly, we use "faked" binary files to test the CVE mappings. However, we also want to test real files to check that the signatures work on real-world data.
We have a function that takes a URL, a package name, and a version, downloads the file, runs the scanner against it, and makes sure it is the file that you've specified. But we need more tests!

Existing tests are in test/
You can see the scanner tests in 'test/test_scanner.py'
To add a new test, find an appropriate publicly available file (Linux distribution packages and public releases of the packages themselves are ideal). You should add the details of the new test case to the @pytest.mark.parametrize decorator of the test_files test.
Make sure to hide it behind the LONG_TESTS flag so we aren't doing a huge number of downloads for every test suite run.

    @pytest.mark.parametrize(
        "url, filename, package, version",
        list(
            itertools.chain(
                [
                    (
                        "https://archives.fedoraproject.org/pub/archive/fedora/linux"
                        "/releases/20/Everything/x86_64/os/Packages/c/",
                        "curl-7.32.0-3.fc20.x86_64.rpm",
                        "curl",
                        "7.32.0",
                    ),
                    (
                        "http://mirror.centos.org/centos/7/os/x86_64/Packages/",
                        "expat-2.1.0-10.el7_3.i686.rpm",
                        "expat",
                        "2.1.0",
                    ),
                    (
                        "http://http.us.debian.org/debian/pool/main/e/expat/",
                        "libexpat1_2.2.0-2+deb9u3_amd64.deb",
                        "expat",
                        "2.2.0",
                    ),
                    (
                        "http://archive.ubuntu.com/ubuntu/pool/universe/f/ffmpeg/",
                        "ffmpeg_4.1.1-1_amd64.deb",
                        "ffmpeg",
                        "4.1.1",
                    ),
                    .....
                    .....
                    .....
    @unittest.skipUnless(os.getenv("LONG_TESTS") == "1", "Skipping long tests")
    def test_files(self, url, filename, package, version):
        self._file_test(url, filename, package, version)

Ideally, we should have at least one such test for each checker, and it would be nice to have some different sources for each as well. For example, for packages available in common Linux distributions, we might want to have one from fedora, one from debian, and one direct from upstream to show that we detect all those versions.

Extra credit: Got your test working and want to try something more? You can also try adding a checker before the project starts. See the related readings above for instructions.

terriko · 2020-01-08T22:20:36Z

Some places where we are missing tests that you might want to try first: #274 #237.
(Edit: formerly this listed ~~#273 #271 #270~~ but they have initial tests now. There's no reason you couldn't add one with a different package, though!)

terriko · 2020-01-10T20:06:48Z

~~Another output-related issue that might be of interest: #261~~ Handled!

terriko · 2020-01-25T01:16:16Z

Two more output related bugs, and the first one should be pretty easy to fix for beginners looking for their first issue: #306 (second now handled: ~~#307~~)

k-udupa2000 · 2020-02-05T19:53:36Z

I tried writing the output to a CSV file by setting CSV=1 in the environment.

Command : LONG_TESTS=1 CSV=1 python -m unittest test.test_scanner.TestScanner.test_systemd_rpm_219

Code (cli.py):

    # Requires `import csv` and `import os` at the top of cli.py.
    if found_cves:
        if os.getenv("CSV") == "1":
            # Append so that results from successive scans accumulate in one file.
            with open("sample_output.csv", mode="a", newline="") as sample_output:
                writer = csv.writer(sample_output, delimiter=",", quotechar='"')
                for cve_number, severity in found_cves.items():
                    writer.writerow([str(version), str(cve_number), str(severity)])
        else:
            self.logger.info("Known CVEs in version " + str(version))
            self.logger.info(", ".join(found_cves.keys()))

This writes the following into a CSV file:
19,CVE-2019-3844,HIGH
219,CVE-2017-1000082,CRITICAL
219,CVE-2017-18078,HIGH
219,CVE-2017-9217,HIGH
219,CVE-2017-9445,HIGH
219,CVE-2018-1049,MEDIUM....

Is this a step in the right direction for improving the CVE Binary Tool output?

terriko · 2020-02-10T18:45:28Z

I think we want an actual command line flag for this rather than an environment variable (as a rule of thumb: environment variables are for developers and test integration, command line flags are for anything a user is likely to want to do) but other than that, yes this is the right direction!

I notice that the first column there is just the version -- you're going to want the product name as well. I guess for ideal machine readability it might be best to use the same vendor, product pair that you'd expect in the nvd database. (and we'll have to do some finessing to make that work with checkers that handle more than one vendor/product pair?)
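
To make the flag-versus-environment-variable point concrete, here is a minimal `argparse` sketch. The `--format` flag, its choices, and the `scanner-sketch` program name are hypothetical names chosen for illustration; they are not confirmed options of cve-bin-tool.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="scanner-sketch")
    parser.add_argument("directory", help="directory to scan")
    # Hypothetical flag: the name and choices are assumptions for illustration.
    parser.add_argument(
        "--format",
        choices=["console", "csv"],
        default="console",
        help="output format (default: console)",
    )
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the given argument list (or sys.argv when argv is None)."""
    return build_parser().parse_args(argv)


args = parse(["/tmp/example", "--format", "csv"])
print(args.format)
```

Unlike an environment variable, a flag like this shows up in `--help` and rejects unsupported values, so users can discover and validate it without reading the source.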

k-udupa2000 · 2020-02-10T19:06:30Z

Thanks for the explanation!
I will work on it!

SinghHrmn · 2020-02-16T11:28:52Z

@terriko apart from #262 and #332, what are some other issues related to this idea?

terriko · 2020-02-18T18:59:00Z

@SinghHrmn Those are the two big ones since a lot of the smaller issues have been closed. But there's almost certainly other things to be done in this space and this is a great issue to do some brainstorming in.

Here's a few off the top of my head:

Unifying the ways in which output strings are prepared. Right now we've got a mix of .format() and + and other options -- we could definitely do some reading on best practices, choose one, and make this more consistent, then document in some sort of style guide so we stay consistent. (that's probably only a few days work, though)
Preparing cve-bin-tool to accept translations/internationalization. https://www.mattlayman.com/blog/2015/i18n/ looks like it might be a good place to start if you've never read up on how that works in python. Or if you prefer videos, I'm pretty sure there've been several talks on this at PyCon US and other Python conferences.
Improving test output. I have no idea what we'd want to do here, but maybe if you look at the output you'd get some ideas? One thing that I can think of off the top of my head is that a lot of tests print "false is not true" when they fail. You can see where I added more informative messages to one test in test_cli.py here
Making a --console-colour option to print pretty colours on the console. Maybe have it use colours to group the CVEs so you can tell at a glance which ones are all from the same library? Maybe add other special colours for the severity? (Note, this probably needs different code for Linux and Windows)
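
For the test output idea in the list above, a small sketch of the difference between a bare `assertTrue` and a more informative assertion may help. The `severity_of` lookup and the test names are hypothetical and exist only to give the tests something to check; this is not the change made in test_cli.py.

```python
import unittest


def severity_of(cve_number: str) -> str:
    """Hypothetical lookup used only to give the tests something to check."""
    return {"CVE-0000-0001": "HIGH"}.get(cve_number, "UNKNOWN")


class TestSeverity(unittest.TestCase):
    def test_bare(self):
        # On failure this would report only "False is not true".
        self.assertTrue(severity_of("CVE-0000-0001") == "HIGH")

    def test_informative(self):
        cve = "CVE-0000-0001"
        found = severity_of(cve)
        # assertEqual reports both values; msg adds context to the failure.
        self.assertEqual(found, "HIGH", msg=f"unexpected severity for {cve}")


# Run the two tests explicitly so the sketch works when executed as a script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSeverity)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests pass here; the difference only shows when something breaks, where the second form names the CVE and prints the mismatched values.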

SinghHrmn · 2020-02-19T18:34:50Z

Oh! Those were some really great ideas, @terriko. I'll try to implement the easy ones before the GSoC period, and the rest can be added as stretch goals for the GSoC period. Thank you!

SinghHrmn · 2020-02-21T12:49:16Z

> Unifying the ways in which output strings are prepared. Right now we've got a mix of .format() and + and other options -- we could definitely do some reading on best practices, choose one, and make this more consistent, then document in some sort of style guide so we stay consistent.
> (that's probably only a few days work, though)

@terriko I researched this topic and found that string interpolation with f-strings (Python 3.6+) is considered the best practice if you are on Python 3.6+ (as we are). So we can start putting this into practice.

More Information: https://realpython.com/python-string-formatting/
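
As a quick side-by-side of the three styles mentioned, the sketch below builds the same message with `+`, `.format()`, and an f-string. The message text mirrors the log line quoted earlier in this thread, and the variable names are illustrative only.

```python
version = "219"
cves = ["CVE-2017-9217", "CVE-2017-9445"]

# Three equivalent ways to build the same message.
with_plus = "Known CVEs in version " + str(version) + ": " + ", ".join(cves)
with_format = "Known CVEs in version {}: {}".format(version, ", ".join(cves))
with_fstring = f"Known CVEs in version {version}: {', '.join(cves)}"

# All three produce identical text; only readability differs.
assert with_plus == with_format == with_fstring
print(with_fstring)
```

The f-string keeps each value next to the place it appears in the text and needs no explicit `str()` call, which is most of the argument for standardizing on it.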

terriko · 2020-02-21T18:38:13Z

I did some quick double-checking to make sure centos 7 handled python 3.6, and it does. I checked Poky and it's on 3.8 (although the yocto build tools still support 3.4+, I think that's not enough reason for us to support the same).

So I think we're good with the fstrings plan, now with some relevant data and not just my gut feeling that fstrings would be awesome. Hurrah!

terriko · 2020-02-21T18:46:07Z

Went through this issue and marked the completed bugs. So that leaves us with 3 output related issues still open
#262 - machine readable output
#332 - more extensive CVE reports
#306 - improve the nvd output when sha256 mismatch (i.e. files need updating)

(Plus the new one that @SinghHrmn made and is already working on #374, which will likely be merged today)

terriko · 2020-02-26T23:19:20Z

Added a new issue #413 (csv2cve does not output CVE severity) that could potentially be part of an output-related project.

terriko · 2020-03-11T22:59:39Z

Added a new thought in #475

terriko · 2020-09-08T23:59:49Z

I think @Niraj-Kamdar and @SinghHrmn safely covered this in gsoc 2020, so I'm going to close the old project idea.

terriko added the gsoc Tasks related to our participation in Google Summer of Code label Jan 8, 2020

terriko mentioned this issue Jan 8, 2020

GSoC 2020 discussion thread #269

Closed

Purvanshsingh mentioned this issue Jan 12, 2020

improved NVD output #281

Closed

SinghHrmn mentioned this issue Jan 31, 2020

GSoC 2020 Project Idea : Adding GUI for CVE Binary Tool #324

Closed

terriko mentioned this issue Feb 5, 2020

Generate more extensive CVE reports (e.g. for managers) #332

Closed

terriko mentioned this issue Feb 26, 2020

csv2cve does not output CVE severity #413

Closed

mariuszskon mentioned this issue Mar 11, 2020

Better console conventions - decouple console format logic from output formatting, ensure stdout is for output only #473

Merged

terriko mentioned this issue Mar 11, 2020

Output option: list of components/versions #475

Closed

terriko mentioned this issue Mar 19, 2020

Allowing for triage of cve-bin-tool results #486

Closed

terriko closed this as completed Sep 8, 2020
