Parse CHANGELOGs to discover new Vulnerabilities #233

sbs2001 · 2020-07-26T05:41:31Z

FYI, this came up in @pombredanne's talk at Open Source Summit 2020.

The idea is that FOSS projects which don't fall under any CNA may have discovered and fixed several bugs that qualify as security issues, yet never reported them as such because of:

1. the complexity of getting a CVE, or
2. the inability to classify a bug as a security issue.

Such security issues may go unnoticed. If we can find them, we can make FOSS safer, and users will have a concrete incentive to upgrade, which makes coping with the accompanying changes bearable.

One way to achieve this goal is to parse the CHANGELOGs of FOSS projects and find changes related to security fixes. An ML classifier for this could be built as follows (reposted from Gitter):

Use our existing data to find the version of a package where a vulnerability was first fixed, then map that version to its changelog entry. https://github.com/pyupio/changelogs can fetch changelogs (it also maps version -> changes). Extract those changelog entries.
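A minimal sketch of this extraction step, assuming the changelog has already been fetched into a plain dict of version -> entry text (the shape pyupio/changelogs is described as producing; the library itself is not called here). `build_training_examples` is a hypothetical helper: entries for versions that first fixed a known vulnerability become positive examples, and every other entry becomes a negative one.

```python
from typing import Dict, Iterable, List, Tuple


def build_training_examples(
    changelog: Dict[str, str],
    fixed_versions: Iterable[str],
) -> List[Tuple[str, bool]]:
    """Turn a version -> changelog-text mapping into (text, is_security) pairs.

    Hypothetical helper: versions in `fixed_versions` (where a known
    vulnerability was first fixed) are labeled True; all others False.
    """
    fixed = set(fixed_versions)
    examples = []
    for version, text in changelog.items():
        if not text.strip():
            continue  # skip empty changelog entries
        examples.append((text, version in fixed))
    return examples


if __name__ == "__main__":
    # Illustrative data only, not taken from a real project.
    changelog = {
        "1.0.1": "Fix XSS in the admin login form.",
        "1.0.0": "Initial release.",
    }
    print(build_training_examples(changelog, {"1.0.1"}))
```

The labels are coarse: a release that fixed a vulnerability usually also lists unrelated changes, so splitting entries per bullet before labeling may be worth trying.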

The ML model would be trained along the lines of: given the presence or absence of certain words, this changelog entry is (or is not) related to security. We would also include non-security changelog entries in the training data, so the model is not biased.
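One way to realize "presence or absence of certain words" is a Bernoulli naive Bayes classifier, sketched below with only the standard library. The class name, the tokenizer, and the Laplace smoothing are illustrative choices rather than a decided design; a library implementation such as scikit-learn's `BernoulliNB` would be a natural replacement.

```python
import math
import re
from typing import Dict, List, Set, Tuple

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> Set[str]:
    """Lower-cased set of words: only presence/absence matters, not counts."""
    return set(TOKEN_RE.findall(text.lower()))


class BernoulliNB:
    """Bernoulli naive Bayes over word presence/absence (Laplace-smoothed)."""

    def fit(self, examples: List[Tuple[str, bool]]) -> "BernoulliNB":
        docs = [(tokens(text), label) for text, label in examples]
        self.vocab: Set[str] = set().union(*(t for t, _ in docs))
        self.prior: Dict[bool, float] = {}
        self.p_word: Dict[bool, Dict[str, float]] = {}
        for label in (True, False):
            class_docs = [t for t, lab in docs if lab == label]
            n = len(class_docs)
            self.prior[label] = (n + 1) / (len(docs) + 2)
            # P(word present | class), smoothed so no probability is 0 or 1.
            self.p_word[label] = {
                w: (sum(w in t for t in class_docs) + 1) / (n + 2)
                for w in self.vocab
            }
        return self

    def log_odds(self, text: str) -> float:
        """log P(security | text) - log P(not security | text)."""
        present = tokens(text)
        score = math.log(self.prior[True]) - math.log(self.prior[False])
        for w in self.vocab:  # words never seen in training are ignored
            p_t, p_f = self.p_word[True][w], self.p_word[False][w]
            if w in present:
                score += math.log(p_t) - math.log(p_f)
            else:  # absence of a word is evidence too
                score += math.log(1 - p_t) - math.log(1 - p_f)
        return score

    def is_security(self, text: str) -> bool:
        return self.log_odds(text) > 0
```

Modeling absence explicitly is what distinguishes the Bernoulli variant from a word-count (multinomial) model, which matches the "presence and absence" framing above.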

The classifier won't be perfectly accurate, but it would definitely reduce the search space. Changes tagged as security related would be fed into a manual curation queue and issued a vulnerability identifier (something like a CVE), bringing them into 'addressable existence'.
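A sketch of the triage step: score each candidate entry, keep those above a threshold, and order the manual curation queue by score so reviewers see the most likely security fixes first. `Candidate`, `build_review_queue`, the default threshold, and the toy `keyword_score` function are all hypothetical stand-ins; in practice `score` would be the trained classifier's output (for example a log-odds value).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Candidate:
    """A changelog entry awaiting manual review (hypothetical record type)."""
    package: str
    version: str
    text: str
    score: float


def build_review_queue(
    entries: Iterable[Tuple[str, str, str]],  # (package, version, text)
    score: Callable[[str], float],
    threshold: float = 0.0,
) -> List[Candidate]:
    """Keep entries scoring above `threshold`, highest score first."""
    queue = []
    for package, version, text in entries:
        s = score(text)
        if s > threshold:
            queue.append(Candidate(package, version, text, s))
    queue.sort(key=lambda c: c.score, reverse=True)
    return queue


def keyword_score(text: str) -> float:
    """Toy scorer standing in for a real classifier: counts security keywords."""
    keywords = ("security", "cve", "xss", "injection", "vulnerab")
    lowered = text.lower()
    return float(sum(k in lowered for k in keywords))


if __name__ == "__main__":
    # Illustrative entries only.
    entries = [
        ("flask", "1.0.1", "Fix XSS in error page"),
        ("flask", "1.0.2", "Add CLI option"),
    ]
    for c in build_review_queue(entries, keyword_score):
        print(c.score, c.package, c.version, c.text)
```

Raising the threshold trades recall for a shorter queue; since every queued item is reviewed by a human, a low threshold is probably the safer starting point.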

The beginning of wisdom is to call things by their proper name.
-Confucius

This needs #232 to be addressed.

pombredanne · 2020-11-18T11:34:04Z

As a test, what about a project such as Django or Flask?
That could be a practical first thing to work on. There is also another related approach using git reflog.
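A crude baseline for the commit-history idea: list commit subjects with `git log` and flag those matching security keywords. The comment mentions `git reflog`; this sketch uses `git log` instead, on the assumption that the project's commit history is what should be scanned. `SECURITY_RE`, `read_commit_subjects`, and `flag_commits` are hypothetical names, and the keyword list is a guess that would need tuning (or replacing with a trained classifier).

```python
import re
import subprocess
from typing import Iterable, List, Tuple

# Hypothetical keyword pattern; expect both false positives and misses.
SECURITY_RE = re.compile(
    r"\b(security|vulnerab\w*|cve-\d{4}-\d+|xss|csrf|injection|overflow"
    r"|denial of service)\b",
    re.IGNORECASE,
)


def read_commit_subjects(repo: str) -> List[str]:
    """Return '<sha>\\t<subject>' lines for every commit reachable from HEAD."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H%x09%s"],  # %x09 is a tab
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def flag_commits(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (sha, subject) pairs whose subject matches SECURITY_RE."""
    flagged = []
    for line in lines:
        sha, sep, subject = line.partition("\t")
        if sep and SECURITY_RE.search(subject):
            flagged.append((sha, subject))
    return flagged


if __name__ == "__main__":
    # Run from inside a local clone, e.g. of Flask or Django.
    for sha, subject in flag_commits(read_commit_subjects(".")):
        print(sha[:12], subject)
```

Keeping the git call separate from the matching logic makes it easy to swap in another source of commit messages later.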

sbs2001 · 2021-01-10T13:34:06Z

@pombredanne Django has good changelogs, which would probably make it easier to start with; for example: https://github.com/django/django/blob/5fcfe5361e5b8c9738b1ee4c1e9a6f293a7dda40/docs/releases/1.8.18.txt#L9

That said, I'm pretty sure GHSAs already cover those. Is there any value in collecting Django changelogs?
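For Django, a first pass could pull CVE identifiers straight out of the release notes. This sketch assumes that each security fix is introduced by a reStructuredText heading line of the form `CVE-YYYY-NNNN: <title>`, the format Django's security release notes appear to use; `CVE_HEADING_RE` and `extract_cves` are hypothetical names. Such matches would also provide known positive examples for training, whichever way the GHSA-overlap question is answered.

```python
import re
from typing import List, Tuple

# Assumption: a security fix starts with a heading line "CVE-YYYY-NNNN: <title>".
# [ \t]* (not \s*) keeps the title on the same line as the identifier.
CVE_HEADING_RE = re.compile(r"^(CVE-\d{4}-\d{4,}):[ \t]*(.+)$", re.MULTILINE)


def extract_cves(release_notes: str) -> List[Tuple[str, str]]:
    """Return (cve_id, title) pairs found in one release-notes document."""
    return [
        (m.group(1), m.group(2).strip())
        for m in CVE_HEADING_RE.finditer(release_notes)
    ]


if __name__ == "__main__":
    import sys

    # Usage: python extract_cves.py docs/releases/1.8.18.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for cve_id, title in extract_cves(f.read()):
            print(cve_id, title)
```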

pombredanne changed the title ~~Parse CHANGELOGS to discover new Vulnerabilities~~ Parse CHANGELOGs to discover new Vulnerabilities Sep 10, 2020

pombredanne mentioned this issue Sep 10, 2020: Process unstructured data sources #251 (Open)

pombredanne added enhancement Data collection labels Feb 24, 2021

pombredanne removed the enhancement label Jan 24, 2022

pombredanne mentioned this issue Feb 14, 2023: Extract unpublished vulnerabilities from commit histories and trackers #1129 (Open, 5 tasks)
