Parse CHANGELOGs to discover new Vulnerabilities #233

Open
sbs2001 opened this issue Jul 26, 2020 · 2 comments
Comments

@sbs2001
Collaborator

sbs2001 commented Jul 26, 2020

FYI, this came up in @pombredanne's talk at Open Source Summit 2020.

The idea is that FOSS projects which don't come under any CNA might have discovered several bugs that belong in the security category, but never reported them as such because of:
1. The complexity of getting a CVE, or
2. The inability to classify a bug as a security issue.

Such security issues may go unnoticed. If we are able to find them, we make FOSS safer, and users get an incentive to upgrade their software, which makes coping with changes bearable.

One way to achieve this goal is to parse the CHANGELOGs of FOSS projects and find the changes that are related to security fixes. The implementation of an ML classifier for this would look like the following (this is a repaste from Gitter):

Use our existing data to find the version of a package where a vulnerability was first fixed, and map that version to its changelog. There's https://github.com/pyupio/changelogs to fetch changelogs (it maps version -> change too). Extract such changelogs.
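The extraction step could be sketched roughly as below. This is only an illustration: `build_training_set`, `fixed_in`, and `changelogs` are hypothetical names, and both input dictionaries are assumed shapes rather than the project's actual data model or the output format of any changelog fetcher. Versions with no recorded fix are labeled `False` here, which can mislabel fixes that were never reported.

```python
from typing import Dict, List, Tuple


def build_training_set(
    fixed_in: Dict[str, List[str]],
    changelogs: Dict[str, Dict[str, str]],
) -> List[Tuple[str, bool]]:
    """Return (changelog_text, is_security) pairs for every known version.

    fixed_in:   package name -> versions known to contain a security fix
                (hypothetical shape, assumed to come from existing data).
    changelogs: package name -> {version: changelog text}
                (hypothetical shape, assumed to be fetched beforehand).
    """
    samples = []
    for package, versions in changelogs.items():
        security_versions = set(fixed_in.get(package, []))
        for version, text in versions.items():
            # Label the entry by whether this version is a known fix version.
            samples.append((text, version in security_versions))
    return samples
```

Calling it with `{"demo": ["1.0.1"]}` and a changelog mapping for the same made-up package yields one labeled sample per version.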

The ML model would be trained along these lines: given the presence or absence of certain words, the changelog is or is not related to security. We would also add non-security-related changelogs during training, so the model is not biased.
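To make the "presence and absence of words" idea concrete, here is a minimal sketch using a Bernoulli naive Bayes classifier written with only the standard library. The choice of naive Bayes is an assumption for illustration (the issue does not name a model), and the class name, tokenizer, and training sentences are all made up.

```python
import math
import re
from collections import Counter
from typing import Iterable, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase word set: only presence or absence matters, not counts."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class BernoulliNB:
    """Tiny Bernoulli naive Bayes over word presence, with Laplace smoothing."""

    def fit(self, samples: Iterable[Tuple[str, bool]]) -> "BernoulliNB":
        self.doc_counts = {True: 0, False: 0}
        self.word_counts = {True: Counter(), False: Counter()}
        for text, label in samples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, words: Set[str], label: bool) -> float:
        n = self.doc_counts[label]
        total = sum(self.doc_counts.values())
        score = math.log((n + 1) / (total + 2))  # smoothed class prior
        for word in self.vocab:
            # Smoothed probability that a document of this class contains the word.
            p = (self.word_counts[label][word] + 1) / (n + 2)
            score += math.log(p if word in words else 1 - p)
        return score

    def is_security(self, text: str) -> bool:
        words = tokenize(text)
        return self._log_score(words, True) > self._log_score(words, False)


# Made-up training entries, including non-security ones to balance the classes.
training = [
    ("Fixed XSS vulnerability in admin templates", True),
    ("Fixed SQL injection vulnerability in query builder", True),
    ("Security fix for denial of service in parser", True),
    ("Added new color theme option", False),
    ("Improved documentation and typos", False),
    ("Updated build scripts for new release", False),
]
model = BernoulliNB().fit(training)
```

With a real corpus, a library such as scikit-learn would likely be a better fit than a hand-rolled model; the point here is only that each word contributes evidence both when it is present and when it is absent.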

The classifier won't be accurate, but it would definitely reduce the search space. The changes tagged as security-related will be fed into a manual curation queue and issued a vulnerability identifier (something like a CVE), bringing them into 'addressable existence'.

The beginning of wisdom is to call things by their proper name.
-Confucius

This needs #232 to be addressed.

@pombredanne changed the title from "Parse CHANGELOGS to discover new Vulnerabilities" to "Parse CHANGELOGs to discover new Vulnerabilities" Sep 10, 2020
@pombredanne
Member

As a test, what about a project such as Django or Flask?
That could be a practical first thing to work on. There is also another related way: using git reflog.

@sbs2001
Collaborator Author

sbs2001 commented Jan 10, 2021

@pombredanne Django has good changelogs, e.g. https://github.com/django/django/blob/5fcfe5361e5b8c9738b1ee4c1e9a6f293a7dda40/docs/releases/1.8.18.txt#L9 . That would probably be easier to start with.

Having said that, I'm pretty sure GHSAs cover that. Is there any value in collecting Django changelogs?
