The Future of Security Scanning in Open Source

by Debricked Editorial Team

Telling the world about what we do at Debricked is always fun, and we try to pay a visit to a podcast every now and then. This one turned out to be extra special.

Not only did we get to brag about all our cool machine learning technology and how we’re trying to change the way people use open source (and security scanners) – we also got to do it in one of our favorite podcasts; the Open Source Security Podcast!

So, here it is in all its glory. If you’d rather listen to the episode you can do so here. Now, let’s hop into the conversation.

Security scanning in a new way

Josh Bressers: So, the real reason I’m so excited to talk to Debricked, and this is a pain point Kurt and I have complained about at length and constantly, is that all of the vulnerability databases that exist today are these artisanal, hand-curated data sources. 

Debricked is taking a completely different view of this and you’re doing some amazing things. So, tell us what you’re doing and why I’m so excited.

Emil Wåréus: I’m really happy that you’re excited about this, I am too! So, what we’re doing is that we’re trying to automate the collection of vulnerabilities in a different way than manually-curated databases. 

And, looking at those databases, it takes a lot of work to maintain, analyze and understand vulnerabilities and how they contextualize in different environments. What we are doing instead is that we’re trying to automatically, with a lot of machine learning, find vulnerabilities straight in the code, in the sources, in all the open source in the world. 

So for instance, we look at commits, code changes and the raw code itself, issues, pull requests in different open source projects and try to map out what is potentially security related or not. 

This has a couple of advantages compared to the traditional way of doing it with manual analysis.

Emil Wåréus: Our vulnerabilities are really close to the source, to the code, and we can quite easily map out if you, for instance, are using that code in either a static or a dynamic way, which is quite cool. But of course, you know that machine learning is not always perfect…

Josh Bressers: That’s not what I’ve been told. Machine learning does it all. The only thing better is blockchain, right? 😀

Emil Wåréus: Yeah, of course! Maybe we should try to incorporate that there as well. But yeah, we have a quite high-performing algorithm. The overall performance is an F1 measure of about 85%.

Precision, recall and the vulnerable functionality in security scanning

Josh Bressers: What does an F1 measure mean? Can you tell us?

Emil Wåréus: It’s the harmonic mean between the recall and the precision. And so, the precision is looking at the true positive rate. So of the predictions, how many of those are true positives? And that’s about 87, 88% right now. So, that’s quite cool.
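As a quick illustration of the metric Emil describes (the recall value below is an assumption for the sake of the example; he only quotes the precision and the overall F1), the harmonic mean works out like this:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# With the ~87-88% precision Emil quotes and an assumed recall of ~83%,
# the F1 lands near the ~85% he mentions.
print(round(f1_score(0.875, 0.83), 3))  # → 0.852
```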

Josh Bressers: That’s crazy.

Kurt Seifried: That’s better than most people.

Emil Wåréus: Yeah. That’s really high. We realized, when we experimented with different models, that we got that precision by aggregating all the different data sources. For example, looking both at NLP on the commit messages and the pull requests, and at the statistical properties of commits: how many lines were changed, what types of files they were in, what the code churn in that commit was and so on. And, also looking at the actual code itself.

But, with that high performance, when we look at all the commits in open source projects, we still get quite a lot of false positives. But, I think, what we’re doing here is that we’re collecting vulnerabilities that are really easy and nice to contextualize to the software that you’re using. 

So, if you look at normal vulnerabilities, like the ones you find in advisories such as NVD, we know that those are actual vulnerabilities. The false positives there are not about whether they are vulnerabilities or not, because we already know that.

The false positive rate comes from whether you’re using that dependency (you may have mapped that wrong), whether you’re using the vulnerable code, whether it’s only used during tests, or whether it’s used during runtime in an environment that makes you susceptible to that vulnerability.

But, mapping whether you’re actually using a vulnerability from those normal types of vulnerabilities is really hard because they’re quite far from the code. 

But looking at where we have analyzed the code itself and found vulnerabilities, the risk is rather that they may not be vulnerabilities at all because we have classified them wrong, but we know that you are 100% using that code that we have classified as a vulnerability. 

So, it’s completely another approach in terms of how you traditionally view security and open source, I’d say.

Josh Bressers: I love this. I mean, this is one of my pet peeves of software composition analysis. Every single one I’ve seen so far just takes the stupid approach of “if you ship it, you’re vulnerable.” Whereas, a lot of these open source libraries do a lot of things and you might only need one method. So, that’s so cool.

Emil Wåréus: It’s quite a fun challenge to work with! Looking at both the machine learning parts and the analysis of the code, because then you really get to dig deep into how different languages work. 

Most common languages, such as Python and JavaScript, are hard to analyze statically.

So, looking instead at dynamic runtime analysis, at what is actually called in production, and analyzing that is way more efficient than simply mapping out to the dependency files or just looking at the static code.

Josh Bressers: Okay. So, let me understand this. You’re saying that Debricked is watching the running application. Do you watch it in production or do you watch it during test runs? How are you doing that?

Emil Wåréus: Currently, we’re looking statically only. But we’re working together with some friends of ours in the Northern parts of Sweden, not Elasticsearch, but Elastisys. Quite a close name there. We are building out the functionality to do runtime monitoring. 

So, then, we will use both. In test environments, for instance, we already have a small plugin for pytest where we trace the full call graph and can map whether you’re calling a vulnerable functionality or not.

But, that of course, depends on your test coverage. But then again, this will also work in a run time environment where we can monitor different functionalities that we deem to be vulnerable and whether they are called or not.
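The runtime monitoring Emil describes could, in spirit, be sketched like this in Python; the watch-list, the function names and the stand-in “vulnerable” function are all invented for illustration, and Debricked’s actual plugin surely works differently:

```python
import sys

# Hypothetical watch-list: functions a scanner has classified as vulnerable.
VULNERABLE_FUNCTIONS = {"parse_header"}  # names are made up for illustration

called_vulnerable = set()

def _tracer(frame, event, arg):
    # 'call' fires once per Python function invocation.
    if event == "call" and frame.f_code.co_name in VULNERABLE_FUNCTIONS:
        called_vulnerable.add(frame.f_code.co_name)

def parse_header(raw):          # stand-in for a vulnerable dependency function
    return raw.strip()

def run_app():
    return parse_header("  GET / HTTP/1.1  ")

sys.setprofile(_tracer)         # low-overhead hook: call/return events only
run_app()
sys.setprofile(None)

print(called_vulnerable)        # {'parse_header'} -> the vulnerable path is reached
```

A non-empty set after the test run (or a production soak) is the signal: the flagged code is actually exercised, not just present in the dependency tree.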

Kurt Seifried: So, one thing I’m curious about with this: it sounds like you’re trying to automate the finding of the base flaws, let’s call them. So, somebody makes a commit and makes a whoopsie. But then you’re also trying to automate finding whether this application, composed of a thousand open source libraries, is actually using this in a vulnerable way.

Emil Wåréus: Exactly. That can vary quite a lot in difficulty. For the simple cases, where you maybe have an injection type of vulnerability such as SQL injection, those are usually quite compartmentalized into maybe one single function within that particular open source library.

And then we can map whether you’re using that function, whether that function is ever called, either statically, for instance we do that with Java, or dynamically in a runtime environment where you can install one of our plugins.

Then, once you’ve installed this plugin, we will monitor whether that function is ever called or not. And when it’s called, you will get to know that in our tool and you can make decisions accordingly to actually handle that risk.

Kurt Seifried: Oh, that seems actually really self-evident. Now that you say … No, seriously. You monitor your system for what it does and does it use some known vulnerable issue and then flag it and I’ve never heard anybody do that before. Now, I’m kind of embarrassed. I didn’t think of it.

Emil Wåréus: We did this exercise a while ago, internally, with our security team where we try to think of what are the different ways to reason about false positives within vulnerabilities in open source. 

And we actually divided it into four levels, where most of the tools are still at level one, where you simply incorrectly map vulnerabilities to dependencies or create inaccurate SBOMs.

Kurt Seifried: Right, because we’ve never seen that happen before.

Emil Wåréus: No, of course not. We actually benchmarked ourselves and our competitors only in that regard. It’s quite interesting to look at how much it differs depending on the software that you’re benchmarking on or the language that you’re working with and so on.

But, it is a hard problem and the indication that we’re seeing is that, you can get better there, but it’s maybe not the right way to go at all because you’re not getting into the other layers of false positives.

The second layer of precision and why security work shouldn’t necessarily be done manually

Emil Wåréus: So, the second level of false positives: when you have the correct SBOM and you correctly map those vulnerabilities to the dependencies, the second layer is kind of similar to what I just described.

It’s a false positive if you’re not using the vulnerable functionality in a static way. So, it’s not statically called. And then we are looking at the third layer, which is if the vulnerability is called in a critical environment during run time. And then we look at the fourth and final layer; that it’s exploitable in a critical environment during run time.
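The four levels Emil walks through could be sketched as a small triage helper; the level names and the structure below are our own shorthand, not Debricked’s:

```python
from enum import IntEnum

class FPLevel(IntEnum):
    """The four levels of evidence, loosely following Emil's description."""
    DEPENDENCY_MAPPED = 1      # vulnerability correctly mapped to a dependency (correct SBOM)
    STATICALLY_REACHABLE = 2   # the vulnerable functionality is statically called
    CALLED_AT_RUNTIME = 3      # it is called in a critical environment during runtime
    EXPLOITABLE = 4            # and it is actually exploitable in that environment

def triage(statically_called: bool, called_at_runtime: bool, exploitable: bool) -> FPLevel:
    """Return the deepest level of evidence we have for a finding."""
    if exploitable:
        return FPLevel.EXPLOITABLE
    if called_at_runtime:
        return FPLevel.CALLED_AT_RUNTIME
    if statically_called:
        return FPLevel.STATICALLY_REACHABLE
    return FPLevel.DEPENDENCY_MAPPED

print(triage(statically_called=True, called_at_runtime=False, exploitable=False).name)
```

A finding stuck at level one is exactly the “you ship it, you’re vulnerable” noise the hosts complain about; each deeper level filters out another class of false positives.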

Josh Bressers: You say that out loud and it’s completely obvious, but I don’t think we always take that into consideration, because most of the composition tools just assume that if you have it, you’re vulnerable. There’s zero intelligence regarding what the exploit actually is.

But again, this is such a crazy problem because … I want to call something out that I suspect a lot of the audience is thinking of. The vast majority of vulnerability information that exists today has virtually no machine-usable information in it. For example, all of the NVD data is just a prose description.

You’re saying that you have the ability to take what little data exists, figure out what part of the dependency it actually affects, because, obviously, you can’t tell me I’m not vulnerable because I’m not using the function if you don’t know what the function is. You know what I mean? 

And I think that is mind-blowing, because there are hundreds of thousands of vulnerabilities. There probably should be millions. And, obviously, humans can’t possibly do this.

Emil Wåréus: No, of course not. So, this is actually one of the essences of the problem here, finding the vulnerable functionality. And looking at existing vulnerabilities, we do that for those as well. 

That can be quite challenging because then you have to, first of all, find the software, the open source project, and understand the open source project in an automatic way, which can be quite challenging. There, you can usually find information about what version ranges are vulnerable. 

And from that, you can derive if there is a safe version and a vulnerable version. You can look at the differences in the code that were made during those version updates.
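The idea of diffing a safe version against a vulnerable one to locate candidate functions can be sketched like this; the code snippets and function names below are invented for illustration:

```python
import ast

# Two versions of the same (invented) file: before and after a fix.
vulnerable_src = """
def sanitize(s):
    return s

def helper():
    return 1
"""

fixed_src = """
def sanitize(s):
    return s.replace("'", "''")

def helper():
    return 1
"""

def function_bodies(src: str) -> dict:
    """Map each top-level function name to its source text."""
    tree = ast.parse(src)
    return {
        node.name: ast.get_source_segment(src, node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }

before, after = function_bodies(vulnerable_src), function_bodies(fixed_src)
# Functions whose body changed between versions are candidates for the
# vulnerable functionality.
changed = [name for name in before if before[name] != after.get(name)]
print(changed)  # → ['sanitize']
```

On a real version diff the candidate list is much longer, which is exactly the problem Emil describes next: a release diff mixes the security fix with dozens or hundreds of unrelated changes.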

Kurt Seifried: That was my life for about 12 years. Every time ISC released a BIND security update, it was two or three hundred kilobytes of diff, and then digging through that to find the vulnerable function.

Josh Bressers: And humans are terrible at this work. We make tons of mistakes.

Kurt Seifried: Even if you’re good at it, it’s hugely time-consuming.

Emil Wåréus: Yeah. And here’s another thing. You’re looking at a version diff, where a version diff may contain many commits, usually quite a lot of them.

And the number of functions changed in that diff can vary from just a couple, three or four, to many hundreds; usually hundreds if it’s a major or minor release and not a patch.

Then, if you’re looking at that data, of course, you can say that you’ve found the vulnerable functionality, but you’re also mapping to 99 other non-existent problems.

Emil Wåréus: What we’re doing here is using our machine learning algorithm to look at the set of commits that were made during that version update and seeing which are the highest-probability commits that fixed that vulnerability, so we can narrow down those matches of potentially vulnerable functionalities quite a lot.

And, instead of mapping to hundreds of potentially vulnerable functionalities, we can find maybe the three, four or five highest candidates: the functions we think are vulnerable. If you’re using those, you should really look into that.

Josh Bressers: I love this. I mean, one of my other longstanding complaints about all this is that most of these security scanners don’t tell you what you should prioritize. If I was going to order all these findings in a list and basically start at the top with the most dangerous and work my way down, they rarely take any of this into account.

You just get this huge pile of stuff. And it’s frustrating to me because, obviously, we have limited resources. We can’t literally fix everything immediately. We have to figure out what is important and what’s not. I know that because I do this manually, and it’s a horrible task. Having a computer do this sounds amazing to me.

Emil Wåréus: First of all, computers can probably do a lot here. But even if they can’t, they can give you the information you need to actually do that analysis yourself, which is huge.

Kurt Seifried: Well, that’s the thing I keep finding, right. People like me and Josh have a lot of domain-specific knowledge that we can apply to look at a vulnerability, but not everybody else has spent 20 years in the open source world digging through thousands of vulnerabilities.

And to be honest, nor should they. It’s insane. And what drives me nuts is, isn’t this the whole point of IT, to automate stuff and to take human knowledge and shove it into a computer?

Emil Wåréus: I completely agree with that.

Kurt Seifried: And I know it’s not easy, but it drives me nuts that we’re stuck with all these security scanning products that do things in a way that … Personally, I find it more painful than useful in most cases.

Emil Wåréus: I completely agree. We should never consume more of the end user’s time than we give back; we should enable them to do a better job, faster and more accurately than before. And, I agree that not all security scanners do that.

You may just be fed with false positives. And, if you look at ourselves a couple of years ago, we were not there. By talking a lot to our users, we learned that some people could tolerate us, but we wanted them to love us. This is what they asked for.

They wanted to be able to accurately reduce their vulnerability risk, in a way that let them perform this analysis faster, save time and be very accurate in their work. And, I think, this is the right way to go and the way the industry will evolve in terms of managing open source vulnerabilities.

Kurt Seifried: Well, the one thing I’ve consistently seen with respect to false positives is that a lot of these security scanning tools want to generate a giant phone-book-sized report of results, because it looks very impressive: we found 500 issues. And then, of course, how many of those are actually legitimate or real is another question.

So, it seems to me like there are essentially two strategies here, sort of two reproductive strategies. One is to have a lot of children that are very low quality. The other is to have very few children that are high quality.

I just ran one of our code repositories through your security scanner and the results you give versus the GitHub Dependabot alerts are vastly different. And from what I can tell, the GitHub one just goes straight off of the NVD data, whereas your data seems to have a bit more context.

Emil Wåréus: That’s really great to hear, actually. Thanks!

Kurt Seifried: Actually, you have more stuff than the Dependabot alerts, which kind of worries me. Well, Dependabot is free, so we use it. One of the rules I have now is, I will not buy or use a security tool that requires a significant amount of time and expertise, because we don’t have the time and effort to keep these things fed.

Basically, I don’t want to buy tools that require me to hire more people, because in most cases these tools are not doing things so amazing that they will prevent all compromises of our system. If they guaranteed that, then I would absolutely hire another person. But, well, that’s the thing: even if you keep your software completely up to date, you’re still not guaranteed to be safe. The database will eventually find stuff.

Contributing to the open source that you use

Josh Bressers: Okay. We’re running ourselves out of time and I wanted to touch on one other, I think, important thing you mentioned way back at the beginning was that you are empowering people to become more involved in open source. And I’m curious about what you meant by that comment because that intrigues me.

Emil Wåréus: Yeah. So I think that’s actually a challenge today: open source projects are not aware of what and how their products are being used in reality.

We have a friend of ours at the company who is the former head of open source at Sony Mobile. 

Usually, when they needed to use an open source project that had a non-compliant license, they needed to talk to the open source officer and see if the project could maybe release a dual license for them, see what they could do.

And the projects themselves usually reacted something like “Oh my God! Sony is using the software I built. Of course, you can have any license you want.”

And I think that kind of articulates a challenge: how can the industry contribute more, and more efficiently, to open source projects, with the power of deeply understanding code from similar systems?

As we’ve talked about, with both static and dynamic analysis we can also look at which open source projects you’re using, how much of each project you’re using and which project is the most impactful in your software, then link that to overall project health, for instance, and see where your contribution can be most impactful.

Josh Bressers: I mean, that makes perfect sense. I feel like everything you’ve said today has been one of those … Well, it’s so obvious now that you say it out loud, but I guess, I hadn’t really thought of it before.

Emil Wåréus: When we talk to our customers and potential users about their contribution strategy, it’s usually something that’s quite new. Only really large companies, in my experience, have something like a contribution strategy in mind, and it’s usually the projects that are top of mind that you think are the ones you need to contribute to.

Yeah: TensorFlow, Kubernetes, the large ones that you’re using a lot. But those are not always the ones that bear real risk to your organization, because those are backed by Google and other large companies.

It may be the smaller projects that aren’t backed by any company, that may be trending down, that are very dependent on a very few active maintainers. Understanding where the future health risk of a project lies, and which projects are actually impactful to your software, I think, is a really good opportunity.

And that’s where I want to bring Debricked so that we can close the full chain of consuming, using, and understanding and contributing back to open source.

Josh Bressers: That’s amazing. You just made me realize … I’m sure you’ve seen that XKCD comic that shows the whole stack held up by one little block labeled some guy in Nebraska or whatever.

I mean, you’re in a position to identify that brick. That’s literally one guy who’s committing only on the weekends and it’s a critical path for your application. That is really cool.

Kurt Seifried: The thing that gets me is not just the critical path through the application, but so much stuff gets pulled in. Number one, just having a list of it is amazing. 

But number two, actually knowing roughly how important it is, how critical is this, if this breaks or has a security issue, how badly would my application break, that would be hugely helpful knowing where to focus. I suspect most people have no idea.

Emil Wåréus: I mean, understanding software is a really hard challenge in general. So isolating the open source software in depth and mapping that to the actual people contributing to those projects is a hard challenge, but I think it’s so fun to work with. And it’s so cool to be part of it, trying to understand all the amazing people doing good stuff for software in the world.

Josh Bressers: Absolutely.

To finish up, Josh asked Emil for a few last words. What did he want people to take away from the episode? Emil’s answer was “I just want to enable developers to use the open source that they love”.

This sums up Debricked’s approach quite well. Stay secure, compliant and use the open source you love; it can be as simple as that. If you’d like to see what all the fuss is about, you can create a free Debricked account here.
