Monday, November 19, 2007

Duplicates, Duplicates, and Duplicate Rates

Following Larry Suto’s analysis of NTOSpider, IBM’s AppScan, and HP’s WebInspect, in which he compared code coverage to links crawled and vulnerabilities found, some questioned the accuracy of his results. Personally, I didn’t get the criticism, because with only two published reviews this year we should be grateful to see data of any kind related to web application vulnerability scanners. I chalked the comments up to the standard scanner vendor or product reseller defense mechanism. Besides, Larry Suto is a professional source code reviewer, and if he can’t figure it out, what chance does anyone else have? Well, except for the vendors themselves, and this is where it gets interesting.

HP/SPI felt the issue was important enough to research and respond to directly. Jeff Forristal (HP Security Labs) set up an environment to redo Larry’s work and measure WebInspect v7.7 and v7.5 against two of the three websites. While everyone is encouraged to read the report and judge for themselves, a couple of things really stood out to me in the data charts (see below) - specifically false positives and “vulnerability duplicates”. I’ve only talked about the problem of vulnerability duplicates briefly in the past, when describing how customers eventually mature to a Quality phase where less equals more. Obviously people prefer not to see identical vulnerabilities reported dozens or hundreds of times.


If you look at the chart columns “Total # Findings”, “Raw False Positive”, and “Accurate # of Instances”, they compare what the scanner reported, what was false, and what vulnerabilities were valid and unique. The two scanner versions reported nearly identical counts of validated issues: 5 on the Roller website and 113/110 on OpenCMS. In the false-positive department, WebInspect v7.7 did fairly well, coming in between 0% and 16% on the two websites, while v7.5 performed a little worse at 2% and 36%. But what you have to look at closely is the ratio of Total # of Findings to Accurate # of Instances (minus the falses), because this measures the level of vulnerability duplication.

On the Roller website v7.7 reported 40 unvalidated issues and v7.5 reported 55, all of which boiled down to 5 unique vulnerabilities. That means only 12% of v7.7’s results are meaningful! For v7.5 it was 9%! On OpenCMS, the 1,258 unvalidated issues reported by v7.7 (3,756 for v7.5) came down to 113 unique vulnerabilities. Once again, only 9% and 3% of the results were necessary. Shall we call this a Vulnerability Duplicate Rate? That’s a lot of data to distill down, and it must take a lot of time. For those who use these scanners, is this typical in your experience and expected behavior?
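
If you want to check the arithmetic yourself, here’s a quick back-of-the-envelope sketch in Python, using the figures quoted above from HP’s charts; nothing official, just the ratios spelled out:

```python
# Back-of-the-envelope "duplicate rate" math using the figures quoted
# above from HP's charts (not the report's own tooling or data format).

findings = {
    # site: {version: (total unvalidated findings, unique valid vulnerabilities)}
    "Roller":  {"v7.7": (40, 5),     "v7.5": (55, 5)},
    "OpenCMS": {"v7.7": (1258, 113), "v7.5": (3756, 110)},
}

for site, versions in findings.items():
    for version, (total, unique) in versions.items():
        meaningful = unique / total      # fraction of results that are unique, valid issues
        noise = 1 - meaningful           # everything else: duplicates plus false positives
        print(f"{site} {version}: {meaningful:.0%} meaningful, {noise:.0%} duplicates/noise")
```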

I know Ory is reading this… :), so can you give us an indication of what the accuracy rating might have been for AppScan?

5 comments:

Jordan said...

I don't have the hard data handy from my review (magazine reviews are more fire-and-forget -- have to move on to the next totally different topic before the previous one is even in print), but my recollection is that this is in line with what the majority of scanning products produced.

For some combinations of scanners/applications it was even worse. I specifically mentioned in the Cenzic review that I disliked their HARM score because it was too weighted toward apps where the same vulnerability showed up in multiple pages. One app had a vulnerability that resulted in an astronomical HARM score because it appeared on every page and was counted each time, despite being one vulnerability in a commonly used function.

While it turned out there was a way to cap the impact of a single vuln toward the HARM metric, I think the point that the duplicates were in and of themselves an issue was something I overlooked.

Anonymous said...

Ha! I knew I should've disabled my Referer header ;-)

Actually, we have been working on reproducing Larry's work in the lab (with some cooperation from Jeff@SPI), so when the results come in, I'll post them on my blog.

Jeremiah Grossman said...

I have your IP address mapped through Google Analytics, which is wired to send me an alert any time you visit. :)

Anonymous said...

I think there are some implied/misleading connotations when you say only 12%/9% of the results are "meaningful." All the results are "meaningful"...they are all accurate results and can be thought of as valid attack vectors. Since one vulnerability instance can have multiple vectors to access the vulnerability, there doesn't need to be a 1-to-1 correlation between vulnerabilities and attack vectors. Superfluous vectors are not meaningless...especially if you are protecting your webapp via a vector-centric mitigation approach (i.e. a WAF).

I guess this questions what exactly webapp vulnerability scanners are measuring: the number of exploit vectors (i.e. unique public entry points that lead to a vulnerability), or the number of unique underlying vulnerabilities (taking into account the potential for multiple access vectors). My opinion is that without access to source code and/or runtime instrumentation, it is extremely difficult to do the latter with a blackbox scanner approach. That’s not to say blackbox scanners fail in that regard--it ultimately depends on the user’s individual goals. Sure, those looking to fix code will likely want to know just the unique vulnerability instances; but those looking to determine vulnerability exposure and/or don’t have the capability to fix the code (particularly if it’s a third-party supplied webapp) are more likely to want to know how many attack vectors there are into the webapp. After all, unique vulnerability instances is a code-centric measurement by definition. Knowing a third-party supplied webapp only has one unique vulnerability instance doesn’t make any one of the multiple associated exploit vectors any less potent or “meaningless.”

Jeremiah Grossman said...

> I think there are some implied/misleading connotations when you say only 12%/9% of the results are "meaningful." All the results are "meaningful"...they are all accurate results and can be thought of as valid attack vectors.

Valid, sure; helpful to resolving the problem, I’d say no. And if it’s not helpful, I have a hard time believing that it’s meaningful.

> Since one vulnerability instance can have multiple vectors to access the vulnerability, there doesn't need to be a 1-to-1 correlation between vulnerabilities and attack vectors.

Agree, but if the customer intends to fix the code, someone is going to have to go through the results eventually and validate everything to isolate the unique vulnerability instances. And from that standpoint, duplicates waste just as much time as false positives do.
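
For illustration only, here’s a rough sketch of the consolidation pass I’m talking about (hypothetical field names, not any particular scanner’s output format): collapsing many reported vectors down to the unique flaws a developer would actually fix.

```python
# Rough sketch: many reported attack vectors often map back to one flaw.
# Grouping by (vulnerability class, page, parameter) is one crude way to
# approximate "unique vulnerability instances" from black-box findings.

from collections import defaultdict

findings = [
    ("XSS",  "/comment.jsp", "author", "/comment.jsp?author=<script>alert(1)</script>"),
    ("XSS",  "/comment.jsp", "author", "/entry/123/comment.jsp?author=<script>alert(1)</script>"),
    ("SQLi", "/search.jsp",  "q",      "/search.jsp?q=1%27--"),
]

unique = defaultdict(list)
for vuln_class, page, param, vector in findings:
    # every vector that hits the same class/page/parameter lands behind one flaw
    unique[(vuln_class, page, param)].append(vector)

for (vuln_class, page, param), vectors in unique.items():
    print(f"{vuln_class} in {page} [{param}]: {len(vectors)} attack vector(s)")
```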

> Superfluous vectors are not meaningless...especially if you are protecting your webapp via a vector-centric mitigation approach (i.e. WAF).

Fair enough, but does that mean you are recommending customers pump 1,000 – 3,000 rules into their WAF of choice? If so, that still means they're going to have to go through all the results and separate out the falses anyway.

> I guess this questions what exactly webapp vulnerability scanners are measuring: the number of exploit vectors (i.e. unique public entry points that lead to a vulnerability), or the number of unique underlying vulnerabilities (taking into account the potential for multiple access vectors).

I’ve always been of the belief that scanners/VA are supposed to be measuring the security of a website, but that’s just me. And if it’s not secure, to provide guidance as to what types of solutions might best improve the security posture.

> My opinion is that without access to source code and/or runtime instrumentation, it is extremely difficult to do the latter with a blackbox scanner approach. That’s not to say blackbox scanners fail in that regard--it ultimately depends on the user’s individual goals. Sure, those looking to fix code will likely want to know just the unique vulnerability instances;

Again in that case a high duplicate rate gets in the way.

> but those looking to determine vulnerability exposure and/or don’t have the capability to fix the code (particularly if it’s a third-party supplied webapp) are more likely to want to know how many attack vectors there are into the webapp.

In this case, though, they have no choice. Perhaps this is where I’m confused. Is this a design choice of your product? Massive amounts of data that have to be whittled down in order to fix the code, or left as-is and placed into a WAF?

> After all, unique vulnerability instances is a code-centric measurement by definition. Knowing a third-party supplied webapp only has one unique vulnerability instance doesn’t make any one of the multiple associated exploit vectors any less potent or “meaningless.”

But it certainly doesn’t make it meaningful either.