Jeremiah Grossman: Web application scan-o-meter

Tuesday, May 29, 2007

Web application scan-o-meter

The new OWASP Top 10 2007 has recently been made available. Excellent work by all the contributors. As described on the website, “This document is first and foremost an education piece, not a standard,” and it’ll do just that: educate. Last week I provided the project team with updated text (unpublished) that more accurately describes the current capabilities of “black box” automated scanners in identifying the various issues on the list. The exercise provided ideas for the remainder of this blog post: estimating how effective scanners are at finding the issues, organized by the OWASP Top 10.

In the past I’ve covered the challenges of automated scanning from a variety of angles, including technical vs. logical, vs. low-hanging fruit, vs. the OWASP Top 10, and on one occasion I threw down the gauntlet. Ory Segal (Director of Security Research, Watchfire) also weighed in with his thoughts via his shiny new blog, which is worth a read. Everyone agrees scanners find vulnerabilities, though most, including product vendors, admit they certainly don’t find everything. But that doesn’t explain nearly enough. Does this mean scanners find almost everything or just some? Where does the needle land on the web application scan-o-meter? Sure, scanners are good at finding XSS. How good? Scanners are bad at identifying flaws in business logic. How bad? This is a very limited understanding, and we could really use more insight into scanner capabilities, with some quantification of what they can and cannot find.

Going by experience in developing scanner technology for the better part of a decade, testing scanners written by many others, feedback from the surveys, personal conversations, and quality time huddled with Arian Evans (Director of Operations, WhiteHat Security), below are estimates of where I think state-of-the-art scanning technology has reached, speaking in relative averages. This is actually a harder exercise than you might think when all things are considered. I invite others to comment from their experience as well, whether they agree or disagree. Should any of the scan-o-meter readings be higher or lower, or is there anything I might not have considered?

Estimates are based on the completely automated results of a well-configured scanner that gets a good website/web application crawl and is able to maintain login state during testing. We’ll also focus only on vulnerabilities that are exploitable by a remote, external attacker. In short, a best-case scenario.

A1 - Cross Site Scripting (XSS)

A2 - Injection Flaws

A3 - Malicious File Execution

A4 - Insecure Direct Object Reference

A5 - Cross Site Request Forgery (CSRF)

Identification of CSRF turns out to be fairly easy; filtering out which issues we care about is where automation falls down. That’s what this meter reflects: purely automated results.
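
To make the point concrete, here is a minimal sketch (not how any particular scanner works) of the "easy" half: flagging every form that lacks a hidden field resembling an anti-CSRF token. The names `TOKEN_HINTS`, `FormAuditor`, and `forms_without_token` are hypothetical, and the token-name heuristic is an assumption made purely for illustration.

```python
from html.parser import HTMLParser

# Assumption: a hidden field whose name contains one of these fragments
# is treated as an anti-CSRF token. Real applications vary widely.
TOKEN_HINTS = ("csrf", "token", "nonce")


class FormAuditor(HTMLParser):
    """Collects the action of every form with no token-like hidden field."""

    def __init__(self):
        super().__init__()
        self.flagged = []        # actions of forms that look unprotected
        self._action = None      # action of the form currently being parsed
        self._has_token = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._action = attrs.get("action") or ""
            self._has_token = False
        elif tag == "input" and self._action is not None:
            name = (attrs.get("name") or "").lower()
            kind = (attrs.get("type") or "").lower()
            if kind == "hidden" and any(h in name for h in TOKEN_HINTS):
                self._has_token = True

    def handle_endtag(self, tag):
        if tag == "form" and self._action is not None:
            if not self._has_token:
                self.flagged.append(self._action)
            self._action = None


def forms_without_token(html):
    """Return the actions of all forms lacking a token-like hidden field."""
    auditor = FormAuditor()
    auditor.feed(html)
    return auditor.flagged


# Hypothetical page: a harmless search box and a money transfer form.
sample = (
    '<form action="/search"><input name="q"></form>'
    '<form action="/transfer" method="post">'
    '<input name="amount"><input name="to"></form>'
)
print(forms_without_token(sample))  # ['/search', '/transfer']
```

Both forms get flagged. Deciding that `/transfer` matters and `/search` does not is exactly the filtering step that still needs a person.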

A6 - Information Leakage and Improper Error Handling

A7 - Broken Authentication and Session Management

A8 - Insecure Cryptographic Storage

A9 - Insecure Communications

A scanner’s challenge with this issue is putting it in terms of business expectations. Perhaps a website does not need or want SSL in certain places, and that’s OK.
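
As a rough sketch of what that means in practice, a check like this only becomes useful once someone supplies the business expectation as input. The function name `insecure_urls` and the idea of a list of path prefixes that must use SSL are assumptions for illustration, not a feature of any specific scanner.

```python
from urllib.parse import urlsplit


def insecure_urls(crawled_urls, https_required_prefixes):
    """Return crawled URLs that the policy says need SSL but are served over plain HTTP.

    https_required_prefixes is the human-supplied part: the path prefixes
    the business has decided must be encrypted.
    """
    flagged = []
    for url in crawled_urls:
        parts = urlsplit(url)
        needs_https = any(parts.path.startswith(p) for p in https_required_prefixes)
        if needs_https and parts.scheme != "https":
            flagged.append(url)
    return flagged


# Hypothetical crawl results and policy.
crawl = [
    "http://example.com/login",
    "https://example.com/account/settings",
    "http://example.com/news",
]
print(insecure_urls(crawl, ["/login", "/account"]))  # ['http://example.com/login']
```

The plain-HTTP news page is not reported because the policy never asked for SSL there. Without that policy, a scanner can only report every unencrypted page and leave the sorting to a human.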

A10 - Failure to Restrict URL Access

12 comments:

Anonymous said...: Hi Jeremiah,

Great post - as always, other than the content, I especially enjoy your graphics :-)

1. WRT automated scanners' capabilities - I think there's another angle that's worth thinking about, and that is how scanners can help improve the manual testing process. We all understand that some things cannot be automated, mainly things that require human gray matter, but what about semi-automation? What if scanners could streamline the tedious manual work of an auditor?

What happens today, is that manual assessments rely heavily on HTTP proxies, NetCat and other "dumb" utilities (that require knowledge and experience) but what if scanners offered their technological advancements, in order to improve the user experience?

Personally, I put a lot of effort and thought on how we can improve automated scanners exactly in that place - aiding in the manual process, and removing some of the burden from the human pen tester.

2. Have you noticed that the new OWASP Top 10 list is becoming a mixture of weaknesses/common mistakes AND vulnerabilities (e.g. XSS/XSRF)?

3. Thanks for the cross-reference to my new blog :-); May 30, 2007 at 2:09 AM
Andy Steingruebl said...: Sure - you give me this graph after I purchase your service :)

But seriously - it is amazingly refreshing to see a vendor evaluate their product in a frank and straightforward manner. Imagine most vendors of security tools telling you they don't actually solve a problem. Does it ever happen?

To Ory's point - one thing I think is missed in the whole scanning tools game (and one I'm sure you all realize) is that in the end we don't necessarily need to find all instances of vulnerabilities to be successful - just all of the causes.

Once my scanner flags an area for me where I may have an issue I can go to the code and fix most/all instances of it without the scanner needing to have been exhaustive. Certainly this doesn't work in all cases, but where it does a full enumeration of issues isn't that helpful except insofar as it provides a regression test for future development work.

In terms of smart versus dumb tools though, it's always amazing how many defects you can find in code using grep and sed.... Maybe it's a volume game. I just need to create a source code analyzer out of grep, package it and sell it for $5 and make it all up on volume.; May 30, 2007 at 7:49 AM
Anonymous said...: dood, lay off the nitrous, there, Turbo.

if a person who has spent a lot of time using both WI and AppScan (or works for WhiteHat and uses Sentinel) in environments where the scanners will find vulnerabilities - then that specific person will have some skill at separating the false positives, and possibly even turning a false positive or two into a true positive. more often than not - manual assessors nix the automated scanners because they are limited as a fault-injection tool (a1-a2), provide basic reporting and less analysis on a6 and a9, and only really shine when they find an a10 that happens to include PII. as a manual dynamic analysis vulnerability assessor, i'd assume that running a web application vulnerability scanner just to find a10's is the entire worth/value of these applications. in my mind, the other stuff is just fluff: a sea of false positives, and an unnecessary amount of false negatives.

i think you'd be hard pressed to argue that these products do almost anything for owasp 2007 top ten sections a5, a7, and a8. i would say that they mostly focus on a10, and do a very poor job at probably the most important area, a2. sure, sql injection may be covered - but metacharacter injection isn't as complete as you would find using an open-source tool guided by manual effort.

especially not as complete as you would find by doing static code analysis, especially automated tools like Fortify SCA combined with a test harness driven by fuzz testing that seeks 100% code coverage.

look at the cost as well. static code analysis costs about $2k per developer per year. to get the equivalent using a combined dynamic analysis method with both an automated scanner and a manual assessor would certainly be at least $2k/application/day.

@ Ory :
wrt your #2, it's not mitre cwe/cve at all... it's mitre capec (i.e. attacks, threats). thanks for having a blog, it will be interesting to see what you guys turn up; May 30, 2007 at 7:54 AM
Jeremiah Grossman said...: > Great post - as always, other than the content, I especially enjoy your graphics :-)

Thank you. Have to flex some photoshop skills every once in a while. :)

1) You’re absolutely right, and it's something I’ve attempted to shed light on at a high level with the surveys (the “who uses scanners and why” questions). The thing is, what exactly should we be measuring with the auditor-scanner combo? Probably time, but maybe also results. However, we’ll still need to understand what scanners can automate 100% of (or close to) and what they simply can't do at all (or very little). Then we'll be able to figure out the gray area in between where we can save time and improve quality.

This is how we approach it at WhiteHat. We have to be intimately familiar with these numbers, as they apply to our process and technology, because it’s the keystone of our business model. Daily vulnerability assessments show us exactly where we’re at (item #5 of How Sentinel Works describes this a bit) and what will save the most time on the most sites.

Circling back to your point, the challenge in this measurement between auditors and scanners is that no one/firm uses the same assessment methodology or views vulnerabilities in the same way, nor will they anytime soon, because differentiating methodology is a big value proposition. Ultimately the real measurement we need is hackability. How much harder does any of this (VA/WAF/etc) make it for someone trying to break into a website? I mean, that’s the whole point, isn’t it? I’m working on it.

2) Yep indeed. That kind of mixing and matching is so hard to get away from. Fortunately the list is not a "standard" taxonomy, but an education piece, so it should be OK for that purpose.

3) You're very welcome. Keep the content coming.; May 30, 2007 at 8:06 AM
Jeremiah Grossman said...: Security Retentive,

> Sure - you give me this graph after I purchase your service :)

AHAHA, well, first thank you for the business. We appreciate it. And actually I wasn't specifically addressing the Sentinel Service, but purely the automated scanning portion with respect to our technology and my best guess at everyone else's. If I was to re-do the needles for Sentinel (w/ humans) they'd obviously push up higher. Nothing to 100% though, but we're trying. :)

> But seriously - it is amazingly refreshing to see a vendor evaluate their product in a frank and straightforward manner. Imagine most vendors of security tools telling you they don't actually solve a problem. Does it ever happen?

Thanks again. And I don't know the answer, I haven't been pitched a tool since 2000. :); May 30, 2007 at 8:14 AM
Andrew van der Stock said...: Hi Jeremiah,

Thank you for the very extensive comments in that area - I don't use these tools often (although I am being slowly converted) so I don't know the state of the art.

I will work out a way to refresh the content (maybe call it 2.1 or something).

More soon,
Andrew; May 30, 2007 at 4:30 PM
Jeremiah Grossman said...: Very cool, thanks Andrew. 2.1 ETA?; May 30, 2007 at 5:01 PM
Anonymous said...: Are you actually suggesting that a scanner will find 90%+ of the XSS on a site? I clearly need better scanners, or perhaps PEBKAC.; May 30, 2007 at 5:38 PM
Jeremiah Grossman said...: Under a best case scenario, sure I think so. But I'm actually more interested in your opinion. Where would you put the needle and why? If you could determine from experience, what were the gating items that seemed to prevent the scanners you use from yielding a better result?; May 30, 2007 at 6:28 PM
JRHelgeson said...: What about application firewalls? In every case I've encountered, I've seen a dramatic increase in site security before-and-after scans once a WAF had been put in place. The way I see it is that a programmer can integrate WAF functionality into their code, and hope they catch everything, or place a WAF in front and have it do the validation, revalidation for you. You can build it yourself, reinvent it every time, or reuse the code available from the WAF.
Am I wrong?; July 30, 2007 at 9:53 AM
Jeremiah Grossman said...: No, you're not wrong, at least in theory. The most challenging part for WAFs so far has been set-up and ongoing management. This could be why we see so few of them deployed, and why the ones that are deployed run in alert mode as opposed to block mode. False positives are costly.; July 30, 2007 at 9:56 AM
Anonymous said...: Yes, you're right, scanners that report false positives are a pain in the rear.; November 4, 2007 at 4:40 PM