Jeremiah Grossman: Results, Unicode Left/Right Pointing Double Angle Quotation Mark

Thursday, June 04, 2009

Results, Unicode Left/Right Pointing Double Angle Quotation Mark

A while back, 3APA3A and Arian Evans (Director of Operations, WhiteHat Security) left a full-disclosure thread hanging about an interesting encoding bypass attack: the Unicode Left/Right Pointing Double Angle Quotation Mark.

Dear full-disclosure@lists.grok.org.uk,

By the way: I saw Unicode Left Pointing Double Angle Quotation Mark (%u00AB) / Unicode Right Pointing Double Angle Quotation Mark (%u00BB) are sometimes translated to '<' and '>'. Has anybody experimented with

%u00ABscript%u00BB

in different environments to bypass filtering in this way?
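The ordering problem behind this question can be sketched in a few lines of Python. This is a minimal illustration, not a reproduction of any tested application: the `BEST_FIT` table and both function names are hypothetical stand-ins for whatever lossy character conversion a back end might apply after its input filter has already run.

```python
import html

# Hypothetical best-fit table (an assumption for illustration): stands in
# for a lossy conversion that turns the quotation marks into '<' and '>'.
BEST_FIT = str.maketrans({"\u00ab": "<", "\u00bb": ">"})


def naive_filter(value: str) -> str:
    """Escape HTML metacharacters; only ASCII '<' and '>' are recognized."""
    return html.escape(value)


def lossy_convert(value: str) -> str:
    """Apply the hypothetical best-fit conversion."""
    return value.translate(BEST_FIT)


payload = "\u00abscript\u00bb"

# Filter first, convert second: the filter never sees a '<'.
unsafe = lossy_convert(naive_filter(payload))   # '<script>'

# Convert first, filter second: the tag is escaped as expected.
safe = naive_filter(lossy_convert(payload))     # '&lt;script&gt;'

print(unsafe, safe)
```

The filter itself is not wrong here; it simply runs before the data reaches its final form.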

Arian promised to get back to 3APA3A after scanning several hundred production websites using WhiteHat Sentinel, a huge R&D benefit of the platform. Two years later, there is data to share. We’ve been busy, but hey, better late than never, right? :) As it turned out, 3APA3A was correct! Arian discovered a small number of Web applications vulnerable to the encoding technique, and they add up if the sample pool is large enough. The samples ranged from 300 to roughly 1000 websites. Remember, these are collapsed numbers, meaning multiple vulnerable inputs on the same Web application are grouped together.

11 exploitable XSS in 8 websites:
%u00ABscript%u00BB

15 exploitable XSS in 12 sites:
&#x3008 ;script&#x3009 ;

2 in 2:
U%2bFF1CscriptU%2bFF1E

1 in 1:
&#x2039 ;script&#x203A ;

1 in 1:
&#x2329 ;script&#x232A ;

1 in 1:
&#x27E8 ;script&#x27E9 ;

*Whitespace before the semicolons is added purposely to prevent blog formatting glitches.
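To see which characters the payloads above actually carry, the following Python sketch decodes each one and prints the Unicode name of its opening character along with its NFKC-normalized form. The whitespace before the semicolons has been removed. `decode_layers` is a hypothetical helper, and treating `U%2bFF1C` as a URL-escaped `U+FF1C` code point notation is an assumption about how the affected applications interpreted it.

```python
import html
import re
import unicodedata
from urllib.parse import unquote

PAYLOADS = [
    "%u00ABscript%u00BB",
    "&#x3008;script&#x3009;",
    "U%2bFF1CscriptU%2bFF1E",
    "&#x2039;script&#x203A;",
    "&#x2329;script&#x232A;",
    "&#x27E8;script&#x27E9;",
]


def _hex_to_char(match: re.Match) -> str:
    """Turn a captured run of hex digits into the character it names."""
    return chr(int(match.group(1), 16))


def decode_layers(raw: str) -> str:
    """Decode %uXXXX, percent-escapes, U+XXXX notation, then HTML entities."""
    text = re.sub(r"%u([0-9A-Fa-f]{4})", _hex_to_char, raw)
    text = unquote(text)
    text = re.sub(r"U\+([0-9A-Fa-f]{4})", _hex_to_char, text)
    return html.unescape(text)


for raw in PAYLOADS:
    decoded = decode_layers(raw)
    opener = decoded[0]
    normalized = unicodedata.normalize("NFKC", decoded)
    print(f"U+{ord(opener):04X} {unicodedata.name(opener)}: NFKC gives {normalized!r}")
```

Under NFKC, the fullwidth pair (U+FF1C/U+FF1E) folds to ASCII `<` and `>`. The quotation-mark pair U+00AB/U+00BB does not, so the translation observed in those applications must come from some other lossy mapping that this sketch does not model.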

Arian Evans, in his own words...

These are exploitable conditions where this was the ONLY way that arbitrary HTML could be created. There were many more sites that normalized these, where the same encoding could be used for filter-evasion/exploitation, but it was not the ONLY way to create arbitrary HTML in the application. Unfortunately the dataset does not count all of the ANDs/combinations right now, just the ONLYs. So if there was a simpler way to create arbitrary HTML, that is the only way it was counted. The rabbit hole goes much deeper: dozens of combinations and permutations lead to exploitation, and not just for XSS but for many types of syntax attacks. Still researching.

There are also MANY more of these in international language code pages. Browser behavior gets really unpredictable with foreign-language character sets, which increases XSS and HTTP/RS exploit options even more. There are also many more ways to use these when you start layering your encoding techniques. Yosuke Hasegawa did a great presentation on Japanese/Kanji character sets @ BlackHat Tokyo 2008. For example, I found many of these attack vectors work at an even higher percentage when URI-escaped or combined with other Hex-encoding formats (or Decimal, Base64, etc. etc. etc.).
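The layering idea can be illustrated by generating URI-escaped variants of one of the payloads above for use as test cases. This is only a sketch: `layered_variants` is a hypothetical helper, and it covers single and double URI-escaping only, not the Decimal or Base64 combinations mentioned.

```python
from urllib.parse import quote, unquote


def layered_variants(payload: str) -> dict[str, str]:
    """Return the payload as-is, URI-escaped once, and URI-escaped twice."""
    once = quote(payload, safe="")
    return {
        "raw": payload,
        "uri_escaped": once,
        "double_uri_escaped": quote(once, safe=""),
    }


variants = layered_variants("&#x3008;script&#x3009;")
for label, value in variants.items():
    print(f"{label}: {value}")

# Each layer is peeled off by one round of decoding on the receiving side.
assert unquote(unquote(variants["double_uri_escaped"])) == variants["raw"]
```

A filter that inspects the value before the last decoding round has run sees only percent-escapes, with no `&#x` pattern to match.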

3APA3A, thanks for opening my mind up to some new angles on filter-evasion tricks! :)

6 comments:

Anonymous said...: perhaps a better question for you Jeremiah, is your waf protecting against this and other variants? My guess is no.; June 4, 2009 at 5:00 PM
Jeremiah Grossman said...: and apparently neither is the webapp code.; June 4, 2009 at 5:03 PM
Unknown said...: You always get more, than expected when you call for angles. BTW, we are namesakes, I own whitehatsecurity.ru since 2006.; June 5, 2009 at 1:15 AM
Łukasz Pilorz said...: I think this was already discussed in context of full-width/half-width and similar conversion issues. This table could be helpful: http://lukasz.pilorz.net/testy/unicode_conversion/; June 5, 2009 at 7:50 AM
Matt Presson said...: I have also discovered similar problems with applications running on the WebLogic platform. I wrote a post about it a few months ago.

http://coding-insecurity.blogspot.com/2008/10/executing-scripts-with-non-english.html

and

http://coding-insecurity.blogspot.com/2008/10/more-on-scripts-with-non-english.html; June 5, 2009 at 1:41 PM
Arian said...: @Anonymous -- while it is clear you are trolling, you are incorrect. These are easy data structures for WAFs to block.

If you spend some time learning about business systems software, you will find that data types that look like:

&#x[NumHex]; or &#x[alphaNum];

are almost never structures that are legitimate business data type requirements.

It is likewise pretty easy to clobber them on input.

That is why I explicitly noted that use of other encodings as transport mechanisms can sometimes render these attacks more reliable. Cheers.; June 5, 2009 at 3:24 PM
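
Arian's point about rejecting `&#x…;` structures on input could be sketched roughly as follows. This is an assumption-laden illustration, not how any particular WAF works: `looks_encoded` and both patterns are hypothetical, and the check peels off only one layer of percent-escaping.

```python
import re
from urllib.parse import unquote

# Numeric character references such as &#x3008; or &#12296; (semicolon optional).
NUMERIC_REF = re.compile(r"&#(?:[xX][0-9A-Fa-f]+|[0-9]+);?")
# Non-standard %uXXXX escapes such as %u00AB.
PERCENT_U = re.compile(r"%u[0-9A-Fa-f]{4}", re.IGNORECASE)


def looks_encoded(value: str) -> bool:
    """Flag input carrying numeric character references or %uXXXX escapes.

    Both the raw value and its percent-decoded form are checked, so a
    single layer of URI-escaping does not hide the pattern.
    """
    for candidate in (value, unquote(value)):
        if NUMERIC_REF.search(candidate) or PERCENT_U.search(candidate):
            return True
    return False


print(looks_encoded("&#x3008;script&#x3009;"))   # True
print(looks_encoded("%26%23x3008%3Bscript"))     # True
print(looks_encoded("Tom & Jerry #1"))           # False
```

A doubly URI-escaped payload would pass this check, which matches the caveat about other encodings acting as transport mechanisms.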