About

  • I have not really mastered this concept yet. mXSS is my introduction to reading the HTML specs. I will consider myself to somewhat understand the concept once I find an mXSS bug.
  • This page is mainly a place to park some important mXSS research that I find very interesting.

Research

Analysis tools

Nuggets

  • Chrome now encodes < and > characters in attribute values when serializing the DOM back to HTML (source). From the PR, it seems that this feature has not been rolled out to all users yet, so we can still enjoy mXSS for some time. Take this payload as an example:
<svg><style><a alt="</style>">

With the change, the alt attribute value is HTML-encoded, which defuses some attacks:

<svg><style><a alt="&lt;/style&gt;">
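The escaping change can be modelled in a few lines. This is only a rough sketch of attribute-value escaping during serialization, not Chrome's implementation; `escape_attr`, `serialize_a`, and the `escape_angle_brackets` flag are hypothetical names invented for illustration.

```python
def escape_attr(value: str, escape_angle_brackets: bool) -> str:
    """Simplified model of how a serializer escapes an attribute value."""
    # Long-standing rules: escape &, non-breaking space and the double quote.
    out = value.replace("&", "&amp;").replace("\u00a0", "&nbsp;").replace('"', "&quot;")
    if escape_angle_brackets:
        # Newer behaviour described above: < and > are escaped as well.
        out = out.replace("<", "&lt;").replace(">", "&gt;")
    return out


def serialize_a(alt: str, escape_angle_brackets: bool) -> str:
    """Serialize only the start tags of the example payload (hypothetical helper)."""
    return f'<svg><style><a alt="{escape_attr(alt, escape_angle_brackets)}">'


print(serialize_a("</style>", False))  # <svg><style><a alt="</style>">
print(serialize_a("</style>", True))   # <svg><style><a alt="&lt;/style&gt;">
```

With the flag on, the serialized string no longer contains a literal `</style>`, so re-parsing it cannot close the `<style>` element early.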
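  • For the parsing differential payload below, this is the parsing mechanism according to the HTML specs:

    • When a <form> start tag is processed, the parser records the new element in the form element pointer (that is what the spec calls it). If the pointer is not null, the start tag is ignored and no form element is created.
    • When a </form> end tag is processed, the form element pointer is always set to null.
    • In the payload, </form> arrives while <div> is still open. The pointer is reset but the <div> stays open, so <form id="inner"> is created inside the <div>, which is itself still a child of the outer form. The result is a form nested inside another form.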
<form id="outer"><div></form><form id="inner"><input>
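The pointer logic can be traced with a toy model. This is a minimal sketch, not a real HTML parser: it only knows about the form element pointer and a stack of open elements, and the token tuples, `build_tree`, and `outline` are hypothetical names made up for this illustration.

```python
def build_tree(tokens):
    """Toy model of the form element pointer; NOT a real HTML parser."""
    root = {"tag": "body", "id": None, "children": []}
    stack = [root]        # stack of open elements
    form_pointer = None   # the "form element pointer" from the spec

    for kind, tag, el_id in tokens:
        if kind == "start":
            if tag == "form" and form_pointer is not None:
                continue  # a form is already recorded: ignore this start tag
            node = {"tag": tag, "id": el_id, "children": []}
            stack[-1]["children"].append(node)
            if tag == "form":
                form_pointer = node
            if tag != "input":  # <input> is a void element, never left open
                stack.append(node)
        elif kind == "end" and tag == "form":
            node, form_pointer = form_pointer, None  # pointer is always reset
            if node is not None:
                # Only the form leaves the stack; an open <div> stays on it.
                stack[:] = [n for n in stack if n is not node]
    return root


def outline(node):
    """Render the tree as a compact string such as body(form#outer(input))."""
    label = node["tag"] + (f"#{node['id']}" if node["id"] else "")
    kids = "".join(outline(child) for child in node["children"])
    return f"{label}({kids})" if kids else label


# <form id="outer"><div></form><form id="inner"><input>
payload = [("start", "form", "outer"), ("start", "div", None), ("end", "form", None),
           ("start", "form", "inner"), ("start", "input", None)]
print(outline(build_tree(payload)))  # body(form#outer(div(form#inner(input))))

# <form id="outer"><form id="inner"><input>  (second start tag is ignored)
plain = [("start", "form", "outer"), ("start", "form", "inner"), ("start", "input", None)]
print(outline(build_tree(plain)))    # body(form#outer(input))
```

The second run shows the usual protection against nested forms; the first shows how the stray `</form>` defeats it.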

Parsing in different namespaces

  • This section is about the <style> element, but the explanation also applies to other elements: <title>, <textarea>, and <noscript> (if scripting is enabled). Note that <title> and <textarea> are escapable raw text elements, so character references inside them are still decoded. See the HTML specs for more details.
  • The <style> element seems to be widely used in mXSS payloads. I guess this is because it is “valid” in all 3 namespaces (?).

HTML namespace

  • In HTML (when served as text/html), the <style> element is defined as a raw text element. That means:
    • Raw text elements do not treat their content as HTML markup.
    • The parser does not look for nested tags inside them; it simply looks for the literal string that starts the closing tag (i.e. </style>).
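The raw text rule above can be approximated with a plain string scan. This is a simplified sketch rather than the spec's tokenizer; `split_raw_text` is a hypothetical helper, and the input is assumed to be whatever follows the `<style>` start tag.

```python
import re


def split_raw_text(after_open_tag: str, tag: str = "style"):
    """Split the input into (raw text content, markup after the end tag)."""
    # Nothing inside is tokenized; only the end tag matters. Its name is
    # matched case-insensitively and must be followed by whitespace, "/" or ">".
    match = re.search(rf"</{re.escape(tag)}[\s/>]", after_open_tag, re.IGNORECASE)
    if match is None:
        return after_open_tag, ""  # never closed: everything is raw text
    close = after_open_tag.find(">", match.end() - 1)
    rest = "" if close == -1 else after_open_tag[close + 1:]
    return after_open_tag[:match.start()], rest


raw, rest = split_raw_text("a { color: red } <b>not a tag</b></style><p>")
print(repr(raw))   # 'a { color: red } <b>not a tag</b>'
print(repr(rest))  # '<p>'
```

The `<b>` inside the content never becomes an element; it is just part of the stylesheet text.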

SVG/MathML namespace

  • When <svg> or <math> appears inside a text/html document, the HTML parser handles everything inside it as foreign content, and there is NO raw text mode there. Start tags, comments, and attributes are all tokenized as ordinary markup.
  • <style> gets no special treatment in these two namespaces (SVG does define a <style> element and MathML does not, but the parser does not care). Hence <style> is treated like any other tag and its contents are parsed as normal markup (in other words, nested tags such as <a> become real elements).
  • This means that tags inside <style> create child elements, and attribute values are parsed as attribute values without any special “raw” behavior.

Resulting quirks

  • These are some quirks that leverage the behavior above to deliver an mXSS payload. This is usually the final step, after we have figured out a good mutation to use.

  • Comments are interpreted differently inside <style> in the MathML namespace and in the HTML namespace:

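MathML namespace: The parser sees the comment opener <!-- after the <style> tag and parses until it sees the comment closer -->, so </style>a<foo-bar is=" ends up inside the comment. The <img> start tag that follows is one of the HTML tags that force the parser out of foreign content, hence it breaks out of the MathML namespace into the HTML namespace and becomes a live <img> element.
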
<math><style><!--</style>a<foo-bar is="--><img src=x onerror=alert(1)>">

HTML namespace: Now it is slightly different. The <style> content is treated as raw text, hence the comment opener <!-- is just text, not the start of a comment. The parser consumes everything up to the closing </style> tag, hence <foo-bar> is parsed as a normal element with its is attribute set to the rest of the payload, up to the closing ".

<style><!--</style>a<foo-bar is="--><img src=x onerror=alert(1)>">
  • Similarly, the SVG namespace and the HTML namespace parse attributes inside <style> differently.

HTML namespace: Same behavior as above. <a id=" is treated as raw text, and the first </style> closes the <style> element, which “breaks” the <a> tag. Hence, the <img> tag is parsed as a normal HTML tag.

<style><a id="</style><img src=x onerror=alert()>"></a></style>

SVG namespace: Here, the parser sees the <style> tag and then a real nested <a> tag, whose id attribute is set to the rest of the payload up to the closing ". The <img> tag stays inside the attribute value and is never created.

<svg><style><a id="</style><img src=x onerror=alert()>"></a></style>
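
Both quirks come down to which construct the parser closes first after the <style> start tag. The sketch below is a rough model of that decision, not a spec-accurate tokenizer; `remaining_markup` is a hypothetical helper that returns whatever is still parsed as markup once the first construct is closed.

```python
def remaining_markup(content: str, raw_text: bool) -> str:
    """Markup left over after the first construct following <style> is closed."""
    if raw_text:
        # HTML namespace: raw text runs up to the literal end tag.
        end = content.lower().find("</style>")
        return "" if end == -1 else content[end + len("</style>"):]
    if content.startswith("<!--"):
        # Foreign content: a real comment, closed by the first "-->".
        end = content.find("-->", 4)
        return "" if end == -1 else content[end + 3:]
    # Foreign content: a real start tag; skip it while honouring quoted values.
    quote = None
    for i, ch in enumerate(content):
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            return content[i + 1:]
    return ""


comment_payload = '<!--</style>a<foo-bar is="--><img src=x onerror=alert(1)>">'
attr_payload = '<a id="</style><img src=x onerror=alert()>"></a></style>'

print(remaining_markup(comment_payload, raw_text=True))   # a<foo-bar is="--><img src=x onerror=alert(1)>">
print(remaining_markup(comment_payload, raw_text=False))  # <img src=x onerror=alert(1)>">
print(remaining_markup(attr_payload, raw_text=True))      # <img src=x onerror=alert()>"></a></style>
print(remaining_markup(attr_payload, raw_text=False))     # </a></style>
```

In the comment payload the <img> is live only in the foreign (MathML) case, while in the attribute payload it is live only in the HTML case; in the other two results it is either still inside a quoted attribute value or already consumed.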