About
- I have not really mastered this concept yet. mXSS is my introduction to reading the HTML specs. I will consider that I somewhat understand the concept once I find an mXSS bug.
- This page is mainly a place to park some important mXSS research that I find very interesting.
Research
- S1r1us mXSS Explained series: Part 1 and Part 2. His GitHub MXSS repository also contains a lot of insights.
- mXSS cheatsheet (to save some sanity reading HTML specs)
- SecurityMB DOMPurify 2.0.17 bypass
- Yaniv Nizry mXSS introduction research
- Yaniv Nizry DOMPurify 3.2.1 Bypass (Non-Default Config)
- Jorian Woltjer mXSS: covers some of the basics, including my content below, as well as the "is" attribute trick.
- Kevin Mizu’s analysis of DOMPurify bypasses: Part 1, Part 2
- Ensy DOMPurify 3.2.3 Bypass (Non-Default Config)
- Helping secure DOMPurify
Analysis tools
Nuggets
- Chrome now encodes < and > characters in attribute values when serializing HTML (source). From the PR, it seems like this feature has not been rolled out to all users yet, so we can still enjoy mXSS for some time. For example, an alt attribute value containing < or > will come out HTML-encoded, which nerfs some attacks.
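To see why this nerfs attacks, here is a minimal Python sketch of the attribute-value escaping done by the HTML serializer (e.g. when reading innerHTML). It assumes the spec's attribute-mode rule (escape &, U+00A0 and ") for the old behavior, plus < and > for the new one; escape_attribute_value, serialize_img and PAYLOAD are hypothetical names, not a browser API.

```python
# Hypothetical helpers sketching the HTML serializer's attribute-mode escaping.
def escape_attribute_value(value: str, escape_lt_gt: bool) -> str:
    # "&" must be escaped first so later replacements are not double-escaped.
    out = value.replace("&", "&amp;").replace("\u00a0", "&nbsp;").replace('"', "&quot;")
    if escape_lt_gt:
        # Newer behavior: < and > in attribute values are escaped as well.
        out = out.replace("<", "&lt;").replace(">", "&gt;")
    return out


def serialize_img(alt: str, escape_lt_gt: bool) -> str:
    return f'<img alt="{escape_attribute_value(alt, escape_lt_gt)}">'


# Illustrative attribute value, not taken from a real exploit.
PAYLOAD = "</style><img src=x onerror=alert(1)>"
print(serialize_img(PAYLOAD, escape_lt_gt=False))  # old rule: literal </style> survives
print(serialize_img(PAYLOAD, escape_lt_gt=True))   # new rule: no literal < or > left
```

With the old rule the serialized string still contains a literal </style>, which becomes a real end tag if the markup is re-parsed somewhere the <img alt=" ends up inside raw text (such as an HTML-namespace <style>). With the new rule it is only &lt;/style&gt;, which stays text.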
- For the form-based parsing differential payload, here is how the HTML spec parses the snippet:
  - When you open a <form> tag, the parser records it in the form element pointer (that is what the spec calls it). While the pointer is not null, any new <form> start tag is ignored, so no other form element can be created (ignoring the special cases for <template>).
  - When you end a <form> tag, the form element pointer is always set to null, even if the <form> is not the current node. If that form is still open, it is removed from the stack of open elements, but the elements opened inside it stay open.
  - When you open a
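The form element pointer rules above can be modeled with a toy Python tree builder. It is a hypothetical simplification, not a spec-conformant parser: tokenize, build and serialize are made-up helpers that only understand bare tags like <div>, treat "in scope" as "anywhere on the stack", and skip implied end tags and <template>. It shows how a </form> inside a child element leaves a nested <form> in the tree, which then disappears when the serialized markup is parsed again.

```python
import re
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: compare nodes by identity, not by field values
class Node:
    tag: str
    children: list["Node"] = field(default_factory=list)


def tokenize(html: str) -> list[tuple[str, str]]:
    """Toy tokenizer: understands only bare tags such as <div> and </div>."""
    return [("end" if slash else "start", name)
            for slash, name in re.findall(r"<(/?)([a-z]+)>", html)]


def build(html: str) -> Node:
    """Toy tree builder that models only the form element pointer rules."""
    body = Node("body")
    stack = [body]       # the stack of open elements
    form_pointer = None  # the spec's "form element pointer"
    for kind, tag in tokenize(html):
        if kind == "start" and tag == "form":
            if form_pointer is not None:
                continue  # pointer already set: the <form> start tag is ignored
            node = Node("form")
            stack[-1].children.append(node)
            stack.append(node)
            form_pointer = node
        elif kind == "start":
            node = Node(tag)
            stack[-1].children.append(node)
            stack.append(node)
        elif tag == "form":
            node, form_pointer = form_pointer, None  # </form> always clears the pointer
            if node is not None and node in stack:
                stack.remove(node)  # only the form is removed; its children stay open
        else:
            # Any other end tag: pop up to and including the matching open element.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i].tag == tag:
                    del stack[i:]
                    break
    return body


def serialize(node: Node) -> str:
    return "".join(f"<{c.tag}>{serialize(c)}</{c.tag}>" for c in node.children)


first = serialize(build("<form><div></form><form></form></div>"))
second = serialize(build(first))  # parse the serialized markup again
print(first)   # <form><div><form></form></div></form>
print(second)  # <form><div></div></form>
```

The nested form only exists in the first tree; on the second parse the pointer is still set when the inner <form> arrives, so it is dropped. Real payloads can combine this kind of mutation with namespace confusion, which the toy does not model.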
Parsing in different namespaces
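- This section is about the <style> element, but the explanation also applies to other elements whose content the HTML parser does not treat as markup, such as <title>, <textarea> and <noscript> (if scripting is enabled). One difference: <title> and <textarea> are RCDATA rather than raw text, so character references inside them are still decoded. See the HTML specs for more details.
- The <style> element seems to be widely used in mXSS payloads. I guess this is because it is “valid” in all 3 namespaces (?).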
HTML namespace
- In HTML (when served as text/html), the <style> element is defined as a raw text element. That means:
  - Raw text elements do not treat their content as HTML markup.
  - The parser does not look for nested tags inside them; it simply looks for the literal string that starts the closing tag (i.e. </style>).
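The raw text rule can be sketched in a few lines of Python. This is a toy model under simplifying assumptions: split_rawtext is a hypothetical helper that ignores everything else the tokenizer does, and it encodes the spec's condition that the end tag is </style in any letter case, followed by whitespace, / or >.

```python
import re

# Toy model of the raw text rule for <style> (not a full tokenizer): the
# content ends at the first "</style" in any letter case that is followed by
# whitespace, "/" or ">". Quotes, tags and comments inside are never recognized.
END_STYLE = re.compile(r"</style(?=[\t\n\f />])", re.IGNORECASE)


def split_rawtext(after_open_tag: str) -> tuple[str, str]:
    """Split the input that follows <style> into (style text, remaining markup)."""
    m = END_STYLE.search(after_open_tag)
    if m is None:
        return after_open_tag, ""  # no end tag: everything is text
    return after_open_tag[: m.start()], after_open_tag[m.start():]


# The "attribute" never exists: the text is cut at the first </style>.
text, rest = split_rawtext('<a id="</style><img src=x onerror=alert(1)>">')
print(repr(text))  # '<a id="'
print(rest)        # </style><img src=x onerror=alert(1)>">

# </styled> is not an end tag, but </STYLE> is.
print(split_rawtext("a</styled>b</STYLE>"))  # ('a</styled>b', '</STYLE>')
```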
SVG/MathML namespace
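- When SVG or MathML is embedded in a text/html document, the HTML parser handles it as foreign content, and there is NO raw text mode there: every element's content is tokenized with the normal rules. (Standalone SVG or MathML served with an XML MIME type goes through the XML parser, which has no raw text mode either.)
- <style> gets no special treatment in these two namespaces: the switch to raw text only happens for <style> in the HTML namespace. Hence <style> will be treated like any other tag and its contents will be parsed as markup (in other words, as normal elements like <a>, comments, and so on).
- This means that a tag or comment inside <style> becomes a real node, and a quoted attribute value is a real attribute value, so a </style> inside that value is just part of the string.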
Resulting quirks
- These are some quirks that leverage the behavior above to deliver an mXSS payload. They are usually the final step, after we have figured out a good mutation to use.
- Comments are interpreted differently inside <style> in the MathML namespace and in the HTML namespace:
MathML namespace: the parser sees the comment opener <!-- after the <style> tag and treats everything up to the comment closer --> as a comment. The <img> that follows is one of the HTML start tags that are not allowed in foreign content, so the parser breaks out of the MathML namespace and inserts it as an HTML element.
HTML namespace: now it is slightly different. The <style> content is raw text, so the comment opener <!-- is just text, not an HTML comment. The parser consumes everything until the closing </style> tag, so the <foo-bar> that follows is a normal element whose is attribute is set to the rest of the payload, up to the closing ".
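The two readings can be compared with a toy Python sketch. The payload is a hypothetical one built to match the description above (a <foo-bar> with an is attribute), not necessarily the exact original; html_reading and mathml_reading are made-up helpers that model only the two rules discussed here: raw text up to </style in the HTML namespace, and a comment from <!-- to --> in the MathML namespace.

```python
import re

# Hypothetical payload built to match the description above: the markup that
# follows <style> (inside <math> for the MathML reading).
AFTER_STYLE = '<!--</style><foo-bar is="--><img src=x onerror=alert(1)>">'

# HTML namespace: raw text up to the first "</style" (any case) + delimiter.
END_STYLE = re.compile(r"</style(?=[\t\n\f />])", re.IGNORECASE)


def html_reading(rest: str) -> tuple[str, str]:
    """Return (style text, remaining markup) when <style> is raw text."""
    m = END_STYLE.search(rest)
    if m is None:
        return rest, ""
    return rest[: m.start()], rest[m.start():]


def mathml_reading(rest: str) -> tuple[str, str]:
    """Return (comment, remaining markup) when <style> content is markup.

    Toy rule: a leading "<!--" starts a comment that ends at the next "-->".
    Edge cases such as "<!-->" are not modeled.
    """
    if not rest.startswith("<!--"):
        return "", rest
    end = rest.find("-->", 4)
    if end == -1:
        return rest, ""  # unterminated comment swallows the rest
    return rest[: end + 3], rest[end + 3:]


style_text, html_next = html_reading(AFTER_STYLE)
comment, mathml_next = mathml_reading(AFTER_STYLE)
print("HTML:  ", repr(style_text), "then", html_next)
print("MathML:", repr(comment), "then", mathml_next)
```

In the HTML reading the next markup is </style> followed by <foo-bar is="...">, so the <img> sits harmlessly inside the is attribute. In the MathML reading the comment swallows that part and the <img> comes next, which is where the breakout to the HTML namespace happens (not modeled here).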
- Similarly, the SVG namespace and the HTML namespace handle attributes inside <style> differently.
HTML namespace: same behavior as above. <a id=" is raw text, so no <a> tag is ever created. The raw text ends at the first </style>, hence the <img> tag after it is parsed as a normal HTML tag.
SVG namespace: here the parser sees the <style> tag, then a real <a> tag nested inside it, with its id attribute set to the rest of the payload (so the <img> is only text inside an attribute value).
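A toy Python sketch of the two readings of the attribute case. The payload is hypothetical, built to match the description above, and html_reading and svg_reading are made-up helpers: svg_reading only recognizes a single start tag with double-quoted attributes, which is enough to show that the quoted value hides the </style>.

```python
import re
from typing import Optional

# Hypothetical payload built to match the description above: the markup that
# follows <style> (inside <svg> for the SVG reading).
AFTER_STYLE = '<a id="</style><img src=x onerror=alert(1)>">'

# HTML namespace: raw text up to the first "</style" (any case) + delimiter.
END_STYLE = re.compile(r"</style(?=[\t\n\f />])", re.IGNORECASE)
# SVG namespace (toy): a start tag with double-quoted attributes only.
START_TAG = re.compile(r'<([a-zA-Z][^\t\n\f />]*)((?:\s+[^\s"=/>]+="[^"]*")*)\s*>')
ATTR = re.compile(r'([^\s"=/>]+)="([^"]*)"')


def html_reading(rest: str) -> tuple[str, str]:
    """Return (style text, remaining markup) when <style> is raw text."""
    m = END_STYLE.search(rest)
    if m is None:
        return rest, ""
    return rest[: m.start()], rest[m.start():]


def svg_reading(rest: str) -> tuple[Optional[tuple[str, dict[str, str]]], str]:
    """Return ((tag, attributes), remaining markup) when <style> content is markup."""
    m = START_TAG.match(rest)
    if m is None:
        return None, rest
    return (m.group(1), dict(ATTR.findall(m.group(2)))), rest[m.end():]


style_text, html_next = html_reading(AFTER_STYLE)
svg_tag, svg_next = svg_reading(AFTER_STYLE)
print("HTML:", repr(style_text), "then", html_next)
print("SVG: ", svg_tag)
```

In the HTML reading, <img ...> follows a real </style>; in the SVG reading the same characters are just the value of id on <a>.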