Mitesh Ashar

@iMBA

Kiran Jonnalagadda

@jace

Testing HTML Mode with our HTML Sanitizer

Submitted Jan 3, 2023

Testing some inline tags

OK. Will this process as an inline block with a link to <a href="https://www.hasgeek.com" title="Hasgeek" target="_blank" rel="nofollow" class="hello">Hasgeek</a>? The class should not get through. <a href="javascript:alert('Hello')">Test a JavaScript link</a>[1]
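The test above expects the `class` attribute to be dropped and the `javascript:` URL to be neutralised. Here is a minimal Python sketch of the URL half of that check. It assumes a hypothetical allowlist of `http`, `https` and `mailto` schemes, which is not necessarily what the production sanitizer uses:

```python
from urllib.parse import urlsplit

# Hypothetical scheme allowlist; relative URLs (no scheme) are also accepted.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_safe_url(url: str) -> bool:
    """Return True if the URL is relative or uses an allowlisted scheme."""
    # Browsers drop tabs/newlines anywhere in a URL and strip control characters
    # and spaces at its ends, so "java\nscript:" still runs as JavaScript.
    # Conservatively remove every such character before reading the scheme.
    cleaned = "".join(ch for ch in url if ord(ch) > 0x20)
    scheme = urlsplit(cleaned).scheme.lower()
    return scheme == "" or scheme in ALLOWED_SCHEMES
```

If a link fails this check, the sanitizer would remove its `href`. Whether it then keeps only the link text is a separate design choice.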

Let’s talk <abbr title="Hypertext Transfer Protocol">HTTP</abbr>.

<a href="https://developer.mozilla.org/en-US/docs/Web/HTTP" title="Get more information about HTTP" target="_blank" rel="nofollow">What is <abbr title="Hypertext Transfer Protocol">HTTP</abbr>?</a>

The subsequent text chunks are <b>bold <i>italics</i></b> and <strong>strong <em>emphasized</em></strong>.

Testing line break<br>and another line break<br/> to see whether they work.

<style>
</style>

<script type="text/javascript">
window.alert('Hello World!');
</script>

The above HTML chunk with the style and script tags has an issue.[2]
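One way to make the style/script test predictable is to drop these elements, together with their contents, before any other processing. The sketch below uses only Python's standard-library `html.parser`. It assumes that dropping the elements, rather than escaping them, is the desired behaviour, which may differ from what bleach does by default. The class and function names are hypothetical:

```python
import html
from html.parser import HTMLParser

# Elements removed together with everything inside them.
DROP_WITH_CONTENT = {"script", "style"}


class ScriptStyleStripper(HTMLParser):
    """Re-emit HTML with <script> and <style> elements (and their contents) removed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif not self.skip_depth:
            # Raw start tag re-emitted as-is; attribute filtering is out of scope here.
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are re-emitted unchanged.
        if not self.skip_depth and tag not in DROP_WITH_CONTENT:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped because convert_charrefs=True decoded any entities.
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))


def strip_script_style(markup: str) -> str:
    parser = ScriptStyleStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```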

Blockquotes

<blockquote class="abcd" cite="http://www.worldwildlife.org/who/index.html">This is a blockquote with an <id> which should get escaped and a link https://www.hasgeek.com that should get linkified. The class on the tag should not get through. Please note that markdown syntax does not work here.</blockquote> Not even outside! This text converts to bare text, not even enclosed within a <code><p> tag</code>, which is why it has no bottom margin.
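The blockquote test expects bare URLs to be linkified and stray tags such as `<id>` to be escaped. The sketch below is a standard-library Python stand-in for a linkify pass over plain text. It is hypothetical and is not bleach's `linkify` implementation:

```python
import html
import re

# Simplified URL pattern: http(s) URLs up to the next whitespace, quote or angle bracket.
URL_RE = re.compile(r'https?://[^\s<>"\']+')
# Sentence punctuation that should stay outside the link.
TRAILING_PUNCT = ".,;:!?)"


def linkify_text(text: str) -> str:
    """Escape plain text and wrap bare http(s) URLs in rel="nofollow" links."""
    out = []
    last = 0
    for m in URL_RE.finditer(text):
        trimmed = m.group(0).rstrip(TRAILING_PUNCT)
        out.append(html.escape(text[last:m.start()], quote=False))
        safe = html.escape(trimmed)
        out.append(f'<a href="{safe}" rel="nofollow">{safe}</a>')
        last = m.start() + len(trimmed)
    out.append(html.escape(text[last:], quote=False))
    return "".join(out)
```

Because the surrounding text is escaped as it is copied, the same pass also turns the stray `<id>` into `&lt;id&gt;`.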

Here’s a markdown blockquote where we will cite <cite>H. G. Wells</cite>.
You may notice here that markdown gets processed inside inline HTML.
This is because inline HTML is split into separate opening-tag and closing-tag tokens of type html_inline, while the content between them is a text token, available for further tokenisation.
For HTML blocks, by contrast, there is a single html_block token holding the entire block, so its content is not tokenised as markdown.
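The split between html_inline and html_block follows CommonMark's HTML-block rules, which markdown-it-py implements. A line that opens an HTML block becomes part of one html_block token. Other tags are left for the inline parser. The Python sketch below is simplified: it covers only start conditions 1, 6 and 7, and only a subset of the type-6 tag names. It shows why the `<blockquote>` line above opens a block, while a line like `<id>ID</id>` would not:

```python
import re

# Type 1: <pre>, <script>, <style>, <textarea>; the block ends at the matching close tag.
TYPE1_RE = re.compile(r"^ {0,3}<(?:pre|script|style|textarea)(?:\s|>|$)", re.I)
# Type 6: a SUBSET of CommonMark's block-level tag names; the block ends at a blank line.
BLOCK_TAGS = (
    "address|article|aside|blockquote|caption|dd|details|div|dl|dt|figcaption|"
    "figure|footer|h[1-6]|header|hr|li|main|nav|ol|p|section|summary|table|"
    "tbody|td|tfoot|th|thead|tr|ul"
)
TYPE6_RE = re.compile(rf"^ {{0,3}}</?(?:{BLOCK_TAGS})(?:\s|/?>|$)", re.I)
# Type 7 (simplified attribute matching): one complete open or close tag alone on
# its line. Per the spec it cannot interrupt a paragraph; that is not modelled here.
TYPE7_RE = re.compile(
    r"^ {0,3}(?:<[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>|</[A-Za-z][A-Za-z0-9-]*\s*>)\s*$"
)


def html_block_type(line: str):
    """Return 1, 6 or 7 if the line would open an HTML block, else None."""
    if TYPE1_RE.match(line):
        return 1
    if TYPE6_RE.match(line):
        return 6
    if TYPE7_RE.match(line):
        return 7
    return None
```

Type 6 and 7 blocks run until the next blank line. This explains why text following `</blockquote>` on the same line stays inside the block and gets no markdown processing.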

Test <code> and <pre>

Test code block

<pre>
<code class="language-javascript">
var x = 10;
</code></pre>

<code class="language-javascript">
var x = 10;
</code>

<code class="language-javascript">var x = 10;</code>

The above HTML code block has spacing issues.[3]
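Footnote 3 proposes trimming the newlines at the start and end of `<code>` and `<pre>` content. The regex-based Python sketch below illustrates that post-processing step. It assumes the step runs on the already sanitised HTML string, and the function name is hypothetical. Browsers already ignore one newline immediately after `<pre>`, but not one after `<code>` or before a closing tag.

```python
import re

# Newlines directly after an opening <pre>/<code> tag, or directly before a closing one.
LEADING_NL = re.compile(r"(<(?:pre|code)\b[^>]*>)\n+", re.I)
TRAILING_NL = re.compile(r"\n+(</(?:pre|code)>)", re.I)


def trim_code_newlines(markup: str) -> str:
    """Remove newlines at the start and end of <pre> and <code> contents."""
    markup = LEADING_NL.sub(r"\1", markup)
    return TRAILING_NL.sub(r"\1", markup)
```

Newlines inside the content, such as those in the `<pre>` paragraph test, are left alone.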

var x = 10;

<pre>
This is a
test <del>paragraph</del> <ins>pre-formatted block</ins>.
</pre>

<hr>

Description Lists

<dl>
<dt>Coffee</dt>
<dd>Black hot drink</dd>
<dt>Milk</dt>
<dd>White cold drink</dd>
</dl>

<hr/>

Headings

<h1>Heading 1</h1>
<h2>Heading 2</h2>

Headings allowlist[4]

<h3>Heading 3</h3>
<h4>Heading 4</h4>
<h5>Heading 5</h5>
<h6>Heading 6</h6>

Image, Lists and unknown tags

<img src="/static/img/hg-logo.svg" width="80" height="80" align="right" alt="Hasgeek"/> Check whether right alignment has been applied to the image.

<ul>
<li>Item</li>
<li>Item</li>
</ul>

<ol>
<li>Item</li>
<li>Item</li>
</ol>

<ol start="9">
<li>Item</li>
<li>
Item
<ol type="a">
<li>Item</li>
<li>Item</li>
</ol>
</li>
<li>
Item
<ol type="i" start="7">
<li>Item</li>
<li>Item</li>
</ol>
</li>
</ol>

<id>ID</id>

<role> is specified
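The `<id>` and `<role>` tests, and the open question in footnote 4, both come down to a tag allowlist. The Python sketch below escapes any tag whose name is not allowlisted. The allowlist is hypothetical and is not coaster's actual list. It simply leaves out `<h1>` and `<h2>` to show the effect:

```python
import html
import re

# Hypothetical tag allowlist for illustration; <h1> and <h2> are deliberately absent.
ALLOWED_TAGS = {
    "a", "abbr", "b", "blockquote", "br", "cite", "code", "dd", "del", "dl", "dt",
    "em", "h3", "h4", "h5", "h6", "hr", "i", "img", "ins", "li", "ol", "p", "pre",
    "strong", "sub", "sup", "ul",
}
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^<>]*>")


def escape_unknown_tags(markup: str) -> str:
    """Leave allowlisted tags as they are and HTML-escape every other tag."""
    def replace(m):
        if m.group(1).lower() in ALLOWED_TAGS:
            return m.group(0)
        return html.escape(m.group(0))
    return TAG_RE.sub(replace, markup)
```

Escaping rather than stripping keeps the text visible. This matches the blockquote test's expectation that `<id>` should get escaped.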

Tables

<table class="tg" align="left" bgcolor="gray" border="1" cellpadding="2" cellspacing="2" width="90%">
<caption>Table caption</caption>
<thead>
<tr>
<td class="tg-0lax">Testing<sup>Creating longer superscripts</sup></td>
<td class="tg-0lax">Testing more<sub>Creating longer subscripts</sub></td>
<td>NowIAmTryingToFitInAnUnusuallyLongWordHereToSeeWhetherItWraps</td>
</tr>
</thead>
</table>
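Like the earlier link and blockquote tests, the table above carries `class` attributes that should not get through. It also carries presentational attributes such as `bgcolor` and `cellpadding`. The Python sketch below implements a per-tag attribute allowlist. Its contents are hypothetical, including the choice to drop every `<table>` attribute and to validate the `start` and `type` values used in the nested ordered lists:

```python
# Hypothetical per-tag attribute allowlist; "class" is never allowed.
ALLOWED_ATTRS = {
    "a": {"href", "title", "target", "rel"},
    "ol": {"start", "type"},
    "td": {"colspan", "rowspan"},
    "th": {"colspan", "rowspan"},
}
# Valid values for <ol type="...">.
OL_TYPES = {"1", "a", "A", "i", "I"}
NUMERIC_ATTRS = {"start", "colspan", "rowspan"}


def filter_attrs(tag, attrs):
    """Return only the allowlisted (name, value) pairs for a tag, with basic value checks."""
    allowed = ALLOWED_ATTRS.get(tag, set())
    kept = []
    for name, value in attrs:
        if name not in allowed:
            continue
        if tag == "ol" and name == "type" and value not in OL_TYPES:
            continue
        # Value may be None for bare attributes, hence the `or ""`.
        if name in NUMERIC_ATTRS and not (value or "").isdigit():
            continue
        kept.append((name, value))
    return kept
```

The `(name, value)` pairs have the same shape that `html.parser.HTMLParser.handle_starttag` receives. A filter like this could therefore slot into a parser-based sanitizer.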

<article>

<article>
<h3>Mozilla Firefox</h3>
<p>Mozilla Firefox is an open-source web browser developed by Mozilla. Firefox has been the second most popular web browser since January, 2018.</p>
</article>

<article>
<h3>Microsoft Edge</h3>
<p>Microsoft Edge is a web browser developed by Microsoft, released in 2015. Microsoft Edge replaced Internet Explorer.</p>
</article>

<aside>

<aside>
<h4>Epcot Center</h4>
<p>Epcot is a theme park at Walt Disney World Resort featuring exciting attractions, international pavilions, award-winning fireworks and seasonal special events.</p>
</aside>

<details>

<details>
<summary>Epcot Center</summary>
<p>Epcot is a theme park at Walt Disney World Resort featuring exciting attractions, international pavilions, award-winning fireworks and seasonal special events.</p>
</details>

<figure>

<figure>
<img src="/static/img/hg-logo.svg" width="50%" alt="Hasgeek" align="center">
<figcaption>Hasgeek</figcaption>
</figure>

Other issues identified

  • Inline <code> tags do not have the same formatting as the block ones. This needs to be normalised.

  1. All attribute URLs need to be cleaned using linkify.

  2. No <p> tag; newlines consumed.
    The cause:
    <style> and <script> are tags that markdown-it-py recognises as starting an HTML block, so the whole chunk above is treated as a single html_block token. After sanitisation, it is still there without a <p> tag, and bleach has consumed its newlines.

  3. Bleach consumes newlines everywhere except inside <code> and <pre>. For both, the newlines at the start and end need to be trimmed.

  4. <h1> and <h2> were not in coaster’s list. Should we allow them?
