How do I perform HTML decoding/encoding using Python/Django?

一尘不染

django

I have an HTML-encoded string:

'''&lt;img class=&quot;size-medium wp-image-113&quot;\
 style=&quot;margin-left: 15px;&quot; title=&quot;su1&quot;\
 src=&quot;http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg&quot;\
 alt=&quot;&quot; width=&quot;300&quot; height=&quot;194&quot; /&gt;'''

I want to change it to:

<img class="size-medium wp-image-113" style="margin-left: 15px;" 
  title="su1" src="http://blah.org/wp-content/uploads/2008/10/su1-300x194.jpg" 
  alt="" width="300" height="194" />

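I want this to register as HTML so that the browser renders it as an image instead of displaying it as text.

The string is stored this way because I am using a web-scraping tool called BeautifulSoup, which "scans" a web page, extracts certain content from it, and returns the string in that format.

I have found how to do this in C# but not in Python. Can someone help me?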

2020-03-26

一尘不染

Given the Django use case, there are two answers to this. Here is its django.utils.html.escape function, for reference:

def escape(html):
    """Returns the given HTML with ampersands, quotes and carets encoded."""
    return mark_safe(
        force_unicode(html)
        .replace('&', '&amp;')
        .replace('<', '&lt;')
        .replace('>', '&gt;')
        .replace('"', '&quot;')
        .replace("'", '&#39;')
    )

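To reverse this, the Cheetah function described in Jake's answer should work, but it is missing the single quote. This version includes an updated tuple, with the replacement order reversed so that &amp; is decoded last; otherwise input such as &amp;lt; would be decoded twice, into < instead of &lt;: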

def html_decode(s):
    """
    Returns the ASCII decoded version of the given HTML string. This does
    NOT remove normal HTML tags like <p>.
    """
    htmlCodes = (
            ("'", '&#39;'),
            ('"', '&quot;'),
            ('>', '&gt;'),
            ('<', '&lt;'),
            ('&', '&amp;')
        )
    for code in htmlCodes:
        s = s.replace(code[1], code[0])
    return s

unescaped = html_decode(my_string)
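The effect of the replacement order can be checked with a small sketch. The names PAIRS_AMP_LAST, PAIRS_AMP_FIRST and decode_with are hypothetical helpers introduced only for this illustration; the table itself is the same one used in html_decode above.

```python
# Same (character, entity) table as html_decode, with '&amp;' handled last.
PAIRS_AMP_LAST = (
    ("'", "&#39;"),
    ('"', "&quot;"),
    (">", "&gt;"),
    ("<", "&lt;"),
    ("&", "&amp;"),
)
# The same table in the opposite order, so '&amp;' is handled first.
PAIRS_AMP_FIRST = tuple(reversed(PAIRS_AMP_LAST))


def decode_with(pairs, s):
    """Replace each entity with its character, in the order given by pairs."""
    for char, entity in pairs:
        s = s.replace(entity, char)
    return s


# The literal text "&lt;b&gt;" after being escaped exactly once.
escaped_once = "&amp;lt;b&amp;gt;"

print(decode_with(PAIRS_AMP_LAST, escaped_once))   # &lt;b&gt;  (one level removed)
print(decode_with(PAIRS_AMP_FIRST, escaped_once))  # <b>        (decoded twice)
```

Decoding &amp; last removes exactly one level of escaping, which is what a reverse of escape should do.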

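However, this is not a general solution; it only works for strings encoded with django.utils.html.escape. More generally, it is a good idea to stick with the standard library: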

# Python 2.x:
import HTMLParser
html_parser = HTMLParser.HTMLParser()
unescaped = html_parser.unescape(my_string)

# Python 3.0-3.3 (HTMLParser.unescape was deprecated in 3.4 and removed in 3.9):
import html.parser
html_parser = html.parser.HTMLParser()
unescaped = html_parser.unescape(my_string)

# Python 3.4+:
from html import unescape
unescaped = unescape(my_string)
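As a rough illustration of why the standard library is the more general choice, the sketch below (assuming Python 3.4 or later) feeds html.unescape a few inputs that a five-entry replacement table would leave untouched: a named entity, a decimal reference and a hexadecimal reference. The sample strings are made up for the example.

```python
from html import unescape  # available since Python 3.4

# Made-up sample inputs covering different kinds of character references.
samples = [
    "&lt;p&gt;",           # the basic entities
    "caf&eacute;",         # a named entity outside the basic five
    "&#8364;5",            # a decimal numeric reference (euro sign)
    "&#x27;quoted&#x27;",  # a hexadecimal numeric reference (single quote)
]

decoded = [unescape(s) for s in samples]
for before, after in zip(samples, decoded):
    print(before, "->", after)
```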

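A suggestion: it may make more sense to store the HTML unescaped in your database. If possible, it is worth looking into getting unescaped results back from BeautifulSoup and avoiding this process altogether.

With Django, escaping only occurs during template rendering; so to prevent escaping, you just tell the template engine not to escape your string. To do that, use one of these options in your template: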

{{ context_var|safe }}
{% autoescape off %}
    {{ context_var }}
{% endautoescape %}

2020-03-26

一尘不染

With the standard library:

HTML escape

try:
    from html import escape  # Python 3.2+
except ImportError:
    from cgi import escape  # Python 2.x (cgi.escape was removed in Python 3.8)

print(escape("<"))

HTML unescape

try:
    from html import unescape  # python 3.4+
except ImportError:
    try:
        from html.parser import HTMLParser  # python 3.x (<3.4)
    except ImportError:
        from HTMLParser import HTMLParser  # python 2.x
    unescape = HTMLParser().unescape

print(unescape("&gt;"))
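Applied to the string from the question, the two functions can be combined as in the following sketch, which assumes Python 3.4 or later. The input is shortened from the question's example for readability, and the variable names encoded and decoded are chosen only for this illustration.

```python
from html import escape, unescape  # Python 3.4+

# Shortened version of the encoded string from the question.
encoded = (
    '&lt;img class=&quot;size-medium wp-image-113&quot; '
    'alt=&quot;&quot; width=&quot;300&quot; height=&quot;194&quot; /&gt;'
)

# Decode the entities back into markup.
decoded = unescape(encoded)
print(decoded)
# <img class="size-medium wp-image-113" alt="" width="300" height="194" />

# Escaping the decoded markup again reproduces the input for this string.
print(escape(decoded) == encoded)  # True
```

The round trip is exact here because the string contains only <, > and double quotes. It is not guaranteed in general: html.escape writes a single quote as &#x27;, so input that used &#39; would come back in a different but equivalent form.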

2020-03-26
