URL decoding UTF-8 in Python

小能豆

javascript

In Python 2.7, given a URL like example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0, how can I decode it to the expected result example.com?title=правовая+защита?

I tried url=urllib.unquote(url.encode("utf8")), but it seems to give the wrong result.

Views: 60

2024-09-02

1 answer

小能豆

To decode a URL-encoded string correctly in Python 2.7, especially one containing non-ASCII characters such as Cyrillic, combine urllib.unquote with .decode('utf8').

Here is how:

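import urllib

# Keep the URL as a byte string (str). If it were a unicode object (u'...'),
# unquote() would return unicode and the .decode('utf8') below would fail.
url = 'example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0'

# Replace the %xx escapes with raw bytes, then decode those UTF-8 bytes into a Unicode string
decoded_url = urllib.unquote(url).decode('utf8')

print(decoded_url)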

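Explanation

urllib.unquote(url): decodes the URL-encoded string by replacing each %xx escape with the corresponding byte, which leaves a byte string of UTF-8 data.
.decode('utf8'): decodes the resulting UTF-8 bytes into a Unicode string, so that non-ASCII characters (such as Cyrillic) display correctly.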
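
The two steps can also be viewed separately. The sketch below is written for Python 3, where urllib.unquote no longer exists; it uses urllib.parse.unquote_to_bytes as a stand-in for the byte-string behaviour of Python 2.7. That substitution, and the variable name raw_bytes, are assumptions of this example and not part of the original answer.

```python
from urllib.parse import unquote_to_bytes

url = 'example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0'

# Step 1: replace each %xx escape with the raw byte it stands for.
raw_bytes = unquote_to_bytes(url)

# Step 2: interpret those bytes as UTF-8 to obtain readable text.
decoded_url = raw_bytes.decode('utf8')

print(decoded_url)
```

Inspecting raw_bytes between the two steps is a quick way to check whether a wrong result comes from the unquoting or from the character decoding.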

Expected output

example.com?title=правовая+защита

With this approach the URL is decoded correctly and the non-ASCII characters appear as expected.
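
For readers on Python 3, a roughly equivalent sketch follows. There, urllib.parse.unquote decodes the escapes as UTF-8 by default and returns text directly, so no separate .decode call is needed. The helper name decode_url is hypothetical and used only for illustration.

```python
from urllib.parse import unquote, unquote_plus

def decode_url(url):
    """Decode %xx escapes as UTF-8; '+' characters are left untouched."""
    return unquote(url, encoding='utf-8')

url = 'example.com?title=%D0%BF%D1%80%D0%B0%D0%B2%D0%BE%D0%B2%D0%B0%D1%8F+%D0%B7%D0%B0%D1%89%D0%B8%D1%82%D0%B0'

print(decode_url(url))

# unquote_plus additionally turns '+' into a space, which may be what is
# wanted for query strings produced by HTML forms.
print(unquote_plus(url))
```

Whether the '+' should stay or become a space depends on how the URL was produced; the expected result in the question keeps it, which matches plain unquote.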

2024-09-02