How do I read the contents of a URL with Python?

小能豆

2024-09-25

1 answer

小能豆

In Python, you can read the contents of a URL with the standard library's urllib module. The examples below show how to do this in Python 3 and in Python 2.

Python 3

In Python 3, use urllib.request to handle URL requests:

import urllib.request

# Define the URL
url = 'http://www.example.com'

# Open the URL and read its contents
with urllib.request.urlopen(url) as response:
    content = response.read().decode('utf-8')  # read the bytes and decode them to a string
    print(content)
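
The example above assumes the page is encoded as UTF-8. As a minimal sketch, the charset declared in the Content-Type header can be used instead, falling back to UTF-8 when the server declares none. The helper name `read_url_text` and the choice of fallback are assumptions made for illustration and are not part of the original answer.

```python
import urllib.request


def read_url_text(url, fallback_encoding='utf-8'):
    """Fetch a URL and decode the body using the charset the server declares."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # get_content_charset() returns None when the header names no charset
        charset = response.headers.get_content_charset() or fallback_encoding
    return raw.decode(charset)
```

Calling `read_url_text('http://www.example.com')` would return the page as a string.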

Python 2

In Python 2, use the urllib or urllib2 module:

import urllib

# Define the URL
url = 'http://www.example.com'

# Open the URL and read its contents
response = urllib.urlopen(url)
content = response.read()  # read the contents
print(content)
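
If the same script has to run on both versions, one possible approach is to try the Python 3 import first and fall back to the Python 2 module. This is a rough sketch only: the helper name `fetch_bytes` is hypothetical, and it returns undecoded bytes on both versions.

```python
from contextlib import closing

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2


def fetch_bytes(url):
    """Return the raw response body as bytes on either Python version."""
    # closing() closes the response even where it is not a context manager
    with closing(urlopen(url)) as response:
        return response.read()
```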

Using the `requests` library (recommended)

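The requests library provides a simpler and more intuitive API for making HTTP requests. Current releases support Python 3 only, while older releases also run on Python 2. First, install the library: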

pip install requests

Then read the URL's contents with the following code:

import requests

# Define the URL
url = 'http://www.example.com'

# Send a GET request and read the response body
response = requests.get(url)
content = response.text  # the body decoded to a string
print(content)
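
Besides `text`, a response object exposes the status code, the headers, and the undecoded body. The sketch below gathers them in one place; the function name `describe_response` and the dictionary keys are made up for illustration.

```python
import requests


def describe_response(response):
    """Collect the most commonly used fields of a requests.Response."""
    return {
        'status': response.status_code,  # HTTP status code, e.g. 200
        'content_type': response.headers.get('Content-Type'),  # None if absent
        'raw_bytes': response.content,  # undecoded body as bytes
        'text': response.text,  # body decoded to a string
    }
```

It could be called as `describe_response(requests.get('http://www.example.com', timeout=10))`.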

Handling common issues

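Adding request headers: some websites block requests that lack appropriate headers, User-Agent in particular. With the requests library you can add custom headers: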

```python
import requests

url = 'http://www.example.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
}
}

response = requests.get(url, headers=headers)
print(response.text)
```
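
The same idea works without requests. As a sketch using only the standard library, a `urllib.request.Request` object can carry the custom header. The helper names and the default User-Agent string below are placeholders chosen for this example.

```python
import urllib.request


def build_request(url, user_agent='Mozilla/5.0 (compatible; example-script)'):
    """Create a urllib Request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_with_user_agent(url):
    """Open the URL with the custom header and return the body as bytes."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.read()
```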
Handling encoding problems: some pages return data that is not UTF-8 encoded. The requests library can detect the encoding from the response body:

```python
import requests

url = 'http://www.example.com'
response = requests.get(url)
response.encoding = response.apparent_encoding  # detect the encoding from the response body
print(response.text)
```
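
When working with raw bytes (for example from urllib), a similar effect can be approximated by trying several encodings in turn. This is only a sketch: the function name `decode_body` and the candidate list are assumptions, and the right candidates depend on the sites you read.

```python
def decode_body(raw, declared_encoding=None, candidates=('utf-8', 'gb18030')):
    """Decode bytes by trying the declared encoding first, then each candidate."""
    encodings = ([declared_encoding] if declared_encoding else []) + list(candidates)
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: move on to the next candidate
            continue
    # Last resort: keep what is readable and mark undecodable bytes
    return raw.decode('utf-8', errors='replace')
```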
Handling errors: add exception handling to cope with network errors or failed requests:

```python
import requests

url = 'http://www.example.com'

try:
    response = requests.get(url)
    response.raise_for_status()  # raise an exception for 4xx/5xx status codes
    print(response.text)
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
```
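
For the urllib approach, the corresponding exceptions live in `urllib.error`. The following is a minimal sketch; the helper name `safe_read` and the 10-second default timeout are illustrative choices, not part of the original answer.

```python
import urllib.error
import urllib.request


def safe_read(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500
        print(f"HTTP error {e.code}: {e.reason}")
    except urllib.error.URLError as e:
        # No usable answer: DNS failure, refused connection, unknown scheme
        print(f"Request failed: {e.reason}")
    return None
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first.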

With these methods, you can flexibly read and process the contents of a URL.
