SSL: CERTIFICATE_VERIFY_FAILED error for http://en.wikipedia.org

小能豆


python

I'm working through the code in "Web Scraping with Python", but I keep running into this certificate problem:

from urllib.request import urlopen 
from bs4 import BeautifulSoup 
import re

pages = set()
def getLinks(pageUrl):
    global pages
    html = urlopen("http://en.wikipedia.org"+pageUrl)
    bsObj = BeautifulSoup(html)
    for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
                #We have encountered a new page
                newPage = link.attrs['href'] 
                print(newPage) 
                pages.add(newPage) 
                getLinks(newPage)
getLinks("")

The error is:

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1049)>

By the way, I'm also practicing Scrapy, but I keep getting command not found: scrapy (I've tried all sorts of solutions online, but none of them worked... really frustrating).


2023-06-13

1 answer

小能豆

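For your first problem, you are hitting an SSL certificate verification failure. Although your URL starts with http://, en.wikipedia.org redirects it to https://, and urlopen() follows that redirect, so Python verifies the server's SSL certificate, as it does by default for HTTPS. The message "unable to get local issuer certificate" means Python could not find the CA certificates it needs to verify that certificate chain. The path in your traceback (/Library/Frameworks/Python.framework/Versions/3.7) points to the python.org installer on macOS, which does not use the system keychain's certificates; that installer ships an "Install Certificates.command" script in the Python 3.7 folder under Applications that installs a CA bundle for it.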
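To check whether your Python can find CA certificates at all, a stdlib-only sketch like the following may help. The helper name describe_ca_setup() is hypothetical. According to the ssl documentation, cafile and capath come back as None when the corresponding file or directory does not exist, so on an affected setup you would likely see cafile: None and few or no CA certificates loaded; treat that as a hint, not proof.

```python
import ssl


def describe_ca_setup():
    """Hypothetical helper: report where the ssl module looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    ctx = ssl.create_default_context()
    return {
        # Resolved CA file / directory, or None when it does not exist
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Where OpenSSL was built to look (this file may be missing)
        "openssl_cafile": paths.openssl_cafile,
        # CA certs loaded eagerly; certs in a capath directory load lazily,
        # so 0 here is not conclusive when capath is set
        "ca_certs_loaded": ctx.cert_store_stats()["x509_ca"],
    }


if __name__ == "__main__":
    for key, value in describe_ca_setup().items():
        print(f"{key}: {value}")
```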

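As a workaround, you can disable SSL certificate verification by passing an SSL context with verification turned off through the context parameter of urlopen() (this also turns off protection against man-in-the-middle attacks). Here is the modified code: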

import re
import ssl
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Create one SSL context with certificate verification disabled and reuse it
# for every request instead of rebuilding it on each recursive call
context = ssl._create_unverified_context()

pages = set()

def getLinks(pageUrl):
    global pages

    # Request HTTPS directly; http://en.wikipedia.org redirects there anyway
    html = urlopen("https://en.wikipedia.org" + pageUrl, context=context)
    # Name the parser explicitly so BeautifulSoup doesn't have to guess one
    bsObj = BeautifulSoup(html, 'html.parser')

    for link in bsObj.find_all("a", href=re.compile("^(/wiki/)")):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
                # We have encountered a new page
                newPage = link.attrs['href']
                print(newPage)
                pages.add(newPage)
                getLinks(newPage)

getLinks("")

In the code above, we create an SSL context (context) with certificate verification disabled and pass it to urlopen() through its context parameter.
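To see what that unverified context actually changes, and what a verifying alternative could look like, here is a small sketch. It compares ssl._create_unverified_context() with ssl.create_default_context(). The helper make_verified_context() is hypothetical: it assumes the third-party certifi package (pip install certifi) for its CA bundle and falls back to Python's default CA locations if certifi is not installed.

```python
import ssl

try:
    import certifi  # third-party CA bundle: pip install certifi
except ImportError:  # fall back to the CA locations Python was built with
    certifi = None


def make_verified_context():
    """Hypothetical helper: an SSL context that keeps certificate checks on."""
    cafile = certifi.where() if certifi is not None else None
    # create_default_context() requires a valid certificate and checks the hostname
    return ssl.create_default_context(cafile=cafile)


def describe(ctx):
    """Summarize the two settings that decide whether the server is verified."""
    return {
        "verifies_certificate": ctx.verify_mode == ssl.CERT_REQUIRED,
        "checks_hostname": ctx.check_hostname,
    }


unverified = ssl._create_unverified_context()
verified = make_verified_context()

print("unverified:", describe(unverified))
print("verified:  ", describe(verified))
```

To keep verification in getLinks(), you could pass context=make_verified_context() to urlopen() instead of the unverified context.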

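As for the second problem, command not found: scrapy, this is most likely because Scrapy is not installed correctly or because the directory containing the scrapy executable is not on your PATH. Make sure Scrapy is installed and that the directory holding its executable is included in the PATH environment variable.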

If Scrapy is already installed but you still get this error, try the following steps:

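1. Confirm that Scrapy is installed: run pip show scrapy and make sure it prints Scrapy's details (version, install location, and so on).
2. Check where the scrapy executable is: run which scrapy and check that the returned path is correct. If it prints nothing, Scrapy is either not installed correctly or its directory is not on your PATH.
3. Check your Python version: if several Python versions are installed, the pip you run may belong to a different interpreter than the python you use. Running python3 -m pip install scrapy installs Scrapy for that specific interpreter.
4. Reinstall Scrapy: run pip uninstall scrapy, then run pip install scrapy again.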
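The checks above can be gathered from Python itself with a small sketch like this one. The helper name diagnose_scrapy() is hypothetical, and it only inspects the interpreter that runs it, so run it with the same python you use for pip. It assumes a regular (non --user) install; pip install --user puts scripts in a per-user directory that this does not check.

```python
import importlib.util
import os
import shutil
import sys
import sysconfig


def diagnose_scrapy():
    """Hypothetical helper: collect the facts the checklist above asks about."""
    # Directory where pip puts console scripts (like `scrapy`) for this interpreter
    scripts_dir = sysconfig.get_path("scripts")
    path_dirs = {
        os.path.normcase(os.path.normpath(p))
        for p in os.environ.get("PATH", "").split(os.pathsep)
        if p
    }
    return {
        # Which interpreter is running this check
        "python": sys.executable,
        # Similar to `pip show scrapy`: can this interpreter import scrapy?
        "scrapy_importable": importlib.util.find_spec("scrapy") is not None,
        # Similar to `which scrapy`: None if no executable is found on PATH
        "scrapy_on_path": shutil.which("scrapy"),
        "scripts_dir": scripts_dir,
        "scripts_dir_on_path": os.path.normcase(os.path.normpath(scripts_dir)) in path_dirs,
    }


if __name__ == "__main__":
    for key, value in diagnose_scrapy().items():
        print(f"{key}: {value}")
```

If scrapy_importable is True but scripts_dir_on_path is False, adding scripts_dir to PATH is the likely fix.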

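If the problem persists, please share more details about your operating system, Python version, and installation environment so the issue can be diagnosed more precisely.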

2023-06-13