I'm a beginner; could someone help me make sense of this? Many thanks.
C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\python.exe -X pycache_prefix=C:\Users\JJJhr_\AppData\Local\JetBrains\PyCharm2023.3\cpython-cache "D:/PyCharm 2023.3.2/plugins/python/helpers/pydev/pydevd.py" --multiprocess --qt-support=auto --client 127.0.0.1 --port 63728 --file F:\PythonProject\ArticleSpider\main.py
Connected to pydev debugger (build 233.13135.95)
2024-04-28 02:08:29 [scrapy.utils.log] INFO: Scrapy 2.11.1 started (bot: ArticleSpider)
2024-04-28 02:08:29 [scrapy.utils.log] INFO: Versions: lxml 5.2.1.0, libxml2 2.11.7, cssselect 1.2.0, parsel 1.9.1, w3lib 2.1.2, Twisted 24.3.0, Python 3.11.7 (tags/v3.11.7:fa7a6f2, Dec 4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)], pyOpenSSL 24.1.0 (OpenSSL 3.2.1 30 Jan 2024), cryptography 42.0.5, Platform Windows-10-10.0.19045-SP0
2024-04-28 02:08:30 [scrapy.addons] INFO: Enabled addons:
[]
2024-04-28 02:08:30 [asyncio] DEBUG: Using selector: SelectSelector
2024-04-28 02:08:30 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.asyncioreactor.AsyncioSelectorReactor
2024-04-28 02:08:30 [scrapy.utils.log] DEBUG: Using asyncio event loop: asyncio.windows_events.WindowsSelectorEventLoop
2024-04-28 02:08:30 [scrapy.extensions.telnet] INFO: Telnet Password: 69dbb14f7af72e05
2024-04-28 02:08:30 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2024-04-28 02:08:30 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'ArticleSpider',
 'FEED_EXPORT_ENCODING': 'utf-8',
 'NEWSPIDER_MODULE': 'ArticleSpider.spiders',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'SPIDER_MODULES': ['ArticleSpider.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'}
2024-04-28 02:08:30 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-04-28 02:08:30 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-04-28 02:08:30 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2024-04-28 02:08:30 [scrapy.core.engine] INFO: Spider opened
2024-04-28 02:08:30 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-04-28 02:08:30 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-04-28 02:08:30 [undetected_chromedriver.patcher] DEBUG: getting release number from /last-known-good-versions-with-downloads.json
2024-04-28 02:08:31 [undetected_chromedriver.patcher] DEBUG: downloading from
https://storage.googleapis.com/chrome-for-testing-public/124.0.6367.91/win32/chromedriver-win32.zip
2024-04-28 02:08:32 [scrapy.core.engine] ERROR: Error while obtaining start requests
Traceback (most recent call last):
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1294, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1340, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1289, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1048, in _send_output
    self.send(msg)
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 986, in send
    self.connect()
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\http\client.py", line 1466, in connect
    self.sock = self._context.wrap_socket(self.sock,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\ssl.py", line 1108, in _create
    self.do_handshake()
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\ssl.py", line 1383, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\core\engine.py", line 181, in _next_request
    request = next(self.slot.start_requests)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\PythonProject\ArticleSpider\ArticleSpider\spiders\jobbole.py", line 24, in start_requests
    browser = uc.Chrome()
              ^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\site-packages\undetected_chromedriver\__init__.py", line 258, in __init__
    self.patcher.auto()
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\site-packages\undetected_chromedriver\patcher.py", line 178, in auto
    self.unzip_package(self.fetch_package())
                       ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\site-packages\undetected_chromedriver\patcher.py", line 287, in fetch_package
    return urlretrieve(download_url)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
                            ^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\JJJhr_\AppData\Local\Programs\Python\Python311\Lib\urllib\request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [WinError 10054] An existing connection was forcibly closed by the remote host.>
2024-04-28 02:08:32 [scrapy.core.engine] INFO: Closing spider (finished)
2024-04-28 02:08:32 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'elapsed_time_seconds': 1.282641,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2024, 4, 27, 18, 8, 32, 71424, tzinfo=datetime.timezone.utc),
 'log_count/DEBUG': 5,
 'log_count/ERROR': 1,
 'log_count/INFO': 10,
 'start_time': datetime.datetime(2024, 4, 27, 18, 8, 30, 788783, tzinfo=datetime.timezone.utc)}
2024-04-28 02:08:32 [scrapy.core.engine] INFO: Spider closed (finished)
Process finished with exit code 0
This log comes from running a Scrapy project (under the PyCharm debugger), and something went wrong during startup. Let me walk through it.

The key information is this: according to the stack trace, urllib raised a URLError whose root cause is ConnectionResetError [WinError 10054], meaning the remote host forcibly closed the connection. The failure happens while downloading chromedriver-win32.zip from storage.googleapis.com, so it is typically caused by a network problem on your side (for example, a firewall or proxy resetting the connection, or that host being unreachable from your network) or by the remote server, not by a bug in your parsing code.
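To confirm it really is connectivity rather than your spider, you can try fetching the exact URL from the log with plain urllib. A minimal sketch (the URL is copied from your log; the 10-second timeout is just an arbitrary choice):

```python
# Minimal connectivity check: request the same URL the patcher failed on,
# to see whether the network/proxy is the culprit.
import urllib.request
import urllib.error

URL = ("https://storage.googleapis.com/chrome-for-testing-public/"
       "124.0.6367.91/win32/chromedriver-win32.zip")

try:
    with urllib.request.urlopen(URL, timeout=10) as resp:
        print("reachable, HTTP status:", resp.status)
except urllib.error.URLError as exc:
    # The same URLError your spider hit; exc.reason carries the socket error.
    print("not reachable:", exc.reason)
```

If this script fails with the same WinError 10054, the problem is entirely outside Scrapy.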
The log also pinpoints where this happens in your spider: the call to uc.Chrome() at line 24 of jobbole.py. Per the stack trace, this is trying to use Chrome for web automation, but undetected_chromedriver first has to download and patch a matching chromedriver; that download is what fails (a network issue, or a driver that was never configured locally), so the browser never starts.
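If your network can only reach storage.googleapis.com through a proxy, one option is to set the standard proxy environment variables before uc.Chrome() runs: your traceback shows the download goes through urllib.request.urlretrieve, which honors them. A sketch, assuming a local proxy at 127.0.0.1:7890 (a hypothetical address, not something from your log):

```python
# Sketch: route undetected_chromedriver's driver download through a proxy.
# The proxy address is hypothetical; replace it with your own.
import os

# Set these before the first urllib request in the process, because urllib
# builds its default opener (and reads proxy settings) lazily on first use.
os.environ["HTTP_PROXY"] = "http://127.0.0.1:7890"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:7890"

import undetected_chromedriver as uc

browser = uc.Chrome()  # the chromedriver download now goes via the proxy
```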
Ways to fix this could include:

1. Verify that your network can actually reach https://storage.googleapis.com, e.g. by opening the download URL from the log in a browser. If it cannot, route the download through a proxy (see the proxy sketch above) or a VPN.
2. Download the matching chromedriver manually and point undetected_chromedriver at the local file, so nothing needs to be downloaded at startup (see the sketch below).
3. Simply retry later, in case this was a transient network hiccup.
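For option 2, a minimal sketch, assuming your undetected_chromedriver version supports the driver_executable_path keyword (recent 3.x releases do) and that you downloaded a chromedriver whose major version matches your Chrome (124, per the URL in your log); the local path below is hypothetical:

```python
import undetected_chromedriver as uc

# Point at a manually downloaded chromedriver.exe (hypothetical path);
# its major version must match the installed Chrome (124.x per the log).
browser = uc.Chrome(
    driver_executable_path=r"C:\tools\chromedriver-win32\chromedriver.exe",
)
browser.get("https://example.com")
print(browser.title)
browser.quit()
```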
If you can share more context or the relevant code (for example, your start_requests method), I can offer more specific help.