I'm trying to find the easiest way to collect data from clutch.io. Here's what I have:
```python
from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()

url = 'https://clutch.co/it-services/msp'
driver.get(url=url)
soup = BeautifulSoup(driver.page_source, "lxml")

links = []
for l in soup.find_all('li', class_='website-link website-link-a'):
    results = l.a.get('href')
    links.append(results)

print(links, "\n", "Count links - ", len(links))
```
It throws this error:
```
---------------------------------------------------------------------------
WebDriverException                        Traceback (most recent call last)
<ipython-input-4-4f37092106f4> in <cell line: 4>()
      2 from selenium import webdriver
      3 
----> 4 driver = webdriver.Chrome()
      5 
      6 url = 'https://clutch.co/it-services/msp'

5 frames
/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
    243                 alert_text = value["alert"].get("text")
    244             raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 245         raise exception_class(message, screen, stacktrace)

WebDriverException: Message: unknown error: cannot find Chrome binary
Stacktrace:
#0 0x55a6ebf424e3 <unknown>
#1 0x55a6ebc71c76 <unknown>
#2 0x55a6ebc98757 <unknown>
#3 0x55a6ebc97029 <unknown>
#4 0x55a6ebcd5ccc <unknown>
#5 0x55a6ebcd547f <unknown>
#6 0x55a6ebcccde3 <unknown>
#7 0x55a6ebca22dd <unknown>
#8 0x55a6ebca334e <unknown>
#9 0x55a6ebf023e4 <unknown>
#10 0x55a6ebf063d7 <unknown>
#11 0x55a6ebf10b20 <unknown>
#12 0x55a6ebf07023 <unknown>
#13 0x55a6ebed51aa <unknown>
#14 0x55a6ebf2b6b8 <unknown>
#15 0x55a6ebf2b847 <unknown>
#16 0x55a6ebf3b243 <unknown>
#17 0x7ffb30c27609 start_thread
```
How can I fix this?
The error means that ChromeDriver started but could not find the Chrome browser binary on your system. Here's what you can try:
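Tell Selenium explicitly where the Chrome binary is. Note that the `executable_path` argument of `webdriver.Chrome()` points to the ChromeDriver executable, not to the Chrome browser, so it does not fix this error (it was also deprecated in Selenium 4 and removed in later 4.x releases). The browser location is set through `ChromeOptions.binary_location` instead:

```python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.binary_location = '/path/to/chrome'  # path to the Chrome *browser* binary
driver = webdriver.Chrome(options=options)
```

Replace `/path/to/chrome` with the actual path to the Chrome browser binary on your system.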
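If you don't know where Chrome is installed, a quick check is to search `PATH` for the usual executable names. The sketch below uses the standard library's `shutil.which`; the helper name `find_chrome_binary` and the list of candidate names are assumptions for illustration (they match common Linux package names, while on macOS and Windows Chrome is usually not on `PATH`, so check its install folder directly there).

```python
import shutil
from typing import Iterable, Optional

# Assumed executable names for Chrome/Chromium on Linux; adjust for your system.
CANDIDATE_NAMES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
)


def find_chrome_binary(names: Iterable[str] = CANDIDATE_NAMES) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in names:
        path = shutil.which(name)  # None if this name is not on PATH
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    found = find_chrome_binary()
    if found is None:
        print("No Chrome/Chromium binary found on PATH; install one first.")
    else:
        print("Chrome binary:", found)
```

If it prints a path, use that value for `options.binary_location`. If it finds nothing, Chrome is most likely not installed at all, and pointing Selenium at a path will not help until it is.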
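Note that if you're running in a hosted notebook environment (for example Google Colab or a remote Jupyter server), WebDriver needs extra setup: Chrome is usually not installed there, so there is no binary to point at until you install one. The `/usr/local/lib/python3.10/dist-packages` path and the `<cell line: 4>` marker in your traceback suggest this is your situation. Check the environment's documentation for how to install Chrome or Chromium, or ask the provider for help.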
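In a hosted notebook or container, once Chrome or Chromium is installed it usually also has to run headless with a couple of extra switches. Below is a minimal sketch under that assumption: `--headless`, `--no-sandbox` and `--disable-dev-shm-usage` are standard Chrome command-line switches commonly used in containers, while `build_driver` and `HEADLESS_CONTAINER_ARGS` are hypothetical names, and whether you need `binary_location` at all depends on where your environment installs the browser.

```python
from typing import Optional

# Chrome switches commonly needed inside a container or hosted notebook
# that has no display and only a small /dev/shm.
HEADLESS_CONTAINER_ARGS = (
    "--headless",               # run without a visible browser window
    "--no-sandbox",             # the sandbox often fails when running as root in containers
    "--disable-dev-shm-usage",  # use /tmp instead of the small shared-memory partition
)


def build_driver(binary_location: Optional[str] = None):
    """Start a headless Chrome WebDriver; pass the browser path if Chrome is not found."""
    # Imported lazily so the flag list can be inspected without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in HEADLESS_CONTAINER_ARGS:
        options.add_argument(arg)
    if binary_location is not None:
        options.binary_location = binary_location
    return webdriver.Chrome(options=options)
```

You would then replace `driver = webdriver.Chrome()` in the question's code with `driver = build_driver()` (or `build_driver("/path/to/chromium")`) and keep the `driver.get(url=url)` and BeautifulSoup parsing unchanged.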
I hope this helps you collect the data from clutch.io!