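A few days ago I asked a question here about multiprocessing, and a user sent me the answer below. The only problem is that the answer works on his computer but not on mine.
I tried it on Windows (Python 3.6) and on Mac (Python 3.8). I ran the code in the basic Python IDLE that ships with the installer, in PyCharm on Windows, and in Jupyter Notebook, but nothing happens. I have 32-bit Python. Here is the code: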
    from bs4 import BeautifulSoup
    import requests
    from datetime import date, timedelta
    from multiprocessing import Pool
    import tqdm

    headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

    def parse(url):
        print("im in function")
        response = requests.get(url[4], headers = headers)
        soup = BeautifulSoup(response.text, 'html.parser')
        all_skier_names = soup.find_all("div", class_ = "g-xs-10 g-sm-9 g-md-4 g-lg-4 justify-left bold align-xs-top")
        all_countries = soup.find_all("span", class_ = "country__name-short")
        discipline = url[0]
        season = url[1]
        competition = url[2]
        gender = url[3]
        out = []
        for name, country in zip(all_skier_names , all_countries):
            skier_name = name.text.strip().title()
            country = country.text.strip()
            out.append([discipline, season, competition, gender, country, skier_name])
        return out

    all_urls = [['Cross-Country', '2020', 'World Cup', 'M', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=M&nationcode='],
                ['Cross-Country', '2020', 'World Cup', 'L', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=L&nationcode='],
                ['Cross-Country', '2020', 'World Cup', 'M', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=M&nationcode='],
                ['Cross-Country', '2020', 'World Cup', 'L', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=L&nationcode=']]

    with Pool(processes=2) as pool, tqdm.tqdm(total=len(all_urls)) as pbar:
        all_data = []
        print("im in pool")
        for data in pool.imap_unordered(parse, all_urls):
            print("im in data")
            all_data.extend(data)
            pbar.update()

    print(all_data)
The only thing I see when I run the code is the progress bar, which stays at 0%:
0%| | 0/8 [00:00<?, ?it/s]
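I put several print statements in the parse(url) function and in the for loop at the end of the code, but only "im in pool" is ever printed. The code never seems to enter the function or the for loop at the end.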
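The code should finish in 5-8 seconds, but I waited 10 minutes and nothing happened. I also tried it without the progress bar, with the same result.
Do you know what the problem is? Is it the Python version I am using (Python 3.6, 32-bit), or the version of some library? I don't know what to do...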
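The root cause of this kind of problem is usually how the multiprocessing module behaves on different operating systems, especially Windows. Here are some possible causes and fixes: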
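On Windows, the multiprocessing module starts child processes with the spawn method (spawn has also been the default on macOS since Python 3.8). Each child starts a fresh interpreter and re-imports the main module, so any code at module level runs again in every child. The code that creates the pool therefore has to sit under if __name__ == "__main__":, which protects the entry point of the program. Your code does not use this guard, which is the likely cause of the problem.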
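To make the entry-point guard concrete, here is a minimal sketch with no network access. The names square and run_pool are invented for this illustration and are not part of the original code. It prints the start method in use, so you can see whether your platform uses spawn.

```python
import multiprocessing as mp


def square(n):
    # Worker functions live at module level so that a spawned child,
    # which re-imports this file, can find them by name.
    return n * n


def run_pool(values, processes=2):
    # Create the pool inside a function that is only called from the guard.
    with mp.Pool(processes=processes) as pool:
        # imap_unordered yields results in completion order, so sort them
        # to get a stable result.
        return sorted(pool.imap_unordered(square, values))


if __name__ == "__main__":
    # Only the parent process runs this block. A spawned child imports the
    # module under a different name, so it skips the pool creation.
    print("start method:", mp.get_start_method())
    print(run_pool([1, 2, 3, 4]))
```

Save it as a .py file (script.py is just an example name) and run it with python script.py from a terminal. The multiprocessing documentation notes that Pool examples may not work in an interactive interpreter, which probably also applies to the IDLE shell and to notebook cells.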
Library versions: check which versions of requests and beautifulsoup4 are installed.
Below is an updated code example that adds if __name__ == "__main__": to protect the main module, so that multiprocessing works correctly on Windows:
    from bs4 import BeautifulSoup
    import requests
    from multiprocessing import Pool
    import tqdm

    headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

    # The worker stays at module level so that child processes can import it.
    def parse(url):
        print("im in function")
        response = requests.get(url[4], headers=headers)
        soup = BeautifulSoup(response.text, 'html.parser')
        all_skier_names = soup.find_all("div", class_="g-xs-10 g-sm-9 g-md-4 g-lg-4 justify-left bold align-xs-top")
        all_countries = soup.find_all("span", class_="country__name-short")
        discipline = url[0]
        season = url[1]
        competition = url[2]
        gender = url[3]
        out = []
        for name, country in zip(all_skier_names, all_countries):
            skier_name = name.text.strip().title()
            country = country.text.strip()
            out.append([discipline, season, competition, gender, country, skier_name])
        return out

    # Everything that starts processes goes under the guard.
    if __name__ == "__main__":
        all_urls = [['Cross-Country', '2020', 'World Cup', 'M', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=M&nationcode='],
                    ['Cross-Country', '2020', 'World Cup', 'L', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=L&nationcode='],
                    ['Cross-Country', '2020', 'World Cup', 'M', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=M&nationcode='],
                    ['Cross-Country', '2020', 'World Cup', 'L', 'https://www.fis-ski.com/DB/cross-country/cup-standings.html?sectorcode=CC&seasoncode=2020&cupcode=WC&disciplinecode=ALL&gendercode=L&nationcode=']]

        with Pool(processes=2) as pool, tqdm.tqdm(total=len(all_urls)) as pbar:
            all_data = []
            print("im in pool")
            for data in pool.imap_unordered(parse, all_urls):
                print("im in data")
                all_data.extend(data)
                pbar.update()

        print(all_data)
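If the pool still appears to do nothing, one way to narrow things down is to run the worker serially first and only then through the pool. The sketch below assumes a stand-in worker called work that does no network access. It is hypothetical and takes the place of parse, and collect, run_serial and run_parallel are likewise names invented for this example.

```python
from multiprocessing import Pool


def work(item):
    # Hypothetical stand-in for parse(url): returns a list of rows,
    # just as parse returns a list of lists.
    label, value = item
    return [[label, value * 2]]


def collect(results):
    # Flatten an iterable of row lists into one list, like the
    # all_data.extend(data) loop in the original code.
    all_data = []
    for data in results:
        all_data.extend(data)
    return all_data


def run_serial(items):
    # No processes involved: if this fails or hangs, the problem is in
    # the worker itself, not in multiprocessing.
    return collect(map(work, items))


def run_parallel(items, processes=2):
    with Pool(processes=processes) as pool:
        return collect(pool.imap_unordered(work, items))


if __name__ == "__main__":
    items = [("a", 1), ("b", 2), ("c", 3)]
    serial = run_serial(items)
    parallel = run_parallel(items)
    # imap_unordered may return rows in a different order, so compare sorted.
    print(sorted(serial) == sorted(parallel))
```

Replacing work with parse and items with all_urls gives the same check for the real scraper. If the serial run works but the parallel run does not, the way the script is launched is the more likely culprit.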
The following command shows the installed versions:

    pip show requests beautifulsoup4 tqdm
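The same information can be gathered from inside Python, along with the interpreter's bitness, which matters here because the question mentions a 32-bit build. This is a rough sketch that assumes Python 3.8 or newer, because importlib.metadata does not exist in Python 3.6. The helper names installed_versions and interpreter_bits are made up for this example.

```python
import platform
import struct
import sys
from importlib import metadata  # standard library from Python 3.8 on


def installed_versions(names):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def interpreter_bits():
    """Pointer size in bits: 32 on a 32-bit build, 64 on a 64-bit build."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python", sys.version.split()[0],
          "(%d-bit)" % interpreter_bits(), "on", platform.system())
    found = installed_versions(["requests", "beautifulsoup4", "tqdm"])
    for name, version in found.items():
        print(name, version or "not installed")
```

On Python 3.6 the pip show command above remains the simpler option.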
With these changes, your code should run correctly on Windows. Please try it and let me know whether it solves your problem!