使用python和BeautifulSoup从网页检索链接 如何在Python中将字符串转换为整数 从Python列表中删除所有出现的值 使用python和BeautifulSoup从网页检索链接 方法1 这是使用BeautifulSoup中的SoupStrainer类的简短片段: import httplib2 from BeautifulSoup import BeautifulSoup, SoupStrainer http = httplib2.Http() status, response = http.request('http://www.nytimes.com') for link in BeautifulSoup(response, parse_only=SoupStrainer('a')): if link.has_attr('href'): print link['href'] 方法2 为了完整起见,BeautifulSoup 4版本也使用了服务器提供的编码: from bs4 import BeautifulSoup import urllib2 resp = urllib2.urlopen("http://www.gpsbasecamp.com/national-parks") soup = BeautifulSoup(resp, from_encoding=resp.info().getparam('charset')) for link in soup.find_all('a', href=True): print link['href'] 或Python 3版本: from bs4 import BeautifulSoup import urllib.request resp = urllib.request.urlopen("http://www.gpsbasecamp.com/national-parks") soup = BeautifulSoup(resp, from_encoding=resp.info().get_param('charset')) for link in soup.find_all('a', href=True): print(link['href']) 以及使用该requests库的版本,其编写将适用于Python 2和3: from bs4 import BeautifulSoup from bs4.dammit import EncodingDetector import requests resp = requests.get("http://www.gpsbasecamp.com/national-parks") http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True) encoding = html_encoding or http_encoding soup = BeautifulSoup(resp.content, from_encoding=encoding) for link in soup.find_all('a', href=True): print(link['href']) 如何在Python中将字符串转换为整数 从Python列表中删除所有出现的值