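I have to click each search result, one by one, starting from this URL:
Search Guidelines
I first extract the total number of results from the displayed text so that I can set the upper limit for the iteration: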
    upperlimit = driver.find_element_by_id("total_results")
    number = int(upperlimit.text.split(' ')[0])
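Then I run a for loop of the form for i in range(1, number):
However, after going through the first 10 results on the first page, I get a "list index out of range" error (probably because there are no more links to click). I need to click "Next" to get the next 10 results, and so on, until I have gone through all the search results. How can I do that?
Any help would be appreciated!
The problem is that the value of the element with id total_results changes after the page has loaded: it first contains 117 and then becomes 44.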
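If the count is still needed, one option is to read it only once it has stopped changing. The following is a minimal sketch, not part of Selenium: wait_until_stable and its parameters are hypothetical names, and it simply polls any callable until the same value comes back several times in a row.

```python
import time


def wait_until_stable(read_value, checks=3, interval=0.5, timeout=10.0):
    """Poll read_value() until it returns the same value `checks` times in a row.

    Raises TimeoutError if the value has not settled within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    last = read_value()
    streak = 1
    while streak < checks:
        if time.monotonic() > deadline:
            raise TimeoutError("value did not settle within %.1f s" % timeout)
        time.sleep(interval)
        current = read_value()
        if current == last:
            streak += 1
        else:
            # The value changed (for example 117 -> 44), so start counting again.
            last, streak = current, 1
    return last


# Hypothetical usage with an already opened Selenium driver:
# text = wait_until_stable(lambda: driver.find_element(By.ID, "total_results").text)
# number = int(text.split(' ')[0])
```

Selenium's own WebDriverWait with a custom condition could play a similar role; the helper above just keeps the idea independent of the browser.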
Instead, here is a more reliable approach. It processes the results page by page until there are no pages left:
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    url = 'http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true#/search/?searchText=bevacizumab&mode=&staticTitle=false&SEARCHTYPE_all2=true&SEARCHTYPE_all1=&SEARCHTYPE=GUIDANCE&TOPICLVL0_all2=true&TOPICLVL0_all1=&HIDEFILTER=TOPICLVL1&HIDEFILTER=TOPICLVL2&TREATMENTS_all2=true&TREATMENTS_all1=&GUIDANCETYPE_all2=true&GUIDANCETYPE_all1=&STATUS_all2=true&STATUS_all1=&HIDEFILTER=EGAPREFERENCE&HIDEFILTER=TOPICLVL3&DATEFILTER_ALL=ALL&DATEFILTER_PREV=ALL&custom_date_from=&custom_date_to=11-06-2014&PAGINATIONURL=%2FSearch.do%3FsearchText%40%40bevacizumab%26newsearch%40%40true%26page%40%40&SORTORDER=BESTMATCH'
    driver.get(url)

    page_number = 1
    while True:
        try:
            # Look for the pager link labelled with the next page number.
            # find_element(By.LINK_TEXT, ...) is the current spelling of
            # find_element_by_link_text, which newer Selenium releases removed.
            link = driver.find_element(By.LINK_TEXT, str(page_number))
        except NoSuchElementException:
            # No such link: the last page has been reached.
            break

        link.click()
        print(driver.current_url)
        page_number += 1
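Basically, the idea here is to keep getting the link to the next page until there is no such link (at which point NoSuchElementException is thrown). Note that this works for any number of pages and results.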
It prints:
    http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=1
    http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=2#showfilter
    http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=3#showfilter
    http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=4#showfilter
    http://www.nice.org.uk/Search.do?searchText=bevacizumab&newsearch=true&page=5#showfilter
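To get back to the original goal of visiting every result, the same page-by-page walk could gather the result links on each page before moving on. This is only a sketch under assumptions: collect_result_urls, result_selector and max_pages are hypothetical names, the CSS selector for the result links depends on the page markup and is not known here, and a real page may need an explicit wait after each click before its results can be read.

```python
def collect_result_urls(driver, result_selector, max_pages=100):
    """Walk the numbered pager and return the href of every result link.

    result_selector is a CSS selector matching the result links; its value
    is an assumption that has to be checked against the actual page.
    """
    urls = []
    for page_number in range(1, max_pages + 1):
        # find_elements returns an empty list (it does not raise) when nothing
        # matches. "link text" and "css selector" are the string values of
        # By.LINK_TEXT and By.CSS_SELECTOR.
        pager_links = driver.find_elements("link text", str(page_number))
        if not pager_links:
            break  # no link for this page number, so the walk is finished
        pager_links[0].click()
        for link in driver.find_elements("css selector", result_selector):
            urls.append(link.get_attribute("href"))
    return urls


# Hypothetical usage with an already opened Selenium driver:
# for url in collect_result_urls(driver, "a.result-link"):  # selector is a guess
#     driver.get(url)
```

Collecting the URLs first and opening them afterwards avoids having to navigate back to the results list after every click, which is where index-based loops over the links tend to break.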