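I am scraping usernames from a web page that loads more users as you scroll.
URL of the page: "http://www.quora.com/Kevin-Rose/followers"
I know the number of users on the page (43812 in this case). How do I scroll the page until all users have loaded? I searched the internet and found almost the same line of code everywhere:

    driver.execute_script("window.scrollTo(0, )")

How do I determine the vertical position that guarantees all users are loaded? Is there another way to achieve the same thing without actually scrolling?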

    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    import time
    import urllib

    driver = webdriver.Firefox()
    driver.get('http://www.quora.com/')
    time.sleep(10)

    wait = WebDriverWait(driver, 10)

    form = driver.find_element_by_class_name('regular_login')
    time.sleep(10)
    #add explicit wait

    username = form.find_element_by_name('email')
    time.sleep(10)
    #add explicit wait

    username.send_keys('abc@gmail.com')
    time.sleep(30)
    #add explicit wait

    password = form.find_element_by_name('password')
    time.sleep(30)
    #add explicit wait

    password.send_keys('def')
    #add explicit wait

    password.send_keys(Keys.RETURN)
    time.sleep(30)

    #search = driver.find_element_by_name('search_input')
    search = wait.until(EC.presence_of_element_located((By.XPATH, "//form[@name='search_form']//input[@name='search_input']")))
    search.clear()
    search.send_keys('Kevin Rose')
    search.send_keys(Keys.RETURN)

    link = wait.until(EC.presence_of_element_located((By.LINK_TEXT, "Kevin Rose")))
    link.click()
    #Wait till the element is loaded (asynchronously loaded webpage)

    handle = driver.window_handles
    driver.switch_to.window(handle[1])  #switch to new window

    element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.PARTIAL_LINK_TEXT, "Followers")))
    element.click()

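Since nothing special happens after the last batch of followers has loaded, I would rely on the fact that you know how many followers the user has and how many followers are loaded on each scroll down (I checked: it is 18 per scroll). From these two numbers you can calculate how many times to scroll the page down.
Here is the implementation (I used a different user with only 53 followers to demonstrate the solution):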
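The arithmetic can be sketched as a ceiling division. This is only an illustration: the helper name `scrolls_needed` is hypothetical, and the batch size of 18 is the value observed above, which the site may change at any time.

```python
def scrolls_needed(followers_count, followers_per_scroll=18):
    """Return how many scroll steps are needed to load every follower.

    followers_per_scroll defaults to 18, the batch size observed above
    (an assumption about the page, not a documented constant).
    """
    if followers_per_scroll <= 0:
        raise ValueError("followers_per_scroll must be positive")
    # Integer ceiling division: a partial last batch still needs one scroll.
    return (followers_count + followers_per_scroll - 1) // followers_per_scroll


print(scrolls_needed(43812))  # 2434
print(scrolls_needed(53))     # 3
```

Adding one extra scroll as a safety margin costs little if a batch turns out to be smaller than expected.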

    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    followers_per_page = 18

    driver = webdriver.Chrome()  # webdriver.Firefox() in your case
    driver.get("http://www.quora.com/Andrew-Delikat/followers")

    # get the followers count
    element = WebDriverWait(driver, 2).until(EC.presence_of_element_located((By.XPATH, '//li[contains(@class, "FollowersNavItem")]//span[@class="profile_count"]')))
    followers_count = int(element.text.replace(',', ''))
    print(followers_count)

    # scroll down the page iteratively with a delay
    # (integer division so that range() receives an int on Python 3)
    for _ in range(followers_count // followers_per_page + 1):
        driver.execute_script("window.scrollTo(0, 10000);")
        time.sleep(2)

Also, if the user has a large number of followers, you may need to increase the Y coordinate value of 10000 based on the loop variable.
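A minimal sketch of that idea, assuming a fixed step of 10000 pixels per iteration is enough to reach past the newly loaded batch. The function name `scroll_with_growing_offset` and its parameters are hypothetical; the only Selenium call used is `execute_script`, which passes extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on.

```python
import time


def scroll_with_growing_offset(driver, scroll_count, step=10000, delay=2.0):
    """Scroll down scroll_count times, moving the target Y further each time.

    step and delay are illustrative defaults, not values required by the site.
    Returns the list of Y coordinates that were requested.
    """
    targets = []
    for i in range(1, scroll_count + 1):
        y = step * i  # Y coordinate grows with the loop variable
        driver.execute_script("window.scrollTo(0, arguments[0]);", y)
        targets.append(y)
        time.sleep(delay)  # give the next batch of followers time to load
    return targets
```

Another option that avoids guessing a coordinate is to pass `document.body.scrollHeight` as the Y value, which targets the current bottom of the page on every iteration.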