一尘不染

How do I get the innerHTML of the whole page in Selenium WebDriver?

selenium

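I'm using Selenium to click through to the web page I want, and then parsing that page with Beautiful Soup.

Someone has shown how to get the inner HTML of an element in Selenium WebDriver. Is there a way to get the HTML of the whole page? Thanks.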

Sample code in Python (based on the post above; the language doesn't seem to matter much):

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup


url = 'http://www.google.com'
driver = webdriver.Firefox()
driver.get(url)

the_html = driver---somehow----.get_attribute('innerHTML')
bs = BeautifulSoup(the_html, 'html.parser')

2020-06-26

1 answer

一尘不染

To get the HTML of the whole page:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get("http://stackoverflow.com")

html = driver.page_source
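`page_source` is read at the moment you ask for it, so after a click that triggers navigation it may help to wait until the document has finished loading. A minimal sketch, assuming a hypothetical helper named `wait_for_page_source` (not part of Selenium) whose timeout and polling defaults are arbitrary; it relies only on the driver's `execute_script` method and `page_source` property:

```python
import time


def wait_for_page_source(driver, timeout=10.0, poll=0.25):
    """Return driver.page_source once document.readyState is 'complete'.

    `driver` is any Selenium WebDriver instance. The function name and the
    timeout/poll defaults are illustrative assumptions, not Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Ask the browser how far along the current document is.
        state = driver.execute_script("return document.readyState;")
        if state == "complete":
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"page still {state!r} after {timeout} s")
        time.sleep(poll)
```

Called as `html = wait_for_page_source(driver)`, the result is an ordinary string and can be passed to `BeautifulSoup(html, 'html.parser')` as in the question. Content that scripts inject after the load event would still need an explicit wait of its own.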

To get the outer HTML (including the element's own tag):

# Locator constants for find_element (the find_element_by_* helpers were removed in Selenium 4.3)
from selenium.webdriver.common.by import By

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.outerHTML;")

# HTML from `<body>`
html = driver.execute_script("return document.body.outerHTML;")

# HTML from element with some JavaScript
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].outerHTML;", element)

# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('outerHTML')

To get the inner HTML (excluding the element's own tag):

# Locator constants for find_element (the find_element_by_* helpers were removed in Selenium 4.3)
from selenium.webdriver.common.by import By

# HTML from `<html>`
html = driver.execute_script("return document.documentElement.innerHTML;")

# HTML from `<body>`
html = driver.execute_script("return document.body.innerHTML;")

# HTML from element with some JavaScript
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].innerHTML;", element)

# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('innerHTML')
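
The JavaScript-based variants above differ only in the property being read and in whether an element is passed, so they can be folded into one function. A minimal sketch, assuming a hypothetical helper named `get_html` (not part of Selenium) that takes an already-created driver and, optionally, an element found earlier:

```python
def get_html(driver, element=None, outer=True):
    """Return the HTML of `element`, or of the whole document if it is None.

    `get_html` is an illustrative name; it only wraps driver.execute_script.
    """
    # outerHTML includes the element's own tag, innerHTML only its contents.
    prop = "outerHTML" if outer else "innerHTML"
    if element is None:
        # No element given: read the property from the root <html> element.
        return driver.execute_script(f"return document.documentElement.{prop};")
    # Selenium passes the element to the script as arguments[0].
    return driver.execute_script(f"return arguments[0].{prop};", element)
```

For example, `get_html(driver, outer=False)` would give the inner HTML of the whole page that the question asks about, and `get_html(driver, element)` the outer HTML of a single element.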
2020-06-26