I'm trying to scrape the list of company names, code, industry, sector, mkt cap, etc. from the table on this website with Selenium. I'm new to this and wrote the following code:
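from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import time
path_to_chromedriver = r'C:\Documents\chromedriver'
# Selenium 4 passes the driver path through a Service object (executable_path was removed)
browser = webdriver.Chrome(service=Service(path_to_chromedriver))
url = r'http://sgx.com/wps/portal/sgxweb/home/company_disclosure/stockfacts'
browser.get(url)
time.sleep(15)  # fixed wait for the page to render
output = browser.page_source
print(output)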
The output contains the table markup below, but none of the data inside it:
<div class="table-wrapper results-display">
  <table>
    <thead>
      <tr></tr>
    </thead>
    <tbody></tbody>
  </table>
</div>
<div class="pager results-display"></div>
I also tried scraping it with BS4 earlier, but that failed too. Any help is much appreciated.
The results are inside an iframe. Switch to it first, then read .page_source:
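from selenium.webdriver.common.by import By
# find_element_by_css_selector was removed in Selenium 4; use find_element(By..., ...)
iframe = driver.find_element(By.CSS_SELECTOR, "#mainContent iframe")
driver.switch_to.frame(iframe)
print(driver.page_source)  # now returns the iframe's document, including the table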
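After the switch, every lookup and page_source call is resolved against the iframe's document until you switch back. Below is a minimal sketch of a context manager that always returns the driver to the top-level page, assuming you want to go back there once you are done with the iframe. The helper name inside_frame is hypothetical and not part of Selenium; it only calls driver.switch_to.frame() and driver.switch_to.default_content(), which are standard WebDriver methods.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch `driver` into `frame`, then always switch back to the main page.

    `frame` can be anything driver.switch_to.frame() accepts:
    a WebElement, a frame name/id string, or a frame index.
    (inside_frame is a hypothetical helper, not a Selenium API.)
    """
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Return to the top-level document even if the body raised an exception
        driver.switch_to.default_content()


# Example use with a real driver (not executed here):
# iframe = driver.find_element(By.CSS_SELECTOR, "#mainContent iframe")
# with inside_frame(driver, iframe):
#     html = driver.page_source   # HTML of the iframe's document
```

Putting the switch back in a finally clause means a timeout or lookup error inside the with block cannot leave the driver stuck inside the iframe.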
I would also add an explicit wait for the table to load:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)  # poll for up to 10 seconds
# locate and switch to the iframe
iframe = driver.find_element(By.CSS_SELECTOR, "#mainContent iframe")
driver.switch_to.frame(iframe)
# wait for the table to load (first visible .companyName cell)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.companyName')))
print(driver.page_source)
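Once driver.page_source comes from inside the iframe, the table can be read with any HTML parser, including the BS4 approach mentioned in the question. The sketch below uses the standard library's html.parser instead so nothing extra has to be installed. It assumes the page source holds a single table whose first non-empty row contains the column headers; the real SGX column names are not known here, so the names in the comments are hypothetical. TableParser and parse_table are illustrative helpers, not part of any library.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell (ignores whitespace between tags)
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments (e.g. text split by <b>) and collapse whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip empty rows such as the placeholder <tr></tr>
                self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table as a list of dicts keyed by the header row's cell text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Example use after switching into the iframe and waiting (not executed here):
# rows = parse_table(driver.page_source)
# rows[0] might look like {"Name": "...", "Code": "..."}  (hypothetical headers)
```

Run against the empty skeleton shown in the question, parse_table returns an empty list, which is a quick way to tell whether the data has actually loaded.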