一尘不染

Unable to fetch dates alongside the times when parsing a table

selenium

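I've written a script in Python, in combination with selenium, to parse some dates available in a table on a webpage. The table sits under the heading `NPL Victoria Betting Odds`, and the tabular data is inside the element with id `tournamentTable`. You can see three dates there: `10 Aug 2018`, `11 Aug 2018` and `12 Aug 2018`. I wish to parse them and pair each one with its times, as shown in my expected output below.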

Link to the webpage

This is my attempt so far:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

link = "find the link above"

def get_content(driver,url):
    driver.get(url)
    for items in wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr"))):
        try:
            idate = items.find_element(By.CSS_SELECTOR, "th span[class^='datet']").text
        except Exception: idate = ""
        try:
            itime = items.find_element(By.CSS_SELECTOR, "td.table-time").text
        except Exception: itime = ""

        print(f'{idate}--{itime}')

if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver,10)
    try:
        get_content(driver,link)
    finally:
        driver.quit()

Currently, my output looks like this:

--
10 Aug 2018--
--
--09:30
--10:15
11 Aug 2018--
--
--05:00
--05:00
--09:00
12 Aug 2018--
--
--06:00
--06:00

My expected output:

10 Aug 2018--09:30
10 Aug 2018--10:15
11 Aug 2018--05:00
11 Aug 2018--05:00
11 Aug 2018--09:00
12 Aug 2018--06:00
12 Aug 2018--06:00

2020-06-26

1 answer

一尘不染

Try the following code:

def get_content(driver,url):
    driver.get(url)
    # Count the date header rows once they are present in the table.
    dates = len(wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR,"#tournamentTable tr.center.nob-border"))))
    for d in range(dates):
        # Re-locate the d-th date header row on each pass.
        item = driver.find_elements(By.CSS_SELECTOR,"#tournamentTable tr.center.nob-border")[d]
        try:
            idate = item.find_element(By.CSS_SELECTOR,"th span[class^='datet']").text
        except Exception: idate = ""
        # Time cells after this header that do not yet have the next header
        # (header number d + 2, counting from 1) anywhere before them.
        for time_td in item.find_elements(By.XPATH,".//following::td[contains(@class, 'table-time') and not((preceding::tr[@class='center nob-border'])[%d])]" % (d + 2)):
            try:
                itime = time_td.text
            except Exception: itime = ""
            print(f'{idate}--{itime}')
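
A simpler alternative, offered as a sketch under assumptions and not as part of the code above: because each date header row comes before its time rows, the loop from the question could remember the last non-empty date and print only for rows that carry a time. The helper below (`pair_dates_with_times` is a hypothetical name, as is `sample_rows`) shows that logic on plain `(idate, itime)` tuples, which are the values the question's loop already extracts for each `#tournamentTable tr` row, so it runs without a browser.

```python
def pair_dates_with_times(rows):
    """Attach the most recent non-empty date to every row that has a time.

    `rows` is an iterable of (idate, itime) string pairs in table order,
    where a missing value is the empty string (as in the question's loop).
    """
    paired = []
    current_date = ""  # last date header seen so far
    for idate, itime in rows:
        if idate:
            current_date = idate  # a date header row starts a new group
        if itime:
            paired.append(f"{current_date}--{itime}")  # a row with a time
    return paired


# Sample input mirroring the question's current output (assumed row order).
sample_rows = [
    ("", ""), ("10 Aug 2018", ""), ("", ""), ("", "09:30"), ("", "10:15"),
    ("11 Aug 2018", ""), ("", ""), ("", "05:00"), ("", "05:00"), ("", "09:00"),
    ("12 Aug 2018", ""), ("", ""), ("", "06:00"), ("", "06:00"),
]

for line in pair_dates_with_times(sample_rows):
    print(line)
```

In the question's `get_content`, the same effect could likely be had by keeping a `current_date` variable across loop iterations, updating it whenever `idate` is non-empty, and moving the `print` under an `if itime:` check.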
2020-06-26