一尘不染

Unable to get rid of the hardcoded delay in my script

selenium

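I've written a script in VBA in combination with Selenium to parse all the company names available on a webpage. The webpage uses lazy loading, so only 20 links become visible per scroll. After 2 scrolls, 40 links are visible, and so on. There are 1000 links on the page. My script below can reach the bottom of the page, handling all the scrolls, and fetch all the names available on it.

However, after each scroll the script has to wait for some time so that the page can update its content. This is where I used a hardcoded delay, but the hardcoded approach is very inconsistent, and sometimes it causes the browser to quit before the whole operation is complete.

How can I modify this portion, .Wait 6000, to use an explicit wait instead of a hardcoded wait?

This is what I've written so far: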

Sub Getlinks()
    Dim driver As New ChromeDriver, prevlen&, curlen&
    Dim posts As Object, post As Object, R As Long

    With driver
        .get "http://fortune.com/fortune500/list/"
        prevlen = .FindElementsByClass("company-title").Count

        Do
            prevlen = curlen
            .ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")

            .Wait 6000  ' hardcoded delay that I'd like to replace with an explicit wait

            Set posts = .FindElementsByClass("company-title")
            curlen = posts.Count
            If prevlen = curlen Then Exit Do
        Loop

        For Each post In posts
            R = R + 1: Cells(R, 1) = post.Text
        Next post
    End With
End Sub

2020-06-26

1 answer

一尘不染

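You can get rid of the hardcoded delay by defining a timeout (the maximum period of time that is allowed to elapse while waiting). The timeout value itself still has to be hardcoded.

The differences between this code and the original are:

  • The loop runs continuously (without waiting 6 seconds on each iteration) and checks for new content until either new content is found or the timeout is reached.
  • If lazy loading takes longer than expected, for example while items 21 to 50 are loading, the loop keeps "waiting" and retrying for new content for up to the maximum time defined by the timeout.
  • Drawback: on the final pass, once all content has loaded, the loop still runs for the full timeout before it exits.

Code: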

Sub Getlinks()
    Dim driver As New ChromeDriver, prevlen&, curlen&
    Dim posts As Object, post As Object
    Dim timeout As Integer, startTime As Double, R As Long

    timeout = 10 ' set the timeout to 10 seconds

    With driver
        .get "http://fortune.com/fortune500/list/"
        prevlen = .FindElementsByClass("company-title").Count

        startTime = Timer ' set the initial starting time

        Do
            .ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")
            Set posts = .FindElementsByClass("company-title")
            curlen = posts.Count
            If curlen > prevlen Then
                startTime = Timer ' reset start time if new elements found
                prevlen = curlen ' set new prevlen
            End If
        Loop While Round(Timer - startTime, 2) <= timeout ' check if timeout is reached

        For Each post In posts
            R = R + 1: Cells(R, 1) = post.Text
        Next post
    End With
End Sub
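
The drawback noted above (the last pass always costs a full timeout) and the tight polling loop could both be softened. The following is only a sketch under stated assumptions, not a tested drop-in replacement: WaitForMore and GetLinksPolling are hypothetical names introduced here, the 250 ms polling pause is an arbitrary choice, and EXPECTED_TOTAL simply reuses the figure of 1000 links given in the question. It relies only on the calls already used in the code above (.Wait, .FindElementsByClass, .ExecuteScript) plus the VBA Timer function, which counts seconds since midnight and therefore needs a correction if a wait spans midnight.

```vba
' Hypothetical helper (not part of SeleniumBasic): polls until more than
' prevCount elements with the given class exist, or until timeoutSec elapses.
' Returns the last element count it observed.
Private Function WaitForMore(driver As Object, className As String, _
                             prevCount As Long, timeoutSec As Double) As Long
    Dim startTime As Double, elapsed As Double, n As Long

    startTime = Timer
    Do
        n = driver.FindElementsByClass(className).Count
        If n > prevCount Then Exit Do                   ' new content arrived
        driver.Wait 250                                 ' short polling pause (assumed value)
        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400   ' Timer restarts at midnight
    Loop While elapsed <= timeoutSec
    WaitForMore = n
End Function

Sub GetLinksPolling()
    Const EXPECTED_TOTAL As Long = 1000   ' assumption: the link count stated in the question
    Dim driver As New ChromeDriver
    Dim post As Object, prevlen As Long, curlen As Long, R As Long

    With driver
        .Get "http://fortune.com/fortune500/list/"
        curlen = .FindElementsByClass("company-title").Count

        Do
            prevlen = curlen
            .ExecuteScript "window.scrollTo(0, document.body.scrollHeight);"
            curlen = WaitForMore(driver, "company-title", prevlen, 10)
        ' stop when nothing new appeared within the timeout,
        ' or as soon as the expected total has been reached
        Loop While curlen > prevlen And curlen < EXPECTED_TOTAL

        For Each post In .FindElementsByClass("company-title")
            R = R + 1: Cells(R, 1) = post.Text
        Next post
    End With
End Sub
```

If the real number of links differs from the assumed total, the loop falls back to the timeout-based exit, so the constant only acts as a shortcut and does not affect correctness.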
2020-06-26