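I have written a script in VBA combined with Selenium to parse all the company names available on a web page. The page uses lazy loading, so each scroll reveals only 20 more links: after two scrolls 40 links are visible, and so on. The page holds 1000 links in total. My script below reaches the bottom of the page, handles all the scrolling, and collects every name available on the page.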
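However, the script has to wait for some time after each scroll so that the page can update its content. This is where I used a hardcoded delay, but the hardcoded delay is very inconsistent, and sometimes the browser quits before the whole operation is complete.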
How can I modify the .Wait 6000 part to use an Explicit Wait instead of a Hardcoded Wait?
This is what I have written so far:
    Sub Getlinks()
        Dim driver As New ChromeDriver, prevlen&, curlen&
        Dim posts As Object, post As Object

        With driver
            .get "http://fortune.com/fortune500/list/"
            prevlen = .FindElementsByClass("company-title").Count
            Do
                prevlen = curlen
                .ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")
                .Wait 6000  ''I like to kick out this hardcoded delay and use explicit wait in place
                Set posts = .FindElementsByClass("company-title")
                curlen = posts.Count
                If prevlen = curlen Then Exit Do
            Loop
            For Each post In posts
                R = R + 1: Cells(R, 1) = post.Text
            Next post
        End With
    End Sub
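Define a timeout (a specified period of time that is allowed to elapse) to get rid of the hardcoded delay. The timeout value itself still has to be hardcoded.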
The differences between this code and the original are marked by the comments in the code:
Code:
    Sub Getlinks()
        Dim driver As New ChromeDriver, prevlen&, curlen&
        Dim posts As Object, post As Object
        Dim timeout As Integer, startTime As Double

        timeout = 10 ' set the timeout to 10 seconds

        With driver
            .get "http://fortune.com/fortune500/list/"
            prevlen = .FindElementsByClass("company-title").Count
            startTime = Timer ' set the initial starting time
            Do
                .ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")
                Set posts = .FindElementsByClass("company-title")
                curlen = posts.Count
                If curlen > prevlen Then
                    startTime = Timer ' reset start time if new elements found
                    prevlen = curlen ' set new prevlen
                End If
            Loop While Round(Timer - startTime, 2) <= timeout ' check if timeout is reached
            For Each post In posts
                R = R + 1: Cells(R, 1) = post.Text
            Next post
        End With
    End Sub
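The same timeout idea can also be factored into a small polling helper, which may be easier to reuse. The following is only a sketch under some assumptions: the function name WaitForMoreElements, its parameters, and the 250 ms polling interval are hypothetical and not part of the code above; it relies only on the .FindElementsByClass, .Count, and .Wait calls already used in this thread. It also adjusts for Timer restarting from zero at midnight, which the loop above does not handle.

```vba
' Hypothetical helper (not from the original answer): polls until the number of
' elements with the given class exceeds prevCount, or until timeoutSec elapses.
' Returns True if new elements appeared, False if the timeout was reached.
Function WaitForMoreElements(ByVal driver As Object, ByVal className As String, _
                             ByVal prevCount As Long, ByVal timeoutSec As Double) As Boolean
    Dim startTime As Double, elapsed As Double

    startTime = Timer
    Do
        If driver.FindElementsByClass(className).Count > prevCount Then
            WaitForMoreElements = True
            Exit Function
        End If
        driver.Wait 250                                   ' short pause between polls (assumed interval)
        elapsed = Timer - startTime
        If elapsed < 0 Then elapsed = elapsed + 86400     ' Timer restarts at midnight
    Loop While elapsed <= timeoutSec

    WaitForMoreElements = False
End Function

' Possible use inside the "With driver" block of Getlinks:
'     Do
'         prevlen = .FindElementsByClass("company-title").Count
'         .ExecuteScript "window.scrollTo(0, document.body.scrollHeight);"
'     Loop While WaitForMoreElements(driver, "company-title", prevlen, 10)
```

With this shape the loop scrolls once per batch and ends as soon as a scroll produces no new elements within the timeout, instead of scrolling continuously while it waits.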