I need to visit multiple links and extract information from each of them. The problem is that when I call “colly.Visit(URL)” in a loop, the number of "Visiting" messages grows with every iteration. Example:
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	CATEGORIES := []string{
		"cate1",
		"cate2",
		"cate3",
	}
	c := colly.NewCollector()
	for _, cate := range CATEGORIES {
		c.OnRequest(func(r *colly.Request) {
			fmt.Println("Visiting categories", r.URL)
		})
		c.Visit(cate)
	}
}
That will print:
Visiting categories http://cate1
Visiting categories http://cate2
Visiting categories http://cate2
Visiting categories http://cate3
Visiting categories http://cate3
Visiting categories http://cate3
I tried creating a new collector on every iteration and that worked well. The output was then: Visiting categories http://cate1, Visiting categories http://cate2, Visiting categories http://cate3. BUT doing it this way I am losing my login session. Any suggestions?
The issue you are facing is that you register the OnRequest callback inside the loop. Each call to OnRequest adds another callback instead of replacing the previous one, so the second request runs two callbacks, the third runs three, and so on. That’s why the visiting message is printed more often with every iteration.
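The accumulation can be reproduced without colly. The sketch below is only a toy model under the assumption that registered callbacks are kept in a list and all of them run on every request; toyCollector and visitRegisteringInLoop are hypothetical names, not part of colly.

```go
package main

import "fmt"

// toyCollector is a hypothetical stand-in for a collector. It only models
// how callbacks are stored and invoked; it performs no HTTP requests.
type toyCollector struct {
	onRequest []func(url string)
}

// OnRequest appends a callback. Earlier callbacks are kept, not replaced.
func (c *toyCollector) OnRequest(f func(url string)) {
	c.onRequest = append(c.onRequest, f)
}

// Visit runs every registered callback once for the given URL.
func (c *toyCollector) Visit(url string) {
	for _, f := range c.onRequest {
		f(url)
	}
}

// visitRegisteringInLoop mirrors the loop from the question: a new callback
// is registered before every visit. It returns the messages in order.
func visitRegisteringInLoop(urls []string) []string {
	var lines []string
	c := &toyCollector{}
	for _, u := range urls {
		c.OnRequest(func(url string) {
			lines = append(lines, "Visiting categories "+url)
		})
		c.Visit(u)
	}
	return lines
}

func main() {
	for _, line := range visitRegisteringInLoop([]string{"cate1", "cate2", "cate3"}) {
		fmt.Println(line)
	}
}
```

With three URLs this produces six messages (one, then two, then three), which matches the pattern in the question's output.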
To fix this issue, register the OnRequest callback once, before the loop, and only call c.Visit inside the loop. This way a single callback runs for each request, and it prints the URL of the category being visited. You also keep using the same collector for every category, so there is no need to create a new one per iteration.
Here’s an updated version of your code:
package main

import (
	"fmt"

	"github.com/gocolly/colly"
)

func main() {
	CATEGORIES := []string{
		"cate1",
		"cate2",
		"cate3",
	}
	c := colly.NewCollector()
	// Add the OnRequest callback outside of the loop, so it is registered once
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting category", r.URL)
	})
	for _, cate := range CATEGORIES {
		// Pass cate by value to an immediately invoked function, then visit it
		func(cate string) {
			c.Visit(cate)
		}(cate)
	}
}
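The func(cate string) {...} wrapper passes the current cate value in on each iteration. It is harmless but not required here: c.Visit already receives cate as an argument at call time, so calling c.Visit(cate) directly in the loop behaves the same. What removes the duplication is that the OnRequest callback is only added once. Now, you should see one visiting message for each category.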
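On the login session mentioned in the question: the sketch below does not use colly. It uses only the Go standard library to illustrate, under the assumption that the session lives in cookies held by the HTTP client, why reusing one client keeps the session while a fresh client per iteration starts without it. The server, the "session" cookie, and the helper names are all made up for this example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// newSessionServer starts a local test server. /login sets a "session"
// cookie; any other path reports whether that cookie was sent back.
func newSessionServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/login" {
			http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc", Path: "/"})
			fmt.Fprint(w, "ok")
			return
		}
		if _, err := r.Cookie("session"); err == nil {
			fmt.Fprint(w, "logged in")
			return
		}
		fmt.Fprint(w, "anonymous")
	}))
}

// newClient returns an HTTP client with its own, initially empty cookie jar.
func newClient() *http.Client {
	jar, _ := cookiejar.New(nil) // New with nil options does not fail
	return &http.Client{Jar: jar}
}

// fetch performs a GET request and returns the response body as a string.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	srv := newSessionServer()
	defer srv.Close()

	// One client for the login and all later requests: cookies are reused.
	shared := newClient()
	if _, err := fetch(shared, srv.URL+"/login"); err != nil {
		fmt.Println("login failed:", err)
		return
	}
	for _, path := range []string{"/cate1", "/cate2", "/cate3"} {
		body, err := fetch(shared, srv.URL+path)
		fmt.Println("shared client", path, body, err)
	}

	// A brand-new client has an empty jar, so the session is gone.
	body, err := fetch(newClient(), srv.URL+"/cate1")
	fmt.Println("fresh client /cate1", body, err)
}
```

The same reasoning is why keeping a single collector and registering its callbacks once is preferable to creating a new collector per category.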