I need to visit multiple links and extract information from each of them. The problem is that when I call “colly.Visit(URL)” in a loop, the number of “Visiting” messages grows with every iteration. Example:
    package main

    import (
        "fmt"

        "github.com/gocolly/colly"
    )

    func main() {
        CATETORIES := []string{
            "cate1",
            "cate2",
            "cate3",
        }
        c := colly.NewCollector()
        for _, cate := range CATETORIES {
            c.OnRequest(func(r *colly.Request) {
                fmt.Println("Visiting categories", r.URL)
            })
            c.Visit(cate)
        }
    }
That will print:
    Visiting categories http://cate1
    Visiting categories http://cate2
    Visiting categories http://cate2
    Visiting categories http://cate3
    Visiting categories http://cate3
    Visiting categories http://cate3
I tried creating a new collector on every iteration and that worked well. The output was then: Visiting categories http://cate1, Visiting categories http://cate2, Visiting categories http://cate3. But doing it this way I am losing my login session. Any suggestions?
The issue is that you register the OnRequest callback inside the loop. Each call to OnRequest adds another callback to the collector instead of replacing the previous one, and every registered callback runs for every request. The first visit therefore prints one line, the second prints two, and the third prints three.
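The accumulation can be reproduced without colly. The sketch below uses a hypothetical `miniCollector` type (an assumption for illustration, not colly's actual implementation) whose `OnRequest` appends to a slice and whose `Visit` runs every stored callback. It only counts callback invocations for the two registration patterns.

```go
package main

import "fmt"

// miniCollector is a hypothetical stand-in for a collector; it is not colly's code.
type miniCollector struct {
	onRequest []func(url string)
}

// OnRequest appends a callback; it never replaces earlier ones.
func (c *miniCollector) OnRequest(f func(url string)) {
	c.onRequest = append(c.onRequest, f)
}

// Visit runs every registered callback for the given URL.
func (c *miniCollector) Visit(url string) {
	for _, f := range c.onRequest {
		f(url)
	}
}

// countInsideLoop registers a callback on every iteration, as in the question.
func countInsideLoop(urls []string) int {
	c := &miniCollector{}
	calls := 0
	for _, u := range urls {
		c.OnRequest(func(string) { calls++ })
		c.Visit(u)
	}
	return calls
}

// countOutsideLoop registers the callback once, before the loop.
func countOutsideLoop(urls []string) int {
	c := &miniCollector{}
	calls := 0
	c.OnRequest(func(string) { calls++ })
	for _, u := range urls {
		c.Visit(u)
	}
	return calls
}

func main() {
	urls := []string{"cate1", "cate2", "cate3"}
	fmt.Println("registered inside loop:", countInsideLoop(urls))   // 1 + 2 + 3 = 6
	fmt.Println("registered outside loop:", countOutsideLoop(urls)) // 3
}
```

With three URLs, registering inside the loop gives 1 + 2 + 3 = 6 invocations, which matches the six lines in the question's output.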
To fix this, register the OnRequest callback once, before the loop. The callback reads the URL from r.URL, so it does not depend on the loop variable cate, and no extra closure around c.Visit is needed. Because you keep using the same collector, you also do not have to re-create it and lose your login session.

Here’s an updated version of your code:

    package main

    import (
        "fmt"

        "github.com/gocolly/colly"
    )

    func main() {
        CATEGORIES := []string{
            "cate1",
            "cate2",
            "cate3",
        }
        c := colly.NewCollector()

        // Register the OnRequest callback once, outside of the loop.
        c.OnRequest(func(r *colly.Request) {
            fmt.Println("Visiting category", r.URL)
        })

        // Reuse the same collector for every category.
        for _, cate := range CATEGORIES {
            c.Visit(cate)
        }
    }

With the callback registered only once, each call to c.Visit prints exactly one message, so you should see one line per category without duplication.
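On the login session: as far as I know, a colly collector keeps its cookies in its own cookie jar, so creating a new collector starts with an empty jar. The sketch below illustrates the same effect with the standard library only. The test server, the `session` cookie, and the helper names are all hypothetical; it is an analogy, not colly code.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// newTestServer sets a session cookie on /login and, on any other path,
// answers 200 only if the request carries that cookie (hypothetical server).
func newTestServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc", Path: "/"})
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if _, err := r.Cookie("session"); err != nil {
			w.WriteHeader(http.StatusUnauthorized)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return httptest.NewServer(mux)
}

// newClient returns an HTTP client with its own empty cookie jar.
func newClient() *http.Client {
	jar, _ := cookiejar.New(nil)
	return &http.Client{Jar: jar}
}

// status fetches url with client and returns the HTTP status code (0 on error).
func status(client *http.Client, url string) int {
	resp, err := client.Get(url)
	if err != nil {
		return 0
	}
	resp.Body.Close()
	return resp.StatusCode
}

// visitAfterLogin logs in with one client, then requests a category page
// either with the same client (reuse) or with a freshly created one.
func visitAfterLogin(reuse bool) int {
	srv := newTestServer()
	defer srv.Close()

	client := newClient()
	status(client, srv.URL+"/login")
	if !reuse {
		client = newClient() // new jar: the session cookie is gone
	}
	return status(client, srv.URL+"/cate1")
}

func main() {
	fmt.Println("same client:", visitAfterLogin(true))
	fmt.Println("new client: ", visitAfterLogin(false))
}
```

The reused client sends the cookie and is accepted, while the fresh client is rejected. That is the reason to keep one collector and register its callbacks once.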