小能豆

Go Colly - visiting a URL in a for loop

go

I need to visit multiple links and extract information from each of them. The problem is that when I call c.Visit(URL) inside a loop, the "Visiting" message is printed more times on every iteration. Example:

package main

import (
    "fmt"

    "github.com/gocolly/colly"
)

func main() {

    CATEGORIES := []string{
        "cate1",
        "cate2",
        "cate3",
    }

    c := colly.NewCollector()

    for _, cate := range CATEGORIES {

        c.OnRequest(func(r *colly.Request) {
            fmt.Println("Visiting categories", r.URL)
        })

        c.Visit(cate)
    }
}

That will print:

Visiting categories http://cate1  
Visiting categories http://cate2
Visiting categories http://cate2
Visiting categories http://cate3
Visiting categories http://cate3
Visiting categories http://cate3

I tried creating a new collector on every iteration, and that worked: the output was then Visiting categories http://cate1, Visiting categories http://cate2, Visiting categories http://cate3. But doing it this way I lose my login session. Any suggestions?


2023-12-19

1 answer

小能豆

The issue is that you call OnRequest inside the loop. Each call adds another callback to the collector without removing the earlier ones, so on the second iteration two callbacks run for the request, and on the third iteration three. That’s why the visiting message is printed once, then twice, then three times.
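The accumulation can be reproduced without any network access. The sketch below uses a hypothetical miniCollector type (the names miniCollector and countCalls are made up for illustration, and this is not colly's real implementation). It only models the behaviour described above: every OnRequest call appends a handler, and every visit runs all handlers registered so far.

```go
package main

import "fmt"

// miniCollector is a hypothetical stand-in for a collector. It models only
// the list of request handlers, nothing else.
type miniCollector struct {
	onRequest []func(url string)
}

// OnRequest appends a handler; earlier handlers are never replaced.
func (m *miniCollector) OnRequest(f func(url string)) {
	m.onRequest = append(m.onRequest, f)
}

// Visit runs every registered handler once for the given URL.
func (m *miniCollector) Visit(url string) {
	for _, f := range m.onRequest {
		f(url)
	}
}

// countCalls visits each URL and returns how many times the handler fired.
// With registerInLoop set, the handler is registered again on every
// iteration, which is the pattern from the question.
func countCalls(urls []string, registerInLoop bool) int {
	m := &miniCollector{}
	calls := 0
	handler := func(url string) {
		calls++
		fmt.Println("Visiting categories", url)
	}
	if !registerInLoop {
		m.OnRequest(handler)
	}
	for _, u := range urls {
		if registerInLoop {
			m.OnRequest(handler)
		}
		m.Visit(u)
	}
	return calls
}

func main() {
	urls := []string{"cate1", "cate2", "cate3"}
	fmt.Println("registered in loop:", countCalls(urls, true)) // 1 + 2 + 3 = 6 calls
	fmt.Println("registered once:", countCalls(urls, false))   // 3 calls, one per visit
}
```

Registering inside the loop yields six handler calls for three visits, which matches the six lines of output in the question.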

To fix this, move the OnRequest call out of the loop so the callback is registered exactly once. The callback reads the URL from the request it receives (r.URL), so it does not need to capture cate, and the loop only has to call c.Visit. Because the same collector is reused for every category, your login session is not reset between visits.

Here’s an updated version of your code:

package main

import (
    "fmt"
    "github.com/gocolly/colly"
)

func main() {
    CATEGORIES := []string{
        "cate1",
        "cate2",
        "cate3",
    }

    c := colly.NewCollector()

    // Register the OnRequest callback once, outside of the loop
    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting category", r.URL)
    })

    for _, cate := range CATEGORIES {
        // Visit returns an error if the request fails; report it instead of dropping it
        if err := c.Visit(cate); err != nil {
            fmt.Println("Visit failed:", cate, err)
        }
    }
}

The OnRequest callback is now added only once, so you should see one visiting message per category, without duplication.
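As for the lost login session when the collector is recreated on every iteration: a likely cause (an assumption here, not something confirmed in the question) is that session cookies are stored on the collector, so a new collector starts with no cookies. The sketch below shows the same effect using only the standard library. The test server, the /login path, the cookie name, and the helpers newServer, newClient, status and visitAll are all hypothetical stand-ins for the real site and for colly.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/http/httptest"
)

// newServer is a hypothetical stand-in for the site: /login sets a session
// cookie, every other path answers 200 only if that cookie is sent back.
func newServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/login", func(w http.ResponseWriter, r *http.Request) {
		http.SetCookie(w, &http.Cookie{Name: "session", Value: "abc", Path: "/"})
	})
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if _, err := r.Cookie("session"); err != nil {
			http.Error(w, "not logged in", http.StatusUnauthorized)
			return
		}
		fmt.Fprint(w, "ok")
	})
	return httptest.NewServer(mux)
}

// newClient returns an HTTP client with a fresh, empty cookie jar.
func newClient() *http.Client {
	jar, _ := cookiejar.New(nil)
	return &http.Client{Jar: jar}
}

// status fetches url and returns the HTTP status code (0 on a transport error).
func status(c *http.Client, url string) int {
	resp, err := c.Get(url)
	if err != nil {
		return 0
	}
	defer resp.Body.Close()
	return resp.StatusCode
}

// visitAll logs in once, then visits each category. With reuse set, the
// logged-in client is kept; otherwise a new client is created per category.
func visitAll(base string, cats []string, reuse bool) []int {
	c := newClient()
	status(c, base+"/login")
	var codes []int
	for _, cate := range cats {
		if !reuse {
			c = newClient() // fresh jar: the session cookie is gone
		}
		codes = append(codes, status(c, base+"/"+cate))
	}
	return codes
}

func main() {
	srv := newServer()
	defer srv.Close()
	cats := []string{"cate1", "cate2", "cate3"}
	fmt.Println("shared client:", visitAll(srv.URL, cats, true))
	fmt.Println("new client per category:", visitAll(srv.URL, cats, false))
}
```

With the shared client every category request carries the cookie and gets 200; with a new client per category each request gets 401. Keeping one collector for the whole loop avoids the equivalent problem.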

2023-12-19