Scrapy: getting the href from an a tag

一尘不染

scrapy

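I've started using Scrapy for a small project, but I can't extract links. Each time the class is found, I get "[]" instead of the URL. Am I missing something obvious?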

sel = Selector(response)
for entry in sel.xpath("//div[@class='recipe-description']"):
    print(entry.xpath('href').extract())

Sample from the site:

<div class="recipe-description">
    <a href="http://www.url.com/">
        <h2 class="rows-2"><span>SomeText</span></h2>
    </a>
</div>


2020-04-10

1 answer

一尘不染

Your XPath query is wrong.

for entry in sel.xpath("//div[@class='recipe-description']"):

In this line you are iterating over the div elements, which have no href attribute.

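To fix it, select the anchor element inside the div, and read the attribute with @href (a bare href looks for a child element named href, not an attribute):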

for entry in sel.xpath("//div[@class='recipe-description']/a"):
    print(entry.xpath('@href').extract())

The best solution is to extract the href attribute directly in the for loop:

for href in sel.xpath("//div[@class='recipe-description']/a/@href").extract():
    print(href)

For simplicity, you can also use a CSS selector:

for href in sel.css("div.recipe-description a::attr(href)").extract():
    print(href)

2020-04-10