I want to run my spider from a script instead of using scrapy crawl.
I found this page:
http://doc.scrapy.org/en/latest/topics/practices.html
but it doesn't actually say where to put the script.
Any help?
Just follow the official docs. I would make one change so that the spider runs only when you execute python myscript.py, and not every time something imports from the file: put the code that starts the crawl under an if __name__ == "__main__": guard.
    import scrapy
    from scrapy.crawler import CrawlerProcess


    class MySpider(scrapy.Spider):
        # Your spider definition
        pass


    if __name__ == "__main__":
        process = CrawlerProcess({
            'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
        })

        process.crawl(MySpider)
        process.start()  # the script will block here until the crawling is finished
Now save the file as myscript.py and run python myscript.py.
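If it helps to see what the guard does without Scrapy involved, here is a minimal sketch using only the standard library. The names SCRIPT, run_crawl, started, and load are made up for illustration. runpy.run_path with run_name="__main__" stands in for running python myscript.py, and any other run_name stands in for importing the file from another module.

```python
import pathlib
import runpy
import tempfile
import textwrap

# Hypothetical stand-in for myscript.py: run_crawl() plays the role of
# process.crawl(...) / process.start() and only records that it was called.
SCRIPT = textwrap.dedent('''
    started = False

    def run_crawl():
        global started
        started = True

    if __name__ == "__main__":
        run_crawl()
''')


def load(run_name):
    """Execute SCRIPT with __name__ set to run_name; report whether the crawl started."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / "myscript.py"
        path.write_text(SCRIPT)
        module_globals = runpy.run_path(str(path), run_name=run_name)
        return module_globals["started"]


if __name__ == "__main__":
    print("run as a script:", load("__main__"))
    print("imported as a module:", load("myscript"))
```

Only the first call reaches run_crawl(), which is why the guarded script can be imported elsewhere (for example, to reuse the spider class) without kicking off a crawl.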