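This is my first question on Stack Overflow. Recently I wanted to use linked-in-scraper, so I downloaded it and ran "scrapy crawl linkedin.com", which produced the error message below. For reference, I am using Anaconda 2.3.0 and Python 2.7.11. Before running the program, I updated all related packages, including scrapy and six, with pip.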
Traceback (most recent call last):
  File "/Users/byeongsuyu/anaconda/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/cmdline.py", line 108, in execute
    settings = get_project_settings()
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/utils/project.py", line 60, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 285, in setmodule
    self.set(key, getattr(module, key), priority)
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 260, in set
    self.attributes[name].set(value, priority)
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 55, in set
    value = BaseSettings(value, priority=priority)
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 91, in __init__
    self.update(values, priority)
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/scrapy/settings/__init__.py", line 317, in update
    for name, value in six.iteritems(values):
  File "/Users/byeongsuyu/anaconda/lib/python2.7/site-packages/six.py", line 599, in iteritems
    return d.iteritems(**kw)
AttributeError: 'list' object has no attribute 'iteritems'
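I understand that the error occurs because d is a list rather than a dict. Since the error is raised from code inside scrapy, maybe the problem lies in scrapy or in the six package. How can I fix this error?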
Edit: here is the code from scrapy.cfg
# Automatically created by: scrapy startproject
#
# For more information about the [deploy] section see:
# http://doc.scrapy.org/topics/scrapyd.html

[settings]
default = linkedIn.settings

[deploy]
#url = http://localhost:6800/
project = linkedIn
This is caused by the settings of the scraper you linked:
ITEM_PIPELINES = ['linkedIn.pipelines.LinkedinPipeline']
However, according to the docs, ITEM_PIPELINES should be a dict:
To activate an Item Pipeline component you must add its class to the ITEM_PIPELINES setting, like in the following example:
ITEM_PIPELINES = {
    'myproject.pipelines.PricePipeline': 300,
    'myproject.pipelines.JsonWriterPipeline': 800,
}
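The integer values you assign to classes in this setting determine the order in which they run: items go through from lower valued to higher valued classes. It's customary to define these numbers in the 0-1000 range.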
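To make the ordering rule concrete, here is a small sketch that sorts the pipelines from the docs example by their integer value. It only illustrates the rule and is not how Scrapy is implemented internally; the name run_order is my own.

```python
# Same setting as in the docs example, deliberately written out of order.
ITEM_PIPELINES = {
    'myproject.pipelines.JsonWriterPipeline': 800,
    'myproject.pipelines.PricePipeline': 300,
}

# Illustration only: lower values run first, so sort by the integer value.
# run_order is a hypothetical name, not a Scrapy setting.
run_order = [
    path for path, _ in sorted(ITEM_PIPELINES.items(), key=lambda kv: kv[1])
]

for path in run_order:
    print(path)
```

With these values, PricePipeline (300) comes before JsonWriterPipeline (800), whatever order the dict was written in.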
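According to this question, ITEM_PIPELINES used to be a list, which explains why this scraper uses one. So you will have to either ask the developer of the scraper to update their code, or set ITEM_PIPELINES yourself.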
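If you go the second route, a minimal sketch of the change in the scraper's settings.py could look like this. The helper pipelines_list_to_dict and the priority values (100, stepping by 100) are my own assumptions, not part of Scrapy; any integers in the customary 0-1000 range should work, and with a single pipeline the exact number does not matter.

```python
# Old-style value from the scraper's settings (a list, which now fails).
OLD_ITEM_PIPELINES = ['linkedIn.pipelines.LinkedinPipeline']


def pipelines_list_to_dict(pipelines, start=100, step=100):
    """Hypothetical helper: turn an old list-style setting into a dict,
    keeping the list order by assigning increasing priority values."""
    return {path: start + i * step for i, path in enumerate(pipelines)}


# New-style value: a dict mapping class path to an integer order.
ITEM_PIPELINES = pipelines_list_to_dict(OLD_ITEM_PIPELINES)
```

For a single pipeline you can also just write the dict by hand, for example ITEM_PIPELINES = {'linkedIn.pipelines.LinkedinPipeline': 300}, where 300 is an arbitrary choice.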