I'm using this tool (it needs the scrapy package installed first):
https://github.com/LKI/wescraper
It fails with the errors below. It looks like no cookie is being obtained? Why is that?
[scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: scrapybot)
2018-01-23 17:51:22 [scrapy.utils.log] INFO: Versions: lxml 4.1.1.0, libxml2
2.9.7, cssselect 1.0.3, parsel 1.3.1, w3lib 1.18.0, Twisted 17.9.0, Python
2.7.13 |Anaconda 4.4.0 (64-bit)| (default, Dec 20 2016, 23:09:15) - [GCC 4.4
.7 20120313 (Red Hat 4.4.7-1)], pyOpenSSL 17.0.0 (OpenSSL 1.0.2l 25 May
2017), cryptography 1.8.1, Platform Linux-3.13.0-92-generic-x86_64-with-
debian-jessie-sid
2018-01-23 17:51:22 [scrapy.crawler] INFO: Overridden settings: {'DUPEFILTER_CLASS': u'scrapy.dupefilter.BaseDupeFilter'}
2018-01-23 17:51:22 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.corestats.CoreStats']
2018-01-23 17:51:22 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2018-01-23 17:51:22 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2018-01-23 17:51:22 [scrapy.middleware] INFO: Enabled item pipelines:
[u'__main__.WeScraper']
2018-01-23 17:51:22 [scrapy.core.engine] INFO: Spider opened
2018-01-23 17:51:22 [py.warnings] WARNING: /home/ubuntu/anaconda2/lib/
python2.7/importlib/__init__.py:37: ScrapyDeprecationWarning: Module `scrapy
.dupefilter` is deprecated, use `scrapy.dupefilters` instead
__import__(name)
2018-01-23 17:51:22 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0
pages/min), scraped 0 items (at 0 items/min)
2018-01-23 17:51:22 [scrapy.extensions.telnet] DEBUG: Telnet console
listening on 127.0.0.1:6023
2018-01-23 17:51:22 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://weixin.sogou.com/> from <GET http://weixin.sogou.com/weixin?type=2&sourceid=inttime_day&tsn=1&query=miawu>
2018-01-23 17:51:22 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://weixin.sogou.com/> from <GET http://weixin.sogou.com/weixin?type=2&sourceid=inttime_day&tsn=1&query=liriansu>
2018-01-23 17:51:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://weixin.sogou.com/> (referer: None)
2018-01-23 17:51:23 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://weixin.sogou.com/> (referer: None)
2018-01-23 17:51:23 [sogou.com/] DEBUG: Current cookie: {}
2018-01-23 17:51:23 [scrapy.core.scraper] ERROR: Spider error processing <GET http://weixin.sogou.com/> (referer: None)
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/ubuntu/wescraper/wescraper/wespider.py", line 97, in parse_keyword
    self.cookie_pool.set_return_header(response.headers.getlist('Set-Cookie'), current_cookie)
  File "/home/ubuntu/wescraper/wescraper/cookie.py", line 68, in set_return_header
    self.dump()
  File "/home/ubuntu/wescraper/wescraper/cookie.py", line 27, in dump
    lines = [cookie['SNUID'], cookie['SUID'], cookie['SUV']]
KeyError: u'SNUID'
2018-01-23 17:51:23 [sogou.com/] DEBUG: Current cookie: {}
2018-01-23 17:51:23 [scrapy.core.scraper] ERROR: Spider error processing <GET http://weixin.sogou.com/> (referer: None)
Traceback (most recent call last):
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/utils/defer.py", line 102, in iter_errback
    yield next(it)
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 30, in process_spider_output
    for x in result:
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/referer.py", line 339, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/ubuntu/wescraper/wescraper/wespider.py", line 97, in parse_keyword
    self.cookie_pool.set_return_header(response.headers.getlist('Set-Cookie'), current_cookie)
  File "/home/ubuntu/wescraper/wescraper/cookie.py", line 68, in set_return_header
    self.dump()
  File "/home/ubuntu/wescraper/wescraper/cookie.py", line 27, in dump
    lines = [cookie['SNUID'], cookie['SUID'], cookie['SUV']]
KeyError: u'SNUID'
2018-01-23 17:51:23 [scrapy.core.engine] INFO: Closing spider (finished)
2018-01-23 17:51:23 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1209,
'downloader/request_count': 4,
'downloader/request_method_count/GET': 4,
'downloader/response_bytes': 49660,
'downloader/response_count': 4,
'downloader/response_status_count/200': 2,
'downloader/response_status_count/302': 2,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 1, 23, 17, 51, 23, 392293),
'log_count/DEBUG': 7,
'log_count/ERROR': 2,
'log_count/INFO': 7,
'log_count/WARNING': 1,
'memusage/max': 40009728,
'memusage/startup': 40009728,
'response_received_count': 2,
'scheduler/dequeued': 4,
'scheduler/dequeued/memory': 4,
'scheduler/enqueued': 4,
'scheduler/enqueued/memory': 4,
'spider_exceptions/KeyError': 2,
'start_time': datetime.datetime(2018, 1, 23, 17, 51, 22, 293700)}
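
For what it's worth, the log itself narrows this down: both search requests were 302-redirected to the bare homepage http://weixin.sogou.com/, the cookie pool stayed empty ({}), and cookie.py line 27 then indexes cookie['SNUID'] on that empty dict, which is exactly the KeyError. A redirect to the homepage with no SNUID/SUID/SUV in Set-Cookie usually points to sogou's anti-crawler check kicking in rather than a bug in the spider's parsing code.

A quick way to check, independent of the spider, is to hit the same URL and see what sogou actually sends back. This is my own standalone sketch (not part of wescraper); it assumes the requests library is installed, and the URL is copied from the log above:

# -*- coding: utf-8 -*-
# Standalone probe (not part of wescraper): fetch the same search URL the
# spider used and print which cookies sogou actually sets on the response.
# Assumes the `requests` library is installed; URL copied from the log above.
from __future__ import print_function
import requests

URL = ('http://weixin.sogou.com/weixin'
       '?type=2&sourceid=inttime_day&tsn=1&query=miawu')

resp = requests.get(URL, allow_redirects=False,
                    headers={'User-Agent': 'Mozilla/5.0'})

print('status:', resp.status_code)                # 302 here means sogou bounced the request
print('Location:', resp.headers.get('Location'))  # where the redirect points (the homepage?)

cookies = resp.cookies.get_dict()                 # cookies set by this single response
print('cookies set:', cookies)
for name in ('SNUID', 'SUID', 'SUV'):             # the keys wescraper's dump() indexes
    print(name, 'present:', name in cookies)

If this probe also comes back as a 302 to the homepage with no SNUID in Set-Cookie, then the cookie pool has nothing to dump, and the fix lies on the request side (supplying valid cookies, a realistic User-Agent, or handling sogou's verification page) rather than in the parsing code; merely guarding cookie['SNUID'] with .get() would only turn the crash into a silently empty cookie.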