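Invoking the shell from spiders to inspect responses

Sometimes you want to inspect the responses that are being processed at a certain point of your spider, if only to check that the response you expect is getting there.

This can be achieved by using the scrapy.shell.inspect_response function.

Here is an example of how you would call it from your spider:

import scrapy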


class MySpider(scrapy.Spider):
    name = "myspider"
    start_urls = [
        "http://example.com",
        "http://example.org",
        "http://example.net",
    ]

    def parse(self, response):
        # We want to inspect one specific response.
        if ".org" in response.url:
            from scrapy.shell import inspect_response

            inspect_response(response, self)

        # Rest of parsing code.
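
The condition in the spider above is a plain substring test, so it would also match a URL such as http://example.com/about.org.html. If the goal is to open the shell only for one host, a stricter check may help. The following is a minimal sketch using only the standard library; should_inspect and its suffix parameter are hypothetical names introduced here for illustration and are not part of Scrapy.

```python
from urllib.parse import urlparse


def should_inspect(url: str, suffix: str = ".org") -> bool:
    """Return True when the URL's host name ends with the given suffix.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    # urlparse(...).hostname is None when the string has no network location.
    host = urlparse(url).hostname or ""
    return host.endswith(suffix)


if __name__ == "__main__":
    for url in ("http://example.org", "http://example.com/about.org.html"):
        # The substring test matches both URLs; the host check matches only the first.
        print(url, ".org" in url, should_inspect(url))
```

Inside parse, the condition could then read: if should_inspect(response.url), followed by the same inspect_response(response, self) call as above.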
When you run the spider, you will get something similar to this:

2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.com> (referer: None)
2014-01-23 17:48:31-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.org> (referer: None)
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x1e16b50>
...
>>> response.url
'http://example.org'
Then, you can check whether the extraction code is working:

>>> response.xpath('//h1[@class="fn"]')
[]
Nope, it isn't. So you can open the response in your web browser and see whether it is the response you were expecting:

>>> view(response)
True
Finally, press Ctrl-D (or Ctrl-Z on Windows) to exit the shell and resume the crawl:

>>> ^D
2014-01-23 17:50:03-0400 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://example.net> (referer: None)
...
Note that you cannot use ...