笨鸟编程-零基础入门Pyhton教程

 找回密码
 立即注册

常用做法

发布者: 笨鸟自学网



在同一进程中运行多个spider

默认情况下,当您运行时,scrapy为每个进程运行一个spider scrapy crawl . 但是,Scrapy支持使用 internal API .

下面是一个同时运行多个蜘蛛的示例:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

settings = get_project_settings()
process = CrawlerProcess(settings)
process.crawl(MySpider1)
process.crawl(MySpider2)
process.start() # the script will block here until all crawling jobs are finished

使用相同的示例 CrawlerRunner :

import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

configure_logging()
settings = get_project_settings()
runner = CrawlerRunner(settings)
runner.crawl(MySpider1)
runner.crawl(MySpider2)
d = runner.join()
d.addBoth(lambda _: reactor.stop())

reactor.run() # the script will block here until all crawling jobs are finished

同样的例子,但是通过链接延迟来按顺序运行spider:

from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

configure_logging()
settings = get_project_settings()
runner = CrawlerRunner(settings)

@defer.inlineCallbacks
def crawl():
    yield runner.crawl(MySpider1)
    yield runner.crawl(MySpider2)
    reactor.stop()

crawl()
reactor.run() # the script will block here until the last crawl call is finished

参见

从脚本中运行Scrapy .


上一篇:蜘蛛合约下一篇:宽爬行

Archiver|手机版|笨鸟自学网 ( 粤ICP备20019910号 )

GMT+8, 2024-12-26 18:45 , Processed in 0.013279 second(s), 17 queries .

© 2001-2020

返回顶部