首先在http://quotes.toscrape.com/在终端中: $ scrapy shell "http://quotes.toscrape.com/"
然后,返回web浏览器,右键单击 >>> response.xpath('/html/body/div/div[2]/div[1]/div[1]/span[1]/text()').getall()
['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”']
添加 如果我们检查 Inspector 我们将再次看到,在我们的 <div class="quote" itemscope="" itemtype="http://schema.org/CreativeWork">
<span class="text" itemprop="text">
“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”
</span>
<span>(...)</span>
<div class="tags">(...)</div>
</div>
有了这些知识,我们可以改进我们的xpath:我们只需选择 >>> response.xpath('//span[has-class("text")]/text()').getall()
['“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”',
'“It is our choices, Harry, that show what we truly are, far more than our abilities.”',
'“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”',
...]
通过一个简单、更聪明的xpath,我们能够从页面中提取所有的引号。我们可以在第一个xpath上构建一个循环,以增加最后一个xpath的数量。 这个 Inspector 还有很多其他有用的功能,比如在源代码中搜索或者直接滚动到您选择的元素。让我们演示一个用例: 说你想找到 请注意,搜索栏也可用于搜索和测试CSS选择器。例如,您可以搜索 网络工具¶在抓取过程中,您可能会遇到动态网页,其中页面的某些部分是通过多个请求动态加载的。虽然这很棘手,但是 Network-tool in the Developer Tools greatly facilitates this task. To demonstrate the Network-tool, let's take a look at the page quotes.toscrape.com/scroll . 页面与基本页面非常相似 quotes.toscrape.com -第页,但不是上面提到的 $ scrapy shell "quotes.toscrape.com/scroll"
(...)
>>> view(response)
浏览器窗口应该和网页一起打开,但有一个关键的区别:我们看到的不是引用,而是一个带单词的绿色条。 这个 如果你点击 |
Archiver|手机版|笨鸟自学网 ( 粤ICP备20019910号 )
GMT+8, 2024-12-4 15:48 , Processed in 0.015622 second(s), 17 queries .