嵌套选择器¶选择方法 ( >>> links = response.xpath('//a[contains(@href, "image")]')
>>> links.getall()
['<a href="image1.html">Name: My image 1 <br><img src="image1_thumb.jpg"></a>',
'<a href="image2.html">Name: My image 2 <br><img src="image2_thumb.jpg"></a>',
'<a href="image3.html">Name: My image 3 <br><img src="image3_thumb.jpg"></a>',
'<a href="image4.html">Name: My image 4 <br><img src="image4_thumb.jpg"></a>',
'<a href="image5.html">Name: My image 5 <br><img src="image5_thumb.jpg"></a>']
>>> for index, link in enumerate(links):
... href_xpath = link.xpath('@href').get()
... img_xpath = link.xpath('img/@src').get()
... print(f'Link number {index} points to url {href_xpath!r} and image {img_xpath!r}')
Link number 0 points to url 'image1.html' and image 'image1_thumb.jpg'
Link number 1 points to url 'image2.html' and image 'image2_thumb.jpg'
Link number 2 points to url 'image3.html' and image 'image3_thumb.jpg'
Link number 3 points to url 'image4.html' and image 'image4_thumb.jpg'
Link number 4 points to url 'image5.html' and image 'image5_thumb.jpg'
选择元素属性¶有几种方法可以获得属性的值。首先,可以使用XPath语法: >>> response.xpath("//a/@href").getall()
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
xpath语法有几个优点:它是标准的xpath特性,并且 scrapy还提供了对css选择器的扩展 ( >>> response.css('a::attr(href)').getall()
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
除此之外,还有 >>> [a.attrib['href'] for a in response.css('a')]
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
此属性在SelectorList上也可用;它返回一个字典,其中包含第一个匹配元素的属性。当选择器预期给出单个结果时(例如,当按元素ID选择时,或在页面上选择唯一元素时),使用它非常方便: >>> response.css('base').attrib
{'href': 'http://example.com/'}
>>> response.css('base').attrib['href']
'http://example.com/'
>>> response.css('foo').attrib
{} |
Archiver|手机版|笨鸟自学网 ( 粤ICP备20019910号 )
GMT+8, 2024-12-4 15:46 , Processed in 0.022338 second(s), 17 queries .