How to write an XPath that filters out elements

Z时代
2024-02-08
Category: IT

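I'm new to Python and this is a fairly basic question, so please go easy on me.
I need to scrape some data and have an XPath question about the HTML shown below.
If a <span class="media-caption__text"></span> tag is present, I want its inner text; if it is not, I want the inner text of <figcaption></figcaption>. In both cases the content of <span class="off-screen"></span> must be filtered out.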

The HTML looks like this:

<figcaption class="media-caption">
    <span class="off-screen">Image caption</span> 
    <span class="media-caption__text"> &#32445;&#32422;&#24066;&#26159;&#32654;&#22269;&#30123;&#24773;&#30340;&#8220;&#38663;&#20013;&#8221;&#12290;    </span></figcaption>

or

<figcaption class="media-with-caption__caption">
    <span class="off-screen"></span>     
    &#22833;&#19994;&#20013;&#30340;&#32654;&#22269;&#38738;&#24180;&#65306;&#27882;&#27700;&#12289;&#24656;&#24807;&#19982;&#19981;&#23433;</figcaption>

Answer:

Why not handle this with code logic instead?
Doing it purely in XPath feels ugly:

//figcaption/span[@class="media-caption__text"]/text()[normalize-space()] | //figcaption[not(span[@class="media-caption__text"])]/text()[normalize-space()]
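
For comparison, the "code logic" route mentioned above could look roughly like the sketch below. It is only an illustration: the function name `extract_captions` and the English placeholder captions are made up for this example, and it assumes lxml is installed. Each <figcaption> is checked on its own, so a page that mixes both layouts is handled per element.

```python
from lxml import etree

# Placeholder markup mirroring the two layouts in the question
# (the caption strings are stand-ins, not the original text).
SAMPLE = '''
<figcaption class="media-caption">
    <span class="off-screen">Image caption</span>
    <span class="media-caption__text"> Caption one </span></figcaption>
<figcaption class="media-with-caption__caption">
    <span class="off-screen"></span>
    Caption two</figcaption>
'''

def extract_captions(html_text):
    """Return one caption string per <figcaption> (hypothetical helper)."""
    root = etree.HTML(html_text)
    captions = []
    for fig in root.xpath('//figcaption'):
        spans = fig.xpath('./span[@class="media-caption__text"]')
        if spans:
            # Preferred source: the dedicated caption span.
            text = ''.join(spans[0].itertext())
        else:
            # Fallback: text nodes that are direct children of <figcaption>,
            # which leaves out anything inside <span class="off-screen">.
            text = ''.join(fig.xpath('./text()'))
        captions.append(text.strip())
    return captions

captions = extract_captions(SAMPLE)
print(captions)
```

The branch is a plain `if`/`else`, which tends to be easier to read and debug than packing the same condition into one XPath union.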

Answer:

from lxml import etree
text = '''
<figcaption class="media-caption">
<span class="off-screen">Image caption</span>
<span class="media-caption__text"> &#32445;&#32422;&#24066;&#26159;&#32654;&#22269;&#30123;&#24773;&#30340;&#8220;&#38663;&#20013;&#8221;&#12290; </span>
</figcaption>
<figcaption class="media-with-caption__caption">
<span class="off-screen"></span>
&#22833;&#19994;&#20013;&#30340;&#32654;&#22269;&#38738;&#24180;&#65306;&#27882;&#27700;&#12289;&#24656;&#24807;&#19982;&#19981;&#23433;
</figcaption>
'''
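html = etree.HTML(text)
# Keep non-blank text nodes under <figcaption>, skipping those directly inside <span class="off-screen">
result = html.xpath('//figcaption//text()[normalize-space()][not(parent::span[@class="off-screen"])]')
print(result)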
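
Text nodes selected this way keep their surrounding spaces and newlines, so some cleanup is usually needed afterwards. Here is a minimal sketch of that, assuming lxml is installed and using made-up placeholder captions; the XPath also skips text directly inside <span class="off-screen">, and the names `raw` and `cleaned` are only for illustration.

```python
from lxml import etree

# Placeholder markup; both off-screen spans carry text here to show the filter working.
text = '''
<figcaption class="media-caption">
<span class="off-screen">Image caption</span>
<span class="media-caption__text"> Caption one </span>
</figcaption>
<figcaption class="media-with-caption__caption">
<span class="off-screen">Image caption</span>
Caption two
</figcaption>
'''

html = etree.HTML(text)

# Non-blank text nodes under <figcaption>, excluding those whose parent is the off-screen span.
raw = html.xpath(
    '//figcaption//text()[normalize-space()]'
    '[not(parent::span[@class="off-screen"])]'
)

# Collapse internal runs of whitespace and trim both ends.
cleaned = [' '.join(t.split()) for t in raw]
print(cleaned)
```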

That is everything on "How to write an XPath that filters out elements". Source link: utcz.com/p/937808.html
