How can I traverse this kind of markup in a Python scraper and build each file's path?

<li\><span class\="file"\>favicon.ico</span\></li\>  

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>banner-ads</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>ad01.jpg</span\></li\>

<li\><span class\="img"\>ad02.jpg</span\></li\>

<li\><span class\="img"\>ad03.jpg</span\></li\>

<li\><span class\="img"\>ad04.jpg</span\></li\>

<li\><span class\="img"\>ad06.jpg</span\></li\>

</ul\>

</li\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>logos</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>logo-light.jpg</span\></li\>

<li\><span class\="img"\>logo.jpg</span\></li\>

</ul\>

</li\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>news</span\>

<ul style\="display: block;"\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>category</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>category1.jpg</span\></li\>

<li\><span class\="img"\>category2.jpg</span\></li\>

<li\><span class\="img"\>category3.jpg</span\></li\>

<li\><span class\="img"\>category4.jpg</span\></li\>

<li\><span class\="img"\>category5.jpg</span\></li\>

</ul\>

</li\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>fashion</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>image1.jpg</span\></li\>

<li\><span class\="img"\>image2.jpg</span\></li\>

<li\><span class\="img"\>image3.jpg</span\></li\>

<li\><span class\="img"\>image4.jpg</span\></li\>

</ul\>

</li\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>food</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>food01.jpg</span\></li\>

</ul\>

</li\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>health</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>image1.jpg</span\></li\>

<li\><span class\="img"\>image2.jpg</span\></li\>

</ul\>

</li\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>lifestyle</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>image1.jpg</span\></li\>

<li\><span class\="img"\>image2.jpg</span\></li\>

<li\><span class\="img"\>image3.jpg</span\></li\>

<li\><span class\="img"\>image4.jpg</span\></li\>

</ul\>

</li\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>news-details</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>large-image.jpg</span\></li\>

<li\><span class\="img"\>left-image.jpg</span\></li\>

</ul\>

</li\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>sports</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>sports02.jpg</span\></li\>

<li\><span class\="img"\>sports03.jpg</span\></li\>

</ul\>

</li\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>tech</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>image5.jpg</span\></li\>

<li\><span class\="img"\>tech02.jpg</span\></li\>

<li\><span class\="img"\>tech1.jpg</span\></li\>

</ul\>

</li\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>travel</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>image1.jpg</span\></li\>

<li\><span class\="img"\>image2.jpg</span\></li\>

<li\><span class\="img"\>image3.jpg</span\></li\>

</ul\>

</li\>

<li class\="expandable"\>

<div class\="hitarea expandable-hitarea"\></div\>

<span class\="folder"\>video</span\>

<ul style\="display: block;"\>

<li\><span class\="img"\>video1.jpg</span\></li\>

<li\><span class\="img"\>video2.jpg</span\></li\>

<li\><span class\="img"\>video3.jpg</span\></li\>

<li\><span class\="img"\>video4.jpg</span\></li\>

</ul\>

</li\>

<li\><span class\="img"\>author.jpg</span\></li\>

<li\><span class\="img"\>user1.jpg</span\></li\>

<li\><span class\="img"\>user2.jpg</span\></li\>

</ul\>

</li\>

<li\><span class\="img"\>controls.jpg</span\></li\>

How can I traverse markup like this in Python and build the path of each file?
The final result should look like this:
favicon.ico
banner-ads/ad01.jpg
news/category/category1.jpg
news/category/category2.jpg


Answer:

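Once you strip the backslashes and wrap the fragment in a single root element, it is well-formed XML, so you can solve this by parsing it as XML and walking the tree recursively. XPath has no particularly convenient way to express this kind of recursion, so walking the parsed tree by hand is the simpler option.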
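As a rough illustration of that approach, here is a minimal sketch using the standard library's `xml.etree.ElementTree`. The function name `build_paths`, the wrapping `<ul>` root, and the shortened `sample` string are assumptions made for this example and are not part of the original question. It also assumes every entry is an `<li>` whose name sits in a direct `<span>` child, and that a folder is a `<span class="folder">` followed by a sibling `<ul>`.

```python
import xml.etree.ElementTree as ET


def build_paths(escaped_markup):
    """Return the slash-joined path of every file in the escaped tree markup."""
    # Undo the backslash escaping, then wrap the fragment in one root element,
    # because XML allows only a single top-level element.
    markup = escaped_markup.replace("\\", "")
    root = ET.fromstring("<ul>" + markup + "</ul>")
    paths = []

    def walk(ul, prefix):
        # Only look at direct <li> children so nested levels are handled
        # by the recursive call, not flattened here.
        for li in ul.findall("li"):
            span = li.find("span")
            if span is None or span.text is None:
                continue
            name = span.text.strip()
            sub = li.find("ul")
            if span.get("class") == "folder" and sub is not None:
                walk(sub, prefix + name + "/")
            else:
                paths.append(prefix + name)

    walk(root, "")
    return paths


# Shortened, hypothetical excerpt in the same escaped form as the question.
sample = r'''
<li\><span class\="file"\>favicon.ico</span\></li\>
<li class\="expandable"\>
<div class\="hitarea expandable-hitarea"\></div\>
<span class\="folder"\>news</span\>
<ul style\="display: block;"\>
<li class\="expandable"\>
<div class\="hitarea expandable-hitarea"\></div\>
<span class\="folder"\>category</span\>
<ul style\="display: block;"\>
<li\><span class\="img"\>category1.jpg</span\></li\>
</ul\>
</li\>
<li\><span class\="img"\>author.jpg</span\></li\>
</ul\>
</li\>
'''

for path in build_paths(sample):
    print(path)
```

If the real page is not well-formed (unclosed tags, bare `&`, and so on), `ET.fromstring` will raise `ParseError`; in that case a lenient HTML parser would be needed instead, with the same recursive walk.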

