How do I get href links from HTML using Python?

Z时代
2024-01-10
Category: Q&A

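import urllib2  # Python 2 only; Python 3 uses urllib.request instead

website = "WEBSITE"  # placeholder for the page URL
openwebsite = urllib2.urlopen(website)
html = openwebsite.read()  # read from the response object opened above
print html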

So far, so good.

But I only want the href links from the plain-text HTML. How can I solve this?

Answer:

Try using Beautiful Soup (the code below targets Python 2 with BeautifulSoup 3):

from BeautifulSoup import BeautifulSoup
import urllib2
import re
html_page = urllib2.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
    print link.get('href')

If you only want links that start with http://, use:

soup.findAll('a', attrs={'href': re.compile("^http://")})

In Python 3 with BS4, it should be:

from bs4 import BeautifulSoup
import urllib.request
html_page = urllib.request.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page, "html.parser")
for link in soup.findAll('a'):
    print(link.get('href'))
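
If installing Beautiful Soup is not an option, roughly the same result can be had with the standard library's html.parser module. The following is only a minimal sketch for Python 3, not part of the original answer: the names HrefCollector, extract_hrefs, and sample_html are made up for illustration, and the sample HTML is inlined so that nothing is fetched over the network. The optional prefix argument mirrors the "starts with http://" filter shown earlier.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # tag names arrive lowercased; attrs is a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def extract_hrefs(html, prefix=""):
    """Return href values found in html, keeping only those that start with prefix."""
    collector = HrefCollector()
    collector.feed(html)
    collector.close()
    return [h for h in collector.hrefs if h.startswith(prefix)]


# Made-up sample markup, used instead of a live page.
sample_html = """
<html><body>
  <a href="http://www.example.com/a">A</a>
  <a href="/relative/path">B</a>
  <a name="anchor-without-href">C</a>
  <a href="https://www.example.com/d">D</a>
</body></html>
"""

all_links = extract_hrefs(sample_html)
http_links = extract_hrefs(sample_html, prefix="http://")
print(all_links)
print(http_links)
```

To run this against a real page, the decoded response body can be passed in place of sample_html. Note that the http:// prefix deliberately skips https:// links, just as the regular expression "^http://" does.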

Source: utcz.com/qa/421842.html
