Why am I getting the field itself instead of the field's value?
I am trying web scraping for the first time, using BeautifulSoup and Python. The page I am trying to scrape is: http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172
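# Assumed imports, matching the aliases used below
from urllib.request import urlopen as request
from bs4 import BeautifulSoup as soup
client = request('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
page_html = client.read()
client.close()
page_soup = soup(page_html, 'html.parser')
identification = page_soup.find('div', {'data-bind':'text: name'})
print(identification.text)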
When I do this, I just get an empty string. If I print the identification variable itself, I get:
<div class="col-xs-7" data-bind="text: name"></div>
This is the line of HTML whose value I am trying to get. When the page is viewed in a browser, that tag contains the value A LEBLANC.
Answer:
You can try this code:
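from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
# find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed
find = driver.find_element(By.XPATH, '//*[@id="identificationCollapse"]/div/div/div/div[1]/div[1]/div[2]')
print(find.text)
driver.quit()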
Output:
A LEBLANC
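If the element is read before the page's script has filled it in, the text can still come back empty. Below is a minimal sketch of an explicit wait, assuming Selenium 4 and a working local Chrome driver. The function name `fetch_vessel_name`, the CSS selector, and the 10-second timeout are illustrative choices, not part of the answer above.

```python
URL = "http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172"
# Assumption: the same data-bind attribute the question targets identifies the name.
NAME_SELECTOR = "[data-bind='text: name']"


def fetch_vessel_name(url=URL, timeout=10):
    """Open the page in Chrome and return the name once it is non-empty."""
    # Imported here so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def name_is_filled(drv):
        # An empty string is falsy, so WebDriverWait keeps polling until text appears.
        return drv.find_element(By.CSS_SELECTOR, NAME_SELECTOR).text.strip()

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Raises selenium's TimeoutException if the text never appears.
        return WebDriverWait(driver, timeout).until(name_is_filled)
    finally:
        driver.quit()  # always close the browser, even on timeout

# Usage (launches a real browser): print(fetch_vessel_name())
```

Waiting on the text rather than on the element's presence matters here, because the empty `div` exists in the markup from the start.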
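Answer:
There are several ways to achieve the same goal. I used a selector in my script: it is easy to understand, and it is unlikely to break unless the site's HTML structure changes significantly. Try this.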
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
soup = BeautifulSoup(driver.page_source,"lxml")
driver.quit()
item_name = soup.select("[data-bind$='name']")[0].text
print(item_name)
Result:
A LEBLANC
By the way, the lookup you started with will also work:
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get('http://vesselregister.dnvgl.com/VesselRegister/vesseldetails.html?vesselid=34172')
soup = BeautifulSoup(driver.page_source,"lxml")
driver.quit()
item_name = soup.find('div', {'data-bind':'text: name'}).text
print(item_name)
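As for why the original attempt printed an empty string: the HTML downloaded over plain HTTP contains only the placeholder `div`. The `data-bind` attribute looks like the kind of binding that a client-side library (Knockout uses this syntax) fills in with JavaScript, and no script runs when the page is merely downloaded. The sketch below uses only the standard library to show the difference. `BoundTextCollector`, `bound_text`, and the "rendered" markup are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser

# The markup the question printed: the placeholder exists, but it has no text.
RAW_HTML = '<div class="col-xs-7" data-bind="text: name"></div>'
# Hypothetical markup after a browser has run the page's script.
RENDERED_HTML = '<div class="col-xs-7" data-bind="text: name">A LEBLANC</div>'


class BoundTextCollector(HTMLParser):
    """Collect text inside elements whose data-bind attribute equals `binding`."""

    def __init__(self, binding):
        super().__init__()
        self.binding = binding
        self.depth = 0    # greater than 0 while inside a matching element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Enter a matching element, or go one level deeper inside one.
        if self.depth or dict(attrs).get("data-bind") == self.binding:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def bound_text(html, binding="text: name"):
    """Return the text found inside the element bound to `binding`."""
    parser = BoundTextCollector(binding)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


print(repr(bound_text(RAW_HTML)))       # ''
print(repr(bound_text(RENDERED_HTML)))  # 'A LEBLANC'
```

This is why both answers load the page in a real browser through Selenium before reading the element: by then the script has replaced the empty placeholder with the actual value.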