How do I get the innerHTML of the whole page in Selenium WebDriver?
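I'm using Selenium to click through to the web page I want, and then parsing that page with Beautiful Soup.
Someone has shown how to get the inner HTML of an element in Selenium WebDriver. Is there a way to get the HTML of the whole page? Thanks.
Sample code in Python (based on the post above, the language doesn't seem to matter much):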
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from bs4 import BeautifulSoup
url = 'http://www.google.com'
driver = webdriver.Firefox()
driver.get(url)
the_html = driver---somehow----.get_attribute('innerHTML')
bs = BeautifulSoup(the_html, 'html.parser')
Answer:
To get the HTML of the whole page:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://stackoverflow.com")
html = driver.page_source
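Since the question involves clicking through to a page before parsing it, it may help to make sure the document has finished loading before reading `page_source`. The following is only a minimal sketch under stated assumptions: the helper name `wait_for_page_html` and its `timeout` and `poll` parameters are hypothetical and not part of Selenium, and polling `document.readyState` only covers the initial document load, not content that scripts add later.

```python
import time


def wait_for_page_html(driver, timeout=10.0, poll=0.25):
    """Return driver.page_source once document.readyState is 'complete'.

    Hypothetical helper: `driver` is any object with `execute_script`
    and `page_source`, such as a Selenium WebDriver instance.
    """
    deadline = time.monotonic() + timeout
    while True:
        # document.readyState is a standard DOM property:
        # "loading", "interactive" or "complete".
        state = driver.execute_script("return document.readyState;")
        if state == "complete":
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError("page did not finish loading in time")
        time.sleep(poll)


# Possible usage with the question's code (assumes selenium and bs4 are installed):
#   driver = webdriver.Firefox()
#   driver.get(url)
#   bs = BeautifulSoup(wait_for_page_html(driver), 'html.parser')
```

The returned string can be passed to `BeautifulSoup` exactly as `the_html` is in the question.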
To get the outer HTML (including the element's own tag):
# find_element_by_css_selector was removed in newer Selenium 4 releases;
# find_element(By.CSS_SELECTOR, ...) is the current equivalent.
from selenium.webdriver.common.by import By
# HTML from `<html>`
html = driver.execute_script("return document.documentElement.outerHTML;")
# HTML from `<body>`
html = driver.execute_script("return document.body.outerHTML;")
# HTML from element with some JavaScript
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].outerHTML;", element)
# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('outerHTML')
To get the inner HTML (excluding the element's own tag):
# find_element_by_css_selector was removed in newer Selenium 4 releases;
# find_element(By.CSS_SELECTOR, ...) is the current equivalent.
from selenium.webdriver.common.by import By
# HTML from `<html>`
html = driver.execute_script("return document.documentElement.innerHTML;")
# HTML from `<body>`
html = driver.execute_script("return document.body.innerHTML;")
# HTML from element with some JavaScript
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = driver.execute_script("return arguments[0].innerHTML;", element)
# HTML from element with `get_attribute`
element = driver.find_element(By.CSS_SELECTOR, "#hireme")
html = element.get_attribute('innerHTML')
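The JavaScript variants above differ only in the property name and in whether they target the document root or a specific element, so they can be folded into one function. This is a rough sketch rather than anything Selenium provides: the name `get_html` and its `element` and `outer` parameters are hypothetical, and it relies only on `driver.execute_script` as used in the answer.

```python
def get_html(driver, element=None, outer=True):
    """Return inner or outer HTML of `element`, or of <html> if element is None.

    Hypothetical helper built on driver.execute_script.
    """
    prop = "outerHTML" if outer else "innerHTML"
    if element is None:
        # Whole document: start from the <html> element.
        return driver.execute_script(f"return document.documentElement.{prop};")
    # Specific element: Selenium passes it to the script as arguments[0].
    return driver.execute_script(f"return arguments[0].{prop};", element)


# Possible usage (assumes a live driver and the "#hireme" element from the answer):
#   whole_page = get_html(driver)                        # <html>...</html>
#   inner_only = get_html(driver, element, outer=False)  # children of the element
```

For the original question, `get_html(driver, outer=False)` would correspond to the innerHTML of the whole page.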