rvest html_nodes does not detect any nodes

Z时代
2024-01-10
Category: Q&A

I don't understand why selectors fail for me in rvest on certain websites: html_nodes() does not find any nodes.

Example:

library(rvest)
url <- read_html("http://www.cbc.ca/news/politics")
headlines <- url %>%
  html_nodes(".headline") %>%
  html_text()

Another attempt, this time with RSelenium:

library(RSelenium)
# Start a Selenium server and a browser session
rD <- rsDriver(verbose = FALSE)
rD
remDr <- rD$client
url <- "http://www.cbc.ca/news/politics"
remDr$navigate(url)
remDr$getTitle()
remDr$getCurrentUrl()
# Look up the first element with class "headline"
webElem <- remDr$findElement(using = "class", value = 'headline')
webElem$getElementAttribute("class")
# Close the browser and stop the server
remDr$close()
rD$server$stop()

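It should be simple enough. When I look at the page structure, the headlines sit under the class headline. Above that there are the classes card-content and card-content-top, but no combination of CSS selectors seems to work, and neither does XPath.

Answer: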

CSS selectors may not work in rvest because of an issue in the selectr package (at least on Debian). See this for more information: https://github.com/sjp/selectr/issues/7
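
To check whether the CSS translation is what fails on a given machine, one option is to run the same query as CSS and as raw XPath on a tiny document and compare the results. This is only a sketch: the HTML snippet and the variable names are made up for illustration, and it assumes rvest hands CSS selectors to selectr for conversion to XPath.

```r
library(rvest)

# Made-up HTML for illustration only; it is not taken from the CBC page.
doc <- read_html('<html><body><p class="headline">A</p><p class="other">B</p></body></html>')

# The same query written two ways: CSS (converted to XPath by selectr)
# and hand-written XPath that never touches the CSS layer.
via_css <- html_text(html_nodes(doc, ".headline"))
via_xpath <- html_text(
  html_nodes(doc, xpath = '//*[contains(concat(" ", @class, " "), " headline ")]')
)

# If the two differ, the CSS translation layer is the likely culprit.
print(identical(via_css, via_xpath))
```

On a working installation both calls should return the same single node. If only the XPath version finds it, falling back to XPath as in the rest of this answer is a reasonable workaround.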

Using SelectorGadget and Chrome Developer Tools, I located the headlines on the page and identified them with the XPath below. More on how to find the right XPath can be found here: https://medium.com/@peterjgensler/functions-with-r-and-rvest-a-laymens-guide-acda42325a77

library('rvest')
library('magrittr')
url <- read_html("http://www.cbc.ca/news/politics")
# Match any element whose class attribute contains the token "pinnableHeadline"
headlines <- url %>%
  html_nodes(xpath = '//*[contains(concat(" ", @class, " "), concat(" ", "pinnableHeadline", " "))]') %>%
  html_text()
headlines[1]
# "On Trudeau's 2nd trip to China, time may be ripe to advance free trade"
headlines[2]
# "Liberals want to be global leader on open government, but face complaints at home"

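The XPath above is the usual long form of a CSS class match: it pads the class attribute with spaces and looks for the class name as a whole token, so "pinnableHeadline" does not accidentally match a longer class name that merely contains it. A minimal sketch of building such an expression for any class follows. Note that class_xpath() is a hypothetical helper, not part of rvest, and the HTML is made up for illustration.

```r
library(rvest)

# Hypothetical helper (not an rvest function): build an XPath that matches
# elements whose class attribute contains `cls` as a whole, space-separated token.
class_xpath <- function(cls) {
  sprintf('//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]', cls)
}

# Made-up HTML standing in for a real page.
page <- read_html('<html><body>
  <h3 class="headline pinnableHeadline">First story</h3>
  <h3 class="subheadline">Not a match</h3>
  <h3 class="pinnableHeadline">Second story</h3>
</body></html>')

stories <- page %>%
  html_nodes(xpath = class_xpath("pinnableHeadline")) %>%
  html_text()
print(stories)
```

The helper adds normalize-space() so that tabs or repeated spaces inside the class attribute do not break the token match. That is a small extension of the expression used in the answer, not something the original relies on.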
