How do I parse an HTML string in Java?

Given the string "<table><tr><td>Hello World!</td></tr></table>", what is the (simplest) way to get the DOM element that represents it?

Answer:

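I found this somewhere (I can't remember where). Note that it uses an XML parser, so it only works when the HTML fragment is also well-formed XML (XHTML-style markup, every tag closed):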

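```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Place the method below inside a class.
public static DocumentFragment parseXml(Document doc, String fragment) {
    // Wrap the fragment in an arbitrary element so that input with
    // several top-level nodes still has a single root element.
    fragment = "<fragment>" + fragment + "</fragment>";
    try {
        // Create a DOM builder and parse the fragment.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document d = factory.newDocumentBuilder().parse(
                new InputSource(new StringReader(fragment)));

        // Import the nodes of the new document into doc so that they
        // will be compatible with doc.
        Node node = doc.importNode(d.getDocumentElement(), true);

        // Create the document fragment node to hold the new nodes.
        DocumentFragment docfrag = doc.createDocumentFragment();

        // Move the children of the wrapper element into the fragment.
        while (node.hasChildNodes()) {
            docfrag.appendChild(node.removeChild(node.getFirstChild()));
        }

        // Return the fragment.
        return docfrag;
    } catch (SAXException e) {
        // A parsing error occurred; the input is not well-formed XML.
        return null;
    } catch (ParserConfigurationException | IOException e) {
        // The builder could not be created, or reading the string failed.
        return null;
    }
}
```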
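If you do not need to import the nodes into an existing Document, a shorter route is to parse the string directly and take its root element. The sketch below assumes the input is well-formed XML; the class name ParseHtmlString and the helper parseElement are hypothetical, for illustration only. It also shows that ordinary HTML which is not well-formed XML (an unclosed <br>) is rejected with a SAXException.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
public class ParseHtmlString {

    // Parses a well-formed (X)HTML string and returns its root element.
    // Throws SAXException if the markup is not well-formed XML.
    public static Element parseElement(String markup)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element table = parseElement("<table><tr><td>Hello World!</td></tr></table>");
        System.out.println(table.getTagName());     // table
        System.out.println(table.getTextContent()); // Hello World!

        // Typical HTML that is not well-formed XML (unclosed <br>) is rejected.
        // The default error handler may also print a "[Fatal Error]" line to stderr.
        try {
            parseElement("<p>line one<br>line two</p>");
            System.out.println("parsed");
        } catch (SAXException e) {
            System.out.println("not well-formed");
        }
    }
}
```

For real-world HTML that is not well-formed, an XML parser is the wrong tool; a dedicated HTML parser library is the usual choice.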

Source: utcz.com/qa/418826.html
