如何在Java中解析HTML字符串?
给定字符串"<table><tr><td>Hello
World!</td></tr></table>",获取代表它的DOM元素的(最简单)方法是什么?
回答:
我在某个地方找到了(不记得在哪里):
public static DocumentFragment parseXml(Document doc, String fragment) {
// Wrap the fragment in an arbitrary element.
fragment = "<fragment>"+fragment+"</fragment>";
try
{
// Create a DOM builder and parse the fragment.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
Document d = factory.newDocumentBuilder().parse(
new InputSource(new StringReader(fragment)));
// Import the nodes of the new document into doc so that they
// will be compatible with doc.
Node node = doc.importNode(d.getDocumentElement(), true);
// Create the document fragment node to hold the new nodes.
DocumentFragment docfrag = doc.createDocumentFragment();
// Move the nodes into the fragment.
while (node.hasChildNodes())
{
docfrag.appendChild(node.removeChild(node.getFirstChild()));
}
// Return the fragment.
return docfrag;
}
catch (SAXException e)
{
// A parsing error occurred; the XML input is not valid.
}
catch (ParserConfigurationException e)
{
}
catch (IOException e)
{
}
return null;
}
以上是 如何在Java中解析HTML字符串? 的全部内容, 来源链接: utcz.com/qa/418826.html