Very simple code does not work in HtmlUnit

I am using HtmlUnit 2.9 (the stable version released this month). Do you know why the following code doesn't work?

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
import com.gargoylesoftware.htmlunit.WebClient;

public class Main {
    public static void main(String[] args) {
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
        // Client configuration: CSS on (errors silenced), everything else restrictive.
        webClient.setCssEnabled(true);
        webClient.setCssErrorHandler(new SilentCssErrorHandler());
        webClient.setThrowExceptionOnFailingStatusCode(false);
        webClient.setThrowExceptionOnScriptError(false);
        webClient.setRedirectEnabled(false);
        webClient.setAppletEnabled(false);
        webClient.setJavaScriptEnabled(false);
        webClient.setPopupBlockerEnabled(true);
        webClient.setTimeout(60000);
        webClient.setPrintContentOnFailingStatusCode(false);
        System.out.println("This is printed on screen");
        try {
            // This call never returns for the URL below.
            webClient.getPage("http://www.2cash.info/index.php");
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("This is NEVER printed on screen");
    }
}

I'm also adding the jstack output. Note that I have marked a section that keeps repeating:

2011-08-26 03:15:45

Full thread dump Java HotSpot(TM) Server VM (20.1-b02 mixed mode):

"Attach Listener" daemon prio=10 tid=0x09520400 nid=0x5363 waiting on condition [0x00000000]

java.lang.Thread.State: RUNNABLE

"JS executor for com.gargoylesoftware.htmlunit.WebClient@a7c45e" daemon prio=10 tid=0x6feb7400 nid=0x5356 waiting on condition [0x6fcfe000]

java.lang.Thread.State: TIMED_WAITING (sleeping)

at java.lang.Thread.sleep(Native Method)

at com.gargoylesoftware.htmlunit.javascript.background.JavaScriptExecutor.run(JavaScriptExecutor.java:166)

at java.lang.Thread.run(Thread.java:662)

"Low Memory Detector" daemon prio=10 tid=0x70204c00 nid=0x5352 runnable [0x00000000]

java.lang.Thread.State: RUNNABLE

"C2 CompilerThread1" daemon prio=10 tid=0x70202800 nid=0x5351 runnable [0x00000000]

java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" daemon prio=10 tid=0x70200800 nid=0x5350 waiting on condition [0x00000000]

java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x09514c00 nid=0x534f runnable [0x00000000]

java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=10 tid=0x09503400 nid=0x534e in Object.wait() [0x70798000]

java.lang.Thread.State: WAITING (on object monitor)

at java.lang.Object.wait(Native Method)

- waiting on <0x76af2ff0> (a java.lang.ref.ReferenceQueue$Lock)

at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118)

- locked <0x76af2ff0> (a java.lang.ref.ReferenceQueue$Lock)

at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134)

at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=10 tid=0x09501c00 nid=0x534d in Object.wait() [0x707e9000]

java.lang.Thread.State: WAITING (on object monitor)

at java.lang.Object.wait(Native Method)

- waiting on <0x7675cc58> (a java.lang.ref.Reference$Lock)

at java.lang.Object.wait(Object.java:485)

at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116)

- locked <0x7675cc58> (a java.lang.ref.Reference$Lock)

"main" prio=10 tid=0x09482400 nid=0x5349 runnable [0xb6c34000]

java.lang.Thread.State: RUNNABLE

at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.getSlot(ScriptableObject.java:2603)

at net.sourceforge.htmlunit.corejs.javascript.ScriptableObject.defineProperty(ScriptableObject.java:1699)

at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.configureConstantsPropertiesAndFunctions(JavaScriptEngine.java:350)

at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.configureClass(JavaScriptEngine.java:330)

at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.init(JavaScriptEngine.java:199)

at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.access$000(JavaScriptEngine.java:79)

at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$1.run(JavaScriptEngine.java:146)

at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:537)

at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:538)

at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.initialize(JavaScriptEngine.java:157)

at com.gargoylesoftware.htmlunit.WebClient.initialize(WebClient.java:1141)

at com.gargoylesoftware.htmlunit.WebWindowImpl.setEnclosedPage(WebWindowImpl.java:109)

at com.gargoylesoftware.htmlunit.html.FrameWindow.setEnclosedPage(FrameWindow.java:102)

at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:200)

at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:179)

at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:221)

at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:106)

at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:433)

at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:311)

at com.gargoylesoftware.htmlunit.html.BaseFrame.<init>(BaseFrame.java:73)

at com.gargoylesoftware.htmlunit.html.HtmlInlineFrame.<init>(HtmlInlineFrame.java:46)

at com.gargoylesoftware.htmlunit.html.DefaultElementFactory.createElementNS(DefaultElementFactory.java:288)

at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.startElement(HTMLParser.java:506)

at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)

at org.cyberneko.html.HTMLTagBalancer.callStartElement(HTMLTagBalancer.java:1136)

at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:742)

at org.cyberneko.html.filters.DefaultFilter.startElement(DefaultFilter.java:136)

at org.cyberneko.html.filters.NamespaceBinder.startElement(NamespaceBinder.java:278)

at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2652)

at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2022)

at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:908)

at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)

at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)

at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)

at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:789)

at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:225)

at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:179)

at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:221)

at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:106)

at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:433)

at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:311)

<THIS_SECTION_IS_PRINTED_AS_IF_IT_WERE_IN_A_LOOP>

at com.gargoylesoftware.htmlunit.html.BaseFrame.loadInnerPageIfPossible(BaseFrame.java:149)

at com.gargoylesoftware.htmlunit.html.BaseFrame.loadInnerPage(BaseFrame.java:99)

at com.gargoylesoftware.htmlunit.html.HtmlPage.loadFrames(HtmlPage.java:1760)

at com.gargoylesoftware.htmlunit.html.HtmlPage.initialize(HtmlPage.java:194)

at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:440)

at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:311)

</THIS_SECTION_IS_PRINTED_AS_IF_IT_WERE_IN_A_LOOP>

at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:311)

at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:373)

at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:358)

at main.Main.<init>(Main.java:42)

at main.Main.main(Main.java:23)

"VM Thread" prio=10 tid=0x094fe000 nid=0x534c runnable

"GC task thread#0 (ParallelGC)" prio=10 tid=0x09489800 nid=0x534a runnable

"GC task thread#1 (ParallelGC)" prio=10 tid=0x0948ac00 nid=0x534b runnable

"VM Periodic Task Thread" prio=10 tid=0x70207000 nid=0x5353 waiting on condition

JNI global references: 1234

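I think there is some kind of loop related to the automatic loading of frames. If so, is there any way to disable that behavior to break the loop?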
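To make the suspicion concrete, here is a toy model of it that does not use HtmlUnit at all. It assumes the page embeds an iframe that (directly or indirectly) points back at the page itself; the class name, the `frames` map and the URLs are all hypothetical, and this is not how HtmlUnit is implemented. Without the guard the recursion would never end, which is roughly what the repeated section of the dump looks like.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: "frames" is a hypothetical map from a page URL to the
// URLs of the iframes that page embeds. No real HTTP or HtmlUnit involved.
final class FrameLoopSketch {

    // Loads a page and, recursively, every frame it embeds, but refuses to
    // load a URL that is already being loaded higher up in the frame chain.
    // Returns the number of pages actually loaded.
    static int loadWithGuard(String url, Map<String, List<String>> frames, Set<String> inProgress) {
        if (!inProgress.add(url)) {
            return 0; // cycle: this URL is one of its own ancestors, so stop here
        }
        int loaded = 1;
        for (String child : frames.getOrDefault(url, List.of())) {
            loaded += loadWithGuard(child, frames, inProgress);
        }
        inProgress.remove(url); // done with this branch of the frame chain
        return loaded;
    }

    public static void main(String[] args) {
        // index.php embeds banner.php, which embeds index.php again.
        Map<String, List<String>> frames = Map.of(
                "index.php", List.of("banner.php"),
                "banner.php", List.of("index.php"));
        System.out.println(loadWithGuard("index.php", frames, new HashSet<>()));
    }
}
```

Removing the `inProgress` check turns this into unbounded recursion, so a guard of this kind (or a way to switch frame loading off) is what I am looking for.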

Thanks in advance!

Answer:

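Well, although it is a horrible solution (really a workaround), I finally decided to follow the suggestion of one of the HtmlUnit developers and disable automatic frame loading in HtmlUnit. Here is what I did in detail: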

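  1. Downloaded the HtmlUnit source.
  2. Downloaded Maven.
  3. Commented out the contents of the loadFrames() method (the body of the method, not its declaration) in the HtmlPage class, located in htmlunit-2.9/src/main/java/com/gargoylesoftware/htmlunit/html
  4. Compiled the customized code, skipping the tests, with the following command: mvn -Dmaven.test.skip=true clean compile package
  5. Found the new htmlunit-2.9.jar in htmlunit-2.9/artifacts and replaced the current htmlunit-2.9.jar library file with it.
  6. This step is probably the most delicate one, because it depends on each application. Still, I will show you the changes I needed to make to my application.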

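You already know what my original code looked like (see the question). It downloads all frames and iframes from the page. Here is an example of how to fetch a framed page and load only the frame you need: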

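try {
    // With the patched jar, this loads the outer page without its frames.
    HtmlPage page = webClient.getPage("http://www.w3schools.com/HTML/tryit.asp?filename=tryhtml_noframes");
    // Pick the one iframe we are interested in.
    HtmlInlineFrame frame = page.getFirstByXPath("//iframe[@name='view']");
    // Download that frame's content manually via its absolute URL.
    page = webClient.getPage(page.getFullyQualifiedUrl(frame.getSrcAttribute()));
    System.out.println(page.asXml());
} catch (Exception e) {
    e.printStackTrace();
}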

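After this change to the library, the frames have no content once getPage() completes. Note that the frame is not null; it looks as if an empty frame had simply been returned. What we need to do is manually download the content of the frames we are interested in, which is why I call getPage() a second time.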
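The second getPage() call needs an absolute URL, which is why the frame's src attribute goes through getFullyQualifiedUrl() first. As a rough, library-free illustration of that resolution step (not HtmlUnit's actual implementation; the class name and the example.com URLs are made up), the same idea can be sketched with java.net.URI:

```java
import java.net.URI;

// Sketch only: resolves a frame's src attribute against the URL of the
// page that contains it, using plain JDK classes instead of HtmlUnit.
final class FrameUrlSketch {

    static String resolve(String pageUrl, String frameSrc) {
        // URI.resolve handles relative paths, root-relative paths and
        // already-absolute URLs according to the usual URI rules.
        return URI.create(pageUrl).resolve(frameSrc).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/dir/page.html?x=1";
        System.out.println(resolve(page, "frame.html"));
        System.out.println(resolve(page, "/top.html"));
        System.out.println(resolve(page, "http://other.example/f.html"));
    }
}
```

One caveat of this sketch: URI.create() throws IllegalArgumentException for a src that contains characters such as unescaped spaces, so real-world attribute values may need cleaning first.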

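Well, that is how I managed to selectively download frames and iframes with HtmlUnit. Any ideas on how to improve this would be appreciated. In any case, I hope that some way of disabling frame loading will be added to HtmlUnit itself in the future, perhaps a method such as getPage(URL url, boolean downloadFrames).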

Hope this helps someone!
