使用VBA进行Web抓取(当HTML <> DOM时)

我在抓取该特定网页的数据时经历了可怕的时间……

基本上,当我

在浏览器中加载URL并手动按F12键时,但是当我以编程方式

尝试执行此操作时,我可以在“ DOM Explorer”中看到所需的信息(参见下文)HTMLDoc不包含

我在“ DOM资源管理器”中看到的相同信息…

Public Sub testCode()

Dim IE As SHDocVw.InternetExplorer

Dim HTMLDoc As MSHTML.HTMLDocument

Set IE = New SHDocVw.InternetExplorer

With IE

.navigate "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW"

While .Busy = True Or .ReadyState <> READYSTATE_COMPLETE: Wend

Set HTMLDoc = .Document

End With

End Sub

有人可以帮我访问“ DOM资源管理器”中的信息吗?我知道HTML并不总是您在浏览器中看到的,而是在浏器中创建可见内容的说明,但是必须有一种方法可以从HTML以编程方式创建DOM……

另外,我相信我是由脚本或生成后的数据iFrame中,但我一直无法生成我要寻找的数据从

瞎搞有两种....

回答:

大纲:

  • 使GET XHR到https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW。

  • 从HTML响应行中提取位置,var query = ‘zmw:’ + ‘00000.271.03969’;并从这些行中提取键var citypage_options = {k: ‘c991975b7f4186c0’, …。

  • 使用位置让GET XHR 00000.271.03969和密钥c991975b7f4186c0来https://api-ak-aws.wunderground.com/api/c991975b7f4186c0/forecast10day/hourly10day/labels/conditions/astronomy10day/lang:zh-CN/units:metric/v:2.0/bestfct:1/q/zmw:00000.271。 03969.json。

  • 使用解析JSON响应(例如,使用VBAJSON解析器),使用Parse()转换所需的数据ToArray(),并作为表格输出到工作表。

实际上,每次打开该网页时,Web浏览器都会执行几乎相同的操作。

您可以使用下面的VBA代码来解析响应并输出结果。将JSON.bas模块导入VBA项目以进行JSON处理。

Sub TestScrapeWunderground()

Dim sContent As String

Dim sKey As String

Dim sLocation As String

Dim vJSON As Variant

Dim sState As String

Dim oDays As Object

Dim oHours As Object

Dim vDay As Variant

Dim vHour As Variant

Dim aRows() As Variant

Dim aHeader() As Variant

' GET XHR to retrieve location and key

With CreateObject("MSXML2.ServerXMLHTTP")

.Open "GET", "https://www.wunderground.com/cgi-bin/findweather/getForecast?query=EIDW", False

.Send

sContent = .responseText

End With

' Extract location and key from HTML content

sLocation = Split(Split(sContent, "var query = 'zmw:' + '", 2)(1), "'", 2)(0)

sKey = Split(Split(sContent, vbTab & "k: '", 2)(1), "'", 2)(0)

' GET XHR to retrieve JSON data

With CreateObject("MSXML2.ServerXMLHTTP")

.Open "GET", "https://api-ak-aws.wunderground.com/api/" & sKey & "/forecast10day/hourly10day/labels/conditions/astronomy10day/lang:en/units:metric/v:2.0/bestfct:1/q/zmw:" & sLocation & ".json", False

.Send

sContent = .responseText

End With

' Parse JSON response to data structure

JSON.Parse sContent, vJSON, sState

' Populate dictionaries with daily and hourly forecast data

Set oDays = CreateObject("Scripting.Dictionary")

Set oHours = CreateObject("Scripting.Dictionary")

For Each vDay In vJSON("forecast")("days")

oDays(vDay("summary")) = ""

For Each vHour In vDay("hours")

oHours(vHour) = ""

Next

Next

' Convert daily forecast data to arrays

JSON.ToArray oDays.Keys(), aRows, aHeader

' Output daily forecast data to table

With Sheets(1)

.Cells.Delete

OutputArray .Cells(1, 1), aHeader

Output2DArray .Cells(2, 1), aRows

.Columns.AutoFit

End With

' Convert hourly forecast data to arrays

JSON.ToArray oHours.Keys(), aRows, aHeader

' Output hourly forecast data to table

With Sheets(2)

.Cells.Delete

OutputArray .Cells(1, 1), aHeader

Output2DArray .Cells(2, 1), aRows

.Columns.AutoFit

End With

' Convert response data to arrays

JSON.ToArray Array(vJSON("response")), aRows, aHeader

' Output response transposed data to table

With Sheets(3)

.Cells.Delete

Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)

Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)

.Columns.AutoFit

End With

' Convert current data to arrays

JSON.ToArray Array(vJSON("current_observation")), aRows, aHeader

' Output current transposed data to table

With Sheets(4)

.Cells.Delete

Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)

Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)

.Columns.AutoFit

End With

' Populate dictionary with daily astronomy data

Set oDays = CreateObject("Scripting.Dictionary")

For Each vDay In vJSON("astronomy")("days")

oDays(vDay) = ""

Next

' Convert daily astronomy data to arrays

JSON.ToArray oDays.Keys(), aRows, aHeader

' Output daily astronomy transposed data to table

With Sheets(5)

.Cells.Delete

Output2DArray .Cells(1, 1), WorksheetFunction.Transpose(aHeader)

Output2DArray .Cells(1, 2), WorksheetFunction.Transpose(aRows)

.Columns.AutoFit

End With

' Convert hourly history data to arrays

JSON.ToArray vJSON("history")("days")(0)("hours"), aRows, aHeader

' Output hourly history data to table

With Sheets(6)

.Cells.Delete

OutputArray .Cells(1, 1), aHeader

Output2DArray .Cells(2, 1), aRows

.Columns.AutoFit

End With

MsgBox "Completed"

End Sub

Sub OutputArray(oDstRng As Range, aCells As Variant)

With oDstRng

.Parent.Select

With .Resize( _

1, _

UBound(aCells) - LBound(aCells) + 1)

.NumberFormat = "@"

.Value = aCells

End With

End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

With oDstRng

.Parent.Select

With .Resize( _

UBound(aCells, 1) - LBound(aCells, 1) + 1, _

UBound(aCells, 2) - LBound(aCells, 2) + 1)

.NumberFormat = "@"

.Value = aCells

End With

End With

End Sub

第二个XHR返回JSON数据,以清楚说明如何从其中提取必要的数据,您可以将JSON保存到文件中,复制内容并将其粘贴到任何JSON查看器中以进行进一步研究。我使用在线工具

http://jsonviewer.stack.hu,

有6个主要部分,提取了数据的相关部分并将其输出到6个工作表(必须在运行之前手动创建):

Sheet1 - Daily forecast

Sheet2 - Horly forecast

Sheet3 - Response data (transposed)

Sheet4 - Current data (transposed)

Sheet5 - Astronomy (transposed)

Sheet6 - Hourly history data

有了该示例,您可以从该JSON响应中提取所需的数据。

以上是 使用VBA进行Web抓取(当HTML &lt;&gt; DOM时) 的全部内容, 来源链接: utcz.com/qa/420613.html

回到顶部