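Find a table by a known cell and retrieve the contents of the cells before and after it

I am working on a quick script to read table data. The page contains multiple tables, and they appear to be loaded dynamically via ajax, with no id that I could target with xpath. I need the text of the cell before (the date) and of the cell after the cell <td><span style="">First Last</span></td>, which I know will be fixed. What I need to determine is the table in question.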

<table cellspacing="0" class="collections"> 

<thead>

<tr>

<td colspan="4" class="actionsWrapper">

<table cellpadding="0" cellspacing="0" width="100%">

<thead></thead>

<tbody>

<tr>

<td><span style="display: none;"><b>Current Group: </b> <span><select class="standard_input"></select></span>&nbsp;&nbsp; </span><span>(<font color="red"><span>2</span></font> Notes)</span><span style="display: none;">&nbsp;&nbsp;&nbsp;<a href="javascript: void(null)"><font size="-2">Edit Group</font></a> | <span><a href="group_manager.php?type=12"><font id="create_group" size="-2">Create Group</font></a></span></span></td>

<td>

<div style="display: none;"><img src="include/images/loading_page.gif" height="70%"> <span style="font-size: .8em; font-weight: bold;">Retrieving Data...</span></div>

</td>

<td class="searchWrapper">

<table cellpadding="0" cellspacing="0">

<thead></thead>

<tbody>

<tr>

<td><input type="TEXT" class="keyword icon magnifying-glass unfocused"></td>

</tr>

<tr>

<td><span id="notesWrapper" style="display: none;"><label for="notesToggle">Search notes</label><input type="CHECKBOX" class="inpt_checkbox standard_input" id="notesToggle"></span></td>

</tr>

</tbody>

<tfoot></tfoot>

</table>

</td>

</tr>

</tbody>

<tfoot></tfoot>

</table>

</td>

</tr>

</thead>

<tbody>

<tr class="header">

<td class="utils"></td>

<td class="pointer bold" style="width: 200px;">Date</td>

<td class="pointer bold">Note</td>

<td class="pointer bold openArrow">Author</td>

</tr>

<tr class="data" style="cursor: default;">

<td class="actions"><input type="CHECKBOX" class="checkbox" style="display: none;"><a class="icon trashcan" title="Delete Note">Delete Note</a></td>

<td style="width: 200px;"><span style="">8/24/2011 12:00 PM</span></td>

<td><span style="">First Last</span></td>

<td><span style="">No answer - went to answering machine</span></td>

</tr>

<tr class="detailWrapper" style="display: none;"></tr>

<tr class="data" style="cursor: default;">

<td class="actions"><input type="CHECKBOX" class="checkbox" style="display: none;"><a class="icon trashcan" title="Delete Note">Delete Note</a></td>

<td style="width: 200px;"><span style="">8/26/2011 11:08 AM</span></td>

<td><span style="">First Last</span></td>

<td><span style="">Philip hardly comes into this store</span></td>

</tr>

<tr class="detailWrapper" style="display: none;"></tr>

</tbody>

<tfoot>

<tr style="display: none;"></tr>

<tr>

<td colspan="4">

<table width="100%" style="margin-top:5px;">

<tbody>

<tr>

<td align="left">

<div class="navigationPanel" style="display: none;"><a style="color: rgb(156, 156, 155); cursor: default;">&lt;&lt;</a> <a style="color: rgb(156, 156, 155); cursor: default;">&lt;</a> Page: <input type="TEXT" class="inpt_text standard_input" size="2"><span> of 1 </span> <a style="cursor: default; color: rgb(156, 156, 155);">&gt;</a> <a style="cursor: default; color: rgb(156, 156, 155);">&gt;&gt;</a></div>

</td>

<td align="right">

Entries Per Page:

<select>

<option value="10" selected="">10</option>

<option value="25">25</option>

<option value="50">50</option>

</select>

</td>

</tr>

</tbody>

</table>

</td>

</tr>

<tr>

<td colspan="4" align="left" style="margin-left: 2px;"><textarea style="width: 70%;"></textarea><input type="BUTTON" class="btn2" value="Add" style="width: 50px; margin-left: 10px;"></td>

</tr>

<tr style="display: none;">

<td colspan="4" class="groupActionsWrapper">

<div class="stepbar">Group Actions</div>

<br>

<table width="100%">

<tbody>

<tr>

<td style="padding-top:2px;width: 50px" align="right" valign="top">With </td>

<td style="width: 100px;" align="left" valign="top">

<select>

<option value="0">Selected</option>

<option value="1">All in group</option>

</select>

</td>

<td></td>

</tr>

</tbody>

</table>

</td>

</tr>

</tfoot>

</table>

Answer:

There are many ways to do this. Here is one, in case you are not familiar with the modules re and/or html.parser:

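    # Assumes the saved page keeps the three cells on consecutive lines.
    target = '<td><span style="">First Last</span></td>'
    line_prev = ''

    with open('29740695.htm') as f:
        for line in f:
            # Compare without surrounding whitespace so indentation does not matter.
            if line.strip() != target:
                line_prev = line
                continue
            print(line_prev)      # the cell before (the date)
            print(next(f, ''))    # the cell after (the note text)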
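For readers who would rather not depend on how the file is split into lines, the html.parser module mentioned above can be used instead. The following is only a minimal sketch, not the answerer's method: the names RowCollector and neighbours are hypothetical, and it assumes the known cell sits in the same <tr> as the date and note cells, as in the sample markup. Because it looks at rows rather than tables, it does not need to know which table is the one in question.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>, grouped by its enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._open = []     # stack of rows still open (handles nested tables)

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._open.append([])
        elif tag == 'td' and self._open:
            self._open[-1].append('')   # start a new cell in the innermost row

    def handle_endtag(self, tag):
        if tag == 'tr' and self._open:
            self.rows.append(self._open.pop())

    def handle_data(self, data):
        # Text is appended to the most recent cell of the innermost open row.
        if self._open and self._open[-1]:
            self._open[-1][-1] += data


def neighbours(html, known_text):
    """Return (cell_before, cell_after) for every cell equal to known_text."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    found = []
    for row in parser.rows:
        cells = [cell.strip() for cell in row]
        for i, cell in enumerate(cells):
            if cell == known_text and 0 < i < len(cells) - 1:
                found.append((cells[i - 1], cells[i + 1]))
    return found


# Hypothetical usage with one data row taken from the question's markup.
sample = """
<table class="collections"><tbody>
<tr class="data">
<td class="actions"><a class="icon trashcan" title="Delete Note">Delete Note</a></td>
<td style="width: 200px;"><span style="">8/24/2011 12:00 PM</span></td>
<td><span style="">First Last</span></td>
<td><span style="">No answer - went to answering machine</span></td>
</tr>
</tbody></table>
"""
for before, after in neighbours(sample, 'First Last'):
    print(before, '|', after)
```

The parser only sees markup that is already present in the string it is fed, so a page filled in by ajax would still have to be saved after loading, as with the line-based approach.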
