Find a table by a known cell and retrieve the contents of the cells before and after it
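I'm writing a quick script to read table data. The page contains several tables, and they appear to be loaded dynamically via AJAX with no ids, so I have not been able to target them with XPath. I need the cell before (the date) and the cell after the cell with the text <td><span style="">First Last</span></td>
; I know these cells are fixed. What I need to determine is which table is the one in question.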
<table cellspacing="0" class="collections"> <thead>
<tr>
<td colspan="4" class="actionsWrapper">
<table cellpadding="0" cellspacing="0" width="100%">
<thead></thead>
<tbody>
<tr>
<td><span style="display: none;"><b>Current Group: </b> <span><select class="standard_input"></select></span> </span><span>(<font color="red"><span>2</span></font> Notes)</span><span style="display: none;"> <a href="javascript: void(null)"><font size="-2">Edit Group</font></a> | <span><a href="group_manager.php?type=12"><font id="create_group" size="-2">Create Group</font></a></span></span></td>
<td>
<div style="display: none;"><img src="include/images/loading_page.gif" height="70%"> <span style="font-size: .8em; font-weight: bold;">Retrieving Data...</span></div>
</td>
<td class="searchWrapper">
<table cellpadding="0" cellspacing="0">
<thead></thead>
<tbody>
<tr>
<td><input type="TEXT" class="keyword icon magnifying-glass unfocused"></td>
</tr>
<tr>
<td><span id="notesWrapper" style="display: none;"><label for="notesToggle">Search notes</label><input type="CHECKBOX" class="inpt_checkbox standard_input" id="notesToggle"></span></td>
</tr>
</tbody>
<tfoot></tfoot>
</table>
</td>
</tr>
</tbody>
<tfoot></tfoot>
</table>
</td>
</tr>
</thead>
<tbody>
<tr class="header">
<td class="utils"></td>
<td class="pointer bold" style="width: 200px;">Date</td>
<td class="pointer bold">Note</td>
<td class="pointer bold openArrow">Author</td>
</tr>
<tr class="data" style="cursor: default;">
<td class="actions"><input type="CHECKBOX" class="checkbox" style="display: none;"><a class="icon trashcan" title="Delete Note">Delete Note</a></td>
<td style="width: 200px;"><span style="">8/24/2011 12:00 PM</span></td>
<td><span style="">First Last</span></td>
<td><span style="">No answer - went to answering machine</span></td>
</tr>
<tr class="detailWrapper" style="display: none;"></tr>
<tr class="data" style="cursor: default;">
<td class="actions"><input type="CHECKBOX" class="checkbox" style="display: none;"><a class="icon trashcan" title="Delete Note">Delete Note</a></td>
<td style="width: 200px;"><span style="">8/26/2011 11:08 AM</span></td>
<td><span style="">First Last</span></td>
<td><span style="">Philip hardly comes into this store</span></td>
</tr>
<tr class="detailWrapper" style="display: none;"></tr>
</tbody>
<tfoot>
<tr style="display: none;"></tr>
<tr>
<td colspan="4">
<table width="100%" style="margin-top:5px;">
<tbody>
<tr>
<td align="left">
<div class="navigationPanel" style="display: none;"><a style="color: rgb(156, 156, 155); cursor: default;"><<</a> <a style="color: rgb(156, 156, 155); cursor: default;"><</a> Page: <input type="TEXT" class="inpt_text standard_input" size="2"><span> of 1 </span> <a style="cursor: default; color: rgb(156, 156, 155);">></a> <a style="cursor: default; color: rgb(156, 156, 155);">>></a></div>
</td>
<td align="right">
Entries Per Page:
<select>
<option value="10" selected="">10</option>
<option value="25">25</option>
<option value="50">50</option>
</select>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
<tr>
<td colspan="4" align="left" style="margin-left: 2px;"><textarea style="width: 70%;"></textarea><input type="BUTTON" class="btn2" value="Add" style="width: 50px; margin-left: 10px;"></td>
</tr>
<tr style="display: none;">
<td colspan="4" class="groupActionsWrapper">
<div class="stepbar">Group Actions</div>
<br>
<table width="100%">
<tbody>
<tr>
<td style="padding-top:2px;width: 50px" align="right" valign="top">With </td>
<td style="width: 100px;" align="left" valign="top">
<select>
<option value="0">Selected</option>
<option value="1">All in group</option>
</select>
</td>
<td></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tfoot>
</table>
Answer:
There are many ways to do this. Here is one that works line by line, in case you are not familiar with the re and/or html.parser modules:
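    # Assumes every <td> sits on its own line, as in the HTML above.
    line_prev = ''
    with open('29740695.htm') as f:
        for line in f:
            # Remember each line until the known cell is reached.
            if line.strip() != '<td><span style="">First Last</span></td>':
                line_prev = line
                continue
            print(line_prev.strip())   # cell before: the date
            print(next(f, '').strip()) # cell after: the note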
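If the markup is not guaranteed to keep one cell per line, the same idea can be sketched with the standard-library html.parser module instead of comparing raw lines. This is only a minimal sketch: the names RowCollector, neighbours and sample are made up for illustration, and it assumes the known cell always has a sibling cell on each side within the same row. Because it looks at rows rather than tables, it does not need to know which table the cell lives in.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._open = []   # stack of rows still open, since tables can nest

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._open.append([])
        elif tag == 'td' and self._open:
            self._open[-1].append('')

    def handle_endtag(self, tag):
        if tag == 'tr' and self._open:
            self.rows.append(self._open.pop())

    def handle_data(self, data):
        # Text is attached to the latest cell of the innermost open row.
        if self._open and self._open[-1]:
            self._open[-1][-1] += data


def neighbours(html, known):
    """Return (before, after) cell texts for each row that contains `known`."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    found = []
    for row in parser.rows:
        cells = [c.strip() for c in row]
        for i, text in enumerate(cells):
            # Only report cells that have a neighbour on both sides.
            if text == known and 0 < i < len(cells) - 1:
                found.append((cells[i - 1], cells[i + 1]))
    return found


# Cut-down copy of the data rows from the question, used as sample input.
sample = '''
<table class="collections"><tbody>
<tr class="header"><td></td><td>Date</td><td>Note</td><td>Author</td></tr>
<tr class="data"><td class="actions"><a>Delete Note</a></td>
<td><span style="">8/24/2011 12:00 PM</span></td>
<td><span style="">First Last</span></td>
<td><span style="">No answer - went to answering machine</span></td></tr>
</tbody></table>
'''

print(neighbours(sample, 'First Last'))
# [('8/24/2011 12:00 PM', 'No answer - went to answering machine')]
```

Since the page is filled in by AJAX, the HTML passed to neighbours would have to be the rendered markup (for example, saved from the browser after loading), not the initial page source.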