Network capture with Selenium / PhantomJS

Z时代
2024-01-10
Category: Q&A

I want to capture the traffic of the sites I browse with Selenium from Python. Since the traffic will be HTTPS, using a proxy won't get me very far.

My idea was to have Selenium run PhantomJS and have PhantomJS execute a script (not on the page via webdriver.execute_script(), but in PhantomJS itself). I was thinking of the netlog.js script (from here: https://github.com/ariya/phantomjs/blob/master/examples/netlog.js).

Since it works like this on the command line:

phantomjs --cookies-file=/tmp/foo netlog.js https://google.com

is there something similar in Selenium?

Thanks in advance.
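If running netlog.js is enough and the page does not need to be driven through Selenium, one workaround is to launch the command above from Python and collect what the script prints. This is a sketch, not a Selenium feature: it assumes `phantomjs` is on the PATH and `netlog.js` sits in the working directory, and the helper names `build_netlog_command` and `run_netlog` are hypothetical.

```python
import shutil
import subprocess


def build_netlog_command(url, cookies_file="/tmp/foo", script="netlog.js"):
    """Build the argv for the phantomjs command line shown above (hypothetical helper)."""
    return ["phantomjs", f"--cookies-file={cookies_file}", script, url]


def run_netlog(url, timeout=60):
    """Run PhantomJS with netlog.js and return its stdout (hypothetical helper).

    Assumes phantomjs is installed on PATH and netlog.js is in the current directory.
    """
    if shutil.which("phantomjs") is None:
        raise FileNotFoundError("phantomjs not found on PATH")
    result = subprocess.run(
        build_netlog_command(url),
        capture_output=True,  # collect what netlog.js prints with console.log
        text=True,
        timeout=timeout,
        check=True,  # raise if phantomjs exits with a non-zero status
    )
    return result.stdout
```

For example, `print(run_netlog("https://google.com"))` prints the `requested:` and `received:` JSON blocks that netlog.js logs for each resource the page loads.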

Solved it with browsermob-proxy.

pip3 install browsermob-proxy

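Python 3 code (the pip package is only the Python client; Server() needs the path to the separately downloaded BrowserMob Proxy launcher, which runs on Java):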

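from selenium import webdriver  # Selenium 3.x; PhantomJS support has since been dropped from Selenium
from browsermobproxy import Server

server = Server("<path to browsermob-proxy>")  # path to the browsermob-proxy launcher script
server.start()
proxy = server.create_proxy()
# --ignore-ssl-errors=yes lets PhantomJS accept the certificates that
# BrowserMob Proxy generates when it intercepts HTTPS traffic
service_args = ["--proxy=%s" % proxy.proxy, '--ignore-ssl-errors=yes']
driver = webdriver.PhantomJS(service_args=service_args)
# the capture options are HAR options, so they go to new_har(), not create_proxy()
proxy.new_har(options={'captureHeaders': True, 'captureContent': True, 'captureBinaryContent': True})
driver.get('https://google.com')
print(proxy.har)  # the HAR (HTTP Archive) as a Python dict
# for example, the URL of every request made while loading the page:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]
driver.quit()
server.stop()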
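Each entry in proxy.har['log']['entries'] follows the HAR 1.2 layout, so the one-line list comprehension above extends naturally to other fields. A minimal sketch, assuming only the standard `request.method`, `request.url` and `response.status` fields; `summarize_har` is a hypothetical helper and `sample_har` is a hand-written stand-in for a real capture, not captured data.

```python
def summarize_har(har):
    """Return (method, status, url) for every entry in a HAR dict such as proxy.har."""
    rows = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        response = entry.get("response", {})  # tolerate entries without a response
        rows.append((request["method"], response.get("status"), request["url"]))
    return rows


# Hand-written HAR fragment for illustration only.
sample_har = {
    "log": {
        "entries": [
            {"request": {"method": "GET", "url": "https://example.com/"},
             "response": {"status": 301}},
            {"request": {"method": "GET", "url": "https://www.example.com/"},
             "response": {"status": 200}},
        ]
    }
}

for method, status, url in summarize_har(sample_har):
    print(method, status, url)
```

With a live capture, calling `summarize_har(proxy.har)` after `driver.get(...)` gives the same kind of list for the real page.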

Answer:

I use a proxy for this:

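from selenium import webdriver  # Selenium 3.x; PhantomJS support has since been dropped from Selenium
from browsermobproxy import Server

server = Server(environment.b_mob_proxy_path)  # `environment` is the answerer's own config module
server.start()
proxy = server.create_proxy()
# PhantomJS expects --proxy=host:port (--proxy-server is Chrome's flag);
# --ignore-ssl-errors=yes is needed because the proxy intercepts HTTPS
service_args = ["--proxy=%s" % proxy.proxy, "--ignore-ssl-errors=yes"]
driver = webdriver.PhantomJS(service_args=service_args)
proxy.new_har()
driver.get('url_to_open')
print(proxy.har)  # the HAR (HTTP Archive) as a Python dict
# for example:
all_requests = [entry['request']['url'] for entry in proxy.har['log']['entries']]
driver.quit()
server.stop()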

The HAR (HTTP Archive format) also holds a lot of other information about the requests and responses, which is very useful to me.
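As an illustration of that extra information, the sketch below pulls a few more HAR 1.2 fields out of a single entry (MIME type, body size, total time, response headers) and writes a whole archive to disk. The helper names and the sample entry are hypothetical; real captures may omit optional fields, hence the `.get()` calls.

```python
import json


def header_dict(headers):
    """Turn a HAR header list ([{"name": ..., "value": ...}]) into a dict with lower-cased names.

    Repeated headers (e.g. several Set-Cookie lines) collapse: the last one wins.
    """
    return {h["name"].lower(): h["value"] for h in headers}


def entry_details(entry):
    """Collect a few of the extra per-request fields a HAR 1.2 entry carries."""
    response = entry["response"]
    content = response.get("content", {})
    return {
        "url": entry["request"]["url"],
        "status": response["status"],
        "mime_type": content.get("mimeType", ""),
        "size": content.get("size", -1),  # -1 when the size is not recorded
        "time_ms": entry.get("time"),  # total elapsed time of the request, in ms
        "response_headers": header_dict(response.get("headers", [])),
    }


def save_har(har, path):
    """Write the HAR to a .har file so it can be inspected later in a HAR viewer."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(har, f, indent=2)


# Hand-written entry for illustration only.
sample_entry = {
    "time": 123.4,
    "request": {"method": "GET", "url": "https://example.com/"},
    "response": {
        "status": 200,
        "headers": [
            {"name": "Content-Type", "value": "text/html; charset=UTF-8"},
            {"name": "Server", "value": "example"},
        ],
        "content": {"size": 5120, "mimeType": "text/html; charset=UTF-8"},
    },
}

print(entry_details(sample_entry))
```

In the answer's setup, `[entry_details(e) for e in proxy.har['log']['entries']]` would apply this to every captured request, and `save_har(proxy.har, "capture.har")` keeps a copy on disk.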

To install on Linux:

pip install browsermob-proxy

Source: utcz.com/qa/411364.html