Configuring browsermobproxy

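Selenium on its own has some limitations when it comes to capturing the data a page loads dynamically, so it can be paired with a proxy server that records that traffic: https://github.com/lightbody/browsermob-proxy. That project has not been updated for about four years. This post uses its Python client, which has probably not been updated for a long time either.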

Install the Python package

pip install browsermob-proxy

Download the browsermobproxy proxy server files

https://github.com/lightbody/browsermob-proxy/releases/tag/browsermob-proxy-2.1.4
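
Once the archive is extracted, the Python client needs the path to the start script inside its bin directory. The sketch below picks that script by operating system. The helper name launcher_path is made up for illustration, and it assumes the release also ships a bin/browsermob-proxy shell script for non-Windows systems next to the .bat file used later in this post.

```python
import os
import platform


def launcher_path(install_dir):
    """Return the path of the BrowserMob Proxy start script for the current OS.

    Assumption: the extracted release contains bin/browsermob-proxy.bat
    (Windows) and bin/browsermob-proxy (other systems).
    """
    if platform.system() == "Windows":
        script = "browsermob-proxy.bat"
    else:
        script = "browsermob-proxy"
    return os.path.join(install_dir, "bin", script)
```

The returned value can then be passed to Server(...) in place of a hard-coded path.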

Usage

from browsermobproxy import Server
server = Server(r'browsermob-proxy-2.1.4\bin\browsermob-proxy.bat')
server.start()
proxy = server.create_proxy(params={'trustAllServers':'true'})   # this parameter is sometimes needed; without it some pages may fail to open

The snippet above starts the proxy server.
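
The server runs as a separate process, so it is worth shutting it down even when the scraping code fails. The following is a minimal sketch of one way to do that with a context manager. The name running_proxy is hypothetical; it only relies on the start(), create_proxy(), and stop() methods of Server and the close() method of the proxy client.

```python
from contextlib import contextmanager


@contextmanager
def running_proxy(server, params=None):
    """Start the server, yield a proxy, and always clean both up afterwards."""
    server.start()
    try:
        proxy = server.create_proxy(params=params or {})
        try:
            yield proxy
        finally:
            proxy.close()   # release the proxy port
    finally:
        server.stop()       # terminate the server process
```

With this helper, the body of a `with running_proxy(server, params={'trustAllServers': 'true'}) as proxy:` block can use the proxy exactly as in the rest of this post.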

Using selenium together with browsermobproxy

from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.chrome.options import Options

# # # todo firefox
# profile = FirefoxProfile()
# option = webdriver.FirefoxOptions()
# # option.add_argument("--headless")   # headless mode
# # option.add_argument("--disable-gpu")
# option.add_argument('--proxy-server={0}'.format(proxy.proxy))
# driver = webdriver.Firefox(profile, firefox_options=option, executable_path=r'geckodriver-v0.26.0-win64\geckodriver.exe')

#
# profile  = webdriver.FirefoxProfile()
# profile.set_proxy(proxy.selenium_proxy())
# driver = webdriver.Firefox(firefox_profile=profile,executable_path=r'geckodriver-v0.26.0-win64\geckodriver.exe')
#


# # # # todo chrome
chrome_options = Options()
chrome_options.add_argument("--headless")   # headless mode
# chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument('--proxy-server={0}'.format(proxy.proxy))
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_experimental_option("excludeSwitches", ['enable-automation'])
chrome_options.add_argument('--ignore-certificate-errors')
# driver = webdriver.Chrome(chrome_options=chrome_options,executable_path='C:/chromedriver_win32/chromedriver.exe')
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path='chromedriver.exe')

proxy.new_har("", options={'captureContent': True, 'captureHeaders': True, 'captureBinaryContent': True})  # the last option matters: some response bodies are only captured when it is set

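driver.get(url)   # url: address of the page to capture, defined beforehand
proxy.wait_for_traffic_to_stop(1, 60)   # arguments are (quiet_period, timeout), in milliseconds according to the client docs
result = proxy.har
print(result)   # inspect the captured HAR data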
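
The HAR returned by proxy.har is a nested dict, so printing it whole is hard to read. As a rough sketch of how the result could be analyzed, the helper below collects the response bodies of the requests whose URL contains a given substring. The function name response_bodies is hypothetical. The keys it reads (log, entries, request.url, response.status, response.content.text, response.content.encoding) come from the HAR format; bodies marked as base64 are decoded to bytes.

```python
import base64


def response_bodies(har, url_contains):
    """Return (url, status, body) for each HAR entry whose URL contains url_contains."""
    matches = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry["request"]["url"]
        if url_contains not in url:
            continue
        response = entry["response"]
        content = response.get("content", {})
        body = content.get("text")          # None when no body was captured
        if body is not None and content.get("encoding") == "base64":
            body = base64.b64decode(body)   # binary content is stored base64-encoded
        matches.append((url, response.get("status"), body))
    return matches
```

For example, `response_bodies(result, "/api/")` would list the captured responses of requests whose URL contains "/api/", where that substring is only a placeholder for whatever endpoint the page actually calls.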

References:

1: https://stackoverflow.com/questions/49832373/cant-get-response-body-in-har-browsermobproxy-selenium-firefox-in-python
2: https://github.com/lightbody/browsermob-proxy/issues/741
3: https://browsermob-proxy-py.readthedocs.io/en/stable/
