Headless browser screenshots in Java with Selenium (with code and Docker packaging)

weixin_44670774 · Published 2024-03-09 10:45:00

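I recently got an unusual requirement: take a screenshot of a website's home page on a schedule every day and archive it for later reference. I implemented this as a background screenshot service in Java.

The complete approach follows.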

Technical approach

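The screenshots are taken from Java with Chrome as the browser, using the selenium-java dependency. Selenium drives the browser for us and does the actual work.
Because the screenshots are taken in the background, no window needs to be shown, and Chrome supports being run without a UI (headless mode).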

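Selenium is at 4.18.1, a recent release as of 2024-03-08.
Java is at version 21. Selenium 4.14 and later require Java 11 or newer, so Java 8 will not work with this Selenium version; if you run the code on Spring Boot 3, as the jakarta.annotation import below suggests, you need Java 17 or newer.
The hutool utility library is also used.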

Prerequisites

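Development happens on Windows, so first download the latest Chrome browser and the chromedriver that matches it. The download page is:
https://chromedriver.chromium.org/downloads
As that page explains, for Chrome 115 and later the downloads have moved to:
https://googlechromelabs.github.io/chrome-for-testing/

Download the Windows chromedriver build for your architecture.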
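Since the driver has to match the installed browser, a quick sanity check before wiring things up can save a confusing startup error. The following is only a minimal sketch: the class name DriverVersionCheck is hypothetical, and the sample strings assume the usual shape of the text printed by `chrome --version` and `chromedriver --version`. It compares major versions only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DriverVersionCheck {

    // Matches a dotted version number such as 122.0.6261.94 and captures the major part
    private static final Pattern VERSION = Pattern.compile("(\\d+)(?:\\.\\d+)+");

    /** Extracts the major version from text such as "ChromeDriver 122.0.6261.94 (...)". */
    public static int majorVersion(String versionOutput) {
        Matcher m = VERSION.matcher(versionOutput);
        if (!m.find()) {
            throw new IllegalArgumentException("No version number found in: " + versionOutput);
        }
        return Integer.parseInt(m.group(1));
    }

    /** Returns true when the browser and the driver share the same major version. */
    public static boolean isCompatible(String chromeVersionOutput, String driverVersionOutput) {
        return majorVersion(chromeVersionOutput) == majorVersion(driverVersionOutput);
    }

    public static void main(String[] args) {
        // Sample strings in the assumed output format, not real captured output
        String chrome = "Google Chrome 122.0.6261.94";
        String driver = "ChromeDriver 122.0.6261.94 (abc123)";
        System.out.println(isCompatible(chrome, driver));
    }
}
```

In practice the two strings would come from running both binaries with `--version`; how you capture that output is up to you.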

Java code

pom:

<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.18.1</version>
</dependency>
<dependency>
    <groupId>cn.hutool</groupId>
    <artifactId>hutool-all</artifactId>
    <version>5.8.26</version>
</dependency>

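The Java code is a Spring service that reads its settings from the Spring configuration file. The byte array it returns is a PNG image, which you can save straight to a file or Base64-encode and send to another service. The code is as follows: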

/**
 * Takes screenshots of web pages with a headless Chrome instance.
 */
public interface ChromeService {

    /**
     * Takes a screenshot of the given page.
     * @param url the URL to capture
     * @return the image as binary data (PNG)
     */
    byte[] screenshot(String url);
}


import lombok.Data;
import org.springframework.boot.context.properties.ConfigurationProperties;

/**
 * Settings bound from the myservice.chrome section of the Spring configuration.
 */
@ConfigurationProperties("myservice.chrome")
@Data
public class ChromeProperties {
    private String driverPath;
    private int width;
    private int height;
}


import cn.hutool.core.io.resource.ClassPathResource;
import cn.hutool.core.thread.ThreadUtil;
import  xxx.ChromeService;
import jakarta.annotation.Resource;
import lombok.extern.slf4j.Slf4j;
import org.openqa.selenium.Dimension;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeDriverService;
import org.openqa.selenium.chrome.ChromeOptions;
import org.springframework.boot.context.properties.EnableConfigurationProperties;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;
import java.nio.file.Paths;
import java.time.Duration;
import java.util.concurrent.TimeUnit;

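/**
 * ChromeService implementation that starts a fresh chromedriver and headless Chrome for every screenshot.
 */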
@Service
@Slf4j
@EnableConfigurationProperties(ChromeProperties.class)
public class ChromeServiceImpl implements ChromeService {

    @Resource
    private ChromeProperties chromeProperties;

    public ChromeDriverService getService()  {
        // Start Chrome through a ChromeDriverService
        // The driver binary is chromedriver.exe on Windows and chromedriver on Linux
        // On Linux, chromedriver must have execute permission!
        File driverFile;
        if (chromeProperties.getDriverPath().startsWith("classpath")){
            driverFile = new ClassPathResource(chromeProperties.getDriverPath()).getFile();
        }else {
            // Regular file path
            driverFile = Paths.get(chromeProperties.getDriverPath()).toFile();
        }

        ChromeDriverService service = new ChromeDriverService.Builder().usingDriverExecutable(driverFile).usingAnyFreePort().build();
        try {
            service.start();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return service;
    }


    @Override
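    public byte[] screenshot(String url) {
        ChromeDriverService service = getService();
        ChromeDriver driver = null;
        try {
            driver = getDriver(service);
            // Set the window size first so the page is laid out at the target size;
            // the screenshot covers exactly this window
            driver.manage().window().setSize(new Dimension(chromeProperties.getWidth(), chromeProperties.getHeight()));
            // Load the page to capture
            log.info("chrome screenshot: {}", url);
            driver.get(url);
            // Fixed wait to give late-loading content time to render
            ThreadUtil.sleep(5, TimeUnit.SECONDS);
            return driver.getScreenshotAs(OutputType.BYTES);
        } finally {
            // Always release the browser and the driver process, even if driver startup failed
            if (driver != null) {
                driver.quit();
            }
            service.stop();
            log.info("chrome instance destroyed");
        }
    }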

    private ChromeDriver getDriver(ChromeDriverService service) {
        ChromeOptions options = new ChromeOptions();
        options.setAcceptInsecureCerts(true);

        // Browser arguments
        options.addArguments("--no-sandbox");
        options.addArguments("--disable-gpu");
        options.addArguments("--disable-dev-shm-usage");
        options.addArguments("--headless=new");
        ChromeDriver driver = new ChromeDriver(service, options);
        // Set timeouts so that slow-loading content does not make the screenshot fail
        driver.manage().timeouts()
                .pageLoadTimeout(Duration.ofMinutes(1))
                .implicitlyWait(Duration.ofMinutes(1))
                .scriptTimeout(Duration.ofMinutes(1));
        log.info("chrome driver initialized");
        return driver;
    }
}
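The byte array returned by screenshot is PNG data, so archiving it or handing it to another service takes only a few lines. The sketch below is one possible way to do it, not part of the service above: the class ScreenshotArchive, its method names, and the one-file-per-day naming scheme are all assumptions made for illustration. It uses only the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.util.Base64;

public class ScreenshotArchive {

    /** Writes the PNG bytes to dir/<date>.png, creating the directory if needed. */
    public static Path saveDaily(byte[] png, Path dir, LocalDate date) throws IOException {
        Files.createDirectories(dir);
        // LocalDate prints as ISO yyyy-MM-dd, e.g. 2024-03-09.png
        Path target = dir.resolve(date + ".png");
        return Files.write(target, png);
    }

    /** Encodes the image as Base64 text, e.g. for a JSON payload sent to another service. */
    public static String toBase64(byte[] png) {
        return Base64.getEncoder().encodeToString(png);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; in the service these would come from ChromeService.screenshot(url)
        byte[] png = {(byte) 0x89, 'P', 'N', 'G'};
        Path saved = saveDaily(png, Files.createTempDirectory("shots"), LocalDate.now());
        System.out.println("saved to " + saved);
        System.out.println("base64: " + toBase64(png));
    }
}
```

Naming the file after the date means a second run on the same day overwrites the first; add a time component to the name if you want to keep every capture.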

To use it, first configure the chromedriver path in application.yaml:

myservice:
  chrome:
    driver-path: 'classpath:/chromedriver.exe'
    width: 1366
    height: 768

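On Windows, chromedriver.exe can live in any directory, or directly under resources; just configure the path to match.
On Linux, it must sit outside the jar and have execute permission, otherwise it cannot be launched.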
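A missing execute bit on Linux tends to surface as a driver startup failure that is not obvious to diagnose, so it can help to check the file up front. This is a minimal sketch using only the JDK; the class DriverFileCheck and the method requireExecutable are hypothetical names, not something Selenium or the service above provides.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DriverFileCheck {

    /**
     * Returns the driver file if it exists and is executable. When the execute
     * bit is missing, tries to set it for the owner before giving up.
     */
    public static File requireExecutable(Path driverPath) {
        if (!Files.isRegularFile(driverPath)) {
            throw new IllegalStateException("chromedriver not found: " + driverPath);
        }
        File file = driverPath.toFile();
        // setExecutable(true) is roughly chmod u+x; it can fail on read-only
        // mounts or when the process does not own the file
        if (!file.canExecute() && !file.setExecutable(true)) {
            throw new IllegalStateException("chromedriver is not executable: " + driverPath);
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for the real chromedriver binary
        Path fake = Files.createTempFile("chromedriver", "");
        System.out.println(requireExecutable(fake).canExecute());
    }
}
```

One way to use it would be to pass the configured driver path through this check before building the ChromeDriverService, so that a bad path fails with a clear message.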

width and height are the browser window dimensions used for the screenshot; adjust them to the resolution you want to capture.

Packaging with Docker

This section shows how to build the base image. Adding the jar on top of the base image is not covered here.

The base image contains the JDK and Chrome.

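First, download the Chrome installer package from https://pkgs.org/download/google-chrome-stable .
The latest version is fine.
Then, based on the installer's version, download the matching chromedriver from the chromedriver download page mentioned above.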

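Below is the directory layout once everything is downloaded.
chromedriver is the extracted binary. The JDK image I chose is deb-based, so the rpm in this directory is not needed; download whichever package fits your own setup.
[Image: directory listing of the downloaded files]
The Dockerfile follows. It installs a few extra packages; language-pack-zh-hans and fonts-wqy-zenhei are the two needed to render Chinese text, and without them Chinese characters in the screenshot show up as empty boxes.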

FROM azul/zulu-openjdk:21.0.2-jre
ADD google-chrome-stable_current_amd64.deb /google-chrome-stable_current_amd64.deb
RUN apt-get update \
    && apt-get install -y curl sudo \
    && apt-get install -y /google-chrome-stable_current_amd64.deb \
    && apt-get install -y language-pack-zh-hans fonts-wqy-zenhei
ADD chromedriver /usr/bin/chromedriver
RUN chmod +x /usr/bin/chromedriver

That's all.