
I. Environment

  • WSL Ubuntu 22.04. If you have not installed WSL yet, see the WSL installation guide.
  • A working NVIDIA driver. Run nvidia-smi in the WSL terminal:
nvidia-smi
Sun Oct  6 20:37:01 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.58.02              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080 ...    On  |   00000000:01:00.0 Off |                  N/A |
| N/A   40C    P3             19W /   95W |       0MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
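
  • GPU passthrough requires the distro to run under WSL 2. As an extra check (not part of the original steps), you can confirm this from a Windows terminal (PowerShell or CMD); the VERSION column should read 2.
wsl -l -v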

II. Installing NVIDIA Docker on WSL

1. Install the CUDA Toolkit

On the CUDA Toolkit download page:

  • Select [Linux] -> [x86_64] -> [WSL-Ubuntu] -> [2.0].
  • Any Installer Type will do; simply run the installation commands generated on the CUDA download page in your terminal.
  • After installation, add the CUDA install path to the environment variables in ~/.bashrc (CUDA installs under /usr/local by default; replace cuda-11.8 with your installed CUDA version, see the sketch after this list):
export PATH=/usr/local/cuda-11.8/bin:$PATH
  • The CUDA Toolkit is installed successfully if nvcc --version prints normal output:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
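
As referenced in the list above, here is a minimal sketch of adding the CUDA environment variables to ~/.bashrc and reloading them. The LD_LIBRARY_PATH line is an extra that the original does not mention but is commonly needed so CUDA libraries are found; adjust cuda-11.8 to your installed version.

# Append the CUDA paths to ~/.bashrc (replace cuda-11.8 with the installed version)
echo 'export PATH=/usr/local/cuda-11.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
# Reload the shell configuration so the current terminal picks up the change
source ~/.bashrc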

2. Install Docker

  • Edit /etc/wsl.conf to enable systemd in WSL, so the Docker service can be started automatically later:
[boot]
systemd=true
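  • Restart WSL so the systemd setting takes effect (a step the original omits). From a Windows terminal (PowerShell or CMD), not inside WSL, run:
wsl --shutdown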
  • Remove old Docker versions:
sudo apt-get purge docker-ce docker-ce-cli containerd.io docker-compose-plugin
  • Install Docker:
curl https://get.docker.com | sh
  • Check the Docker service status:
sudo systemctl status docker

The service should be active (running).

  • If the Docker service is not running, restart it:
sudo systemctl restart docker
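  • Optionally (not part of the original steps), enable the Docker service to start with the distro and allow running docker without sudo:
# Start Docker automatically under systemd
sudo systemctl enable docker
# Let the current user run docker without sudo (log out and back in for this to apply)
sudo usermod -aG docker $USER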

3. Install NVIDIA Docker

  • Configure the nvidia-docker package repository:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
  • Install nvidia-docker2:
sudo apt-get update
sudo apt-get install -y nvidia-docker2
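  • As a quick sanity check (an extra step, not in the original), confirm that the NVIDIA container packages and runtime CLI are present:
# List the installed NVIDIA container packages
dpkg -l | grep -E 'nvidia-docker2|nvidia-container'
# Print the NVIDIA container runtime CLI version
nvidia-container-cli --version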

If nvidia.github.io cannot be reached, as in the error below, change the DNS server.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey
curl: (7) Failed to connect to nvidia.github.io port 443 after 1 ms: Connection refused
  • Edit /etc/resolv.conf and change the nameserver to 114.114.114.114:
# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateResolvConf = false
# nameserver 10.255.255.254
nameserver 114.114.114.114
search lan
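  • Note that WSL regenerates /etc/resolv.conf each time it restarts. As the generated comment above says, you can make the DNS change persistent by adding the following to /etc/wsl.conf and restarting WSL:
[network]
generateResolvConf = false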

4. Configure Docker

  • Configure the Docker registry mirrors and runtimes by editing /etc/docker/daemon.json as follows:
{
    "registry-mirrors": [
        "https://dockerproxy.cn",
        "https://docker.rainbond.cc",
        "https://docker.udayun.com",
        "https://docker.rainbond.cc",
        "https://hub.uuuadc.top",
        "https://docker.anyhub.us.kg",
        "https://dockerhub.jobcher.com",
        "https://dockerhub.icu",
        "https://docker.ckyl.me",
        "https://docker.awsl9527.cn"
    ],
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    },
    "default-runtime": "nvidia"
}
Field descriptions:
  • registry-mirrors: Docker registry mirror addresses, which work around networks that cannot pull images from Docker Hub directly.
  • runtimes: registers the NVIDIA Container Runtime so Docker can use it; see container-toolkit/latest/install-guide.html.
  • In /etc/nvidia-container-runtime/config.toml, change the field no-cgroups = true to no-cgroups = false:
sudo sed -i 's/no-cgroups = true/no-cgroups = false/' /etc/nvidia-container-runtime/config.toml

This fixes the problem where docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark cannot find a GPU (Error: only 0 Devices available, 1 requested. Exiting. Reference: Docker container with CUDA does not see my GPU).

5. Restart the Docker Service

sudo systemctl restart docker
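
After the restart, you can quickly confirm (an extra check, not in the original) that Docker picked up the nvidia runtime from daemon.json:

# Should list "nvidia" among the runtimes and as the default runtime
docker info | grep -i runtime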

III. Verifying NVIDIA Docker

  • Use the N-body simulation container to verify that the WSL Docker GPU setup works:
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

If everything is configured correctly, the output looks like this:

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
        -fullscreen       (run n-body simulation in fullscreen mode)
        -fp64             (use double precision floating point values for simulation)
        -hostmem          (stores simulation data in host memory)
        -benchmark        (run benchmark to measure performance)
        -numbodies=<N>    (number of bodies (>= 1) to run in simulation)
        -device=<d>       (where d=0,1,2.... for the CUDA device to use)
        -numdevices=<i>   (where i=(number of CUDA devices > 0) to use for simulation)
        -compare          (compares simulation results running once on the default GPU and once on the CPU)
        -cpu              (run n-body simulation on the CPU)
        -tipsy=<file.bin> (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
MapSMtoCores for SM 8.9 is undefined.  Default to use 128 Cores/SM
MapSMtoArchName for SM 8.9 is undefined.  Default to use Ampere
GPU Device 0: "Ampere" with compute capability 8.9

> Compute 8.9 CUDA device: [NVIDIA GeForce RTX 4080 Laptop GPU]
59392 bodies, total time for 10 iterations: 52.937 ms
= 666.345 billion interactions per second
= 13326.896 single-precision GFLOP/s at 20 flops per interaction
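
As an alternative check (not part of the original), running nvidia-smi inside a CUDA base image should list the same GPU as the host; the image tag below is only an example and any recent nvidia/cuda tag should work:

docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi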

IV. Conclusion

If the installation fails for you, feel free to leave a comment.
