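My advisor provided me with a desktop machine, so I set about configuring a PyTorch environment on it. After installing PyTorch to match the machine's GPU driver (472.12), CUDA, and cuDNN versions, I called torch.cuda.is_available(), and the result suggested that the GPU build of PyTorch had been installed successfully.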

    import torch

    print(torch.__version__)
    print(torch.cuda.is_available())

    # 1.10.1
    # True

However, the installed PyTorch could not actually use the GPU for computation:

    a = torch.Tensor(5,3)
    print(a)
    a.cuda()

    # tensor([[1.0194e-38, 9.6429e-39, 9.2755e-39],
    #         [9.1837e-39, 9.3674e-39, 1.0745e-38],
    #         [1.0653e-38, 9.5510e-39, 1.0561e-38],
    #         [1.0194e-38, 1.1112e-38, 1.0561e-38],
    #         [9.9184e-39, 1.0653e-38, 4.1327e-39]])

    RuntimeError: CUDA error: no kernel image is available for execution on the device
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Jupyter Notebook also showed this warning:

    D:\Anaconda3\lib\site-packages\torch\cuda\__init__.py:83: UserWarning: 
        Found GPU%d %s which is of cuda capability %d.%d.
        PyTorch no longer supports this GPU because it is too old.
        The minimum cuda capability supported by this library is %d.%d.

      warnings.warn(old_gpu_warn.format(d, name, major, minor, min_arch // 10, min_arch % 10))

PyTorch no longer supports this GPU because it is too old. Our GPU is an older model (GeForce GT 730, 2 GB of memory, compute capability 3.5), and the current PyTorch release no longer supports it.

To summarize the problem: PyTorch installs successfully but cannot use the GPU, failing with "PyTorch no longer supports this GPU because it is too old." and "CUDA error: no kernel image is available for execution on the device".
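
The format arguments in the traceback line above (min_arch // 10, min_arch % 10) suggest how the warning is produced: the smallest architecture number the binary was compiled for is compared with the GPU's compute capability. A minimal sketch of that comparison, assuming the compiled architectures are given as strings such as sm_37. The helper name too_old_for_build is illustrative and this is not PyTorch's actual code; the example list matches one of the rows in the table in section 3.

```python
def too_old_for_build(capability, arch_list):
    """Return (too_old, min_capability) for a GPU and a list of compiled archs.

    capability: (major, minor) tuple, e.g. (3, 5) for a GeForce GT 730.
    arch_list:  names such as "sm_37"; entries that are not "sm_*" are skipped.
    """
    archs = [int(a.split("_")[1]) for a in arch_list if a.startswith("sm_")]
    min_arch = min(archs)
    current = capability[0] * 10 + capability[1]
    return current < min_arch, (min_arch // 10, min_arch % 10)


# Example arch list for a build that does not include sm_35
example_archs = ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75", "sm_80"]
print(too_old_for_build((3, 5), example_archs))  # (True, (3, 7))
```

On an installed build, the real inputs would come from torch.cuda.get_device_capability() and, in versions that provide it, torch.cuda.get_arch_list().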

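1. Following the hint in the error message

Following the hint, we set CUDA_LAUNCH_BLOCKING=1, which disables asynchronous execution of CUDA kernel launches, but GPU computation still failed: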

    import os
    os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

    a = torch.Tensor(5,3)
    print(a)
    a.cuda()

    # tensor([[1.0194e-38, 9.6429e-39, 9.2755e-39],
    #         [9.1837e-39, 9.3674e-39, 1.0745e-38],
    #         [1.0653e-38, 9.5510e-39, 1.0561e-38],
    #         [1.0194e-38, 1.1112e-38, 1.0561e-38],
    #         [9.9184e-39, 1.0653e-38, 4.1327e-39]])

    RuntimeError: CUDA error: no kernel image is available for execution on the device
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
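
The hint is a debugging aid: with CUDA_LAUNCH_BLOCKING=1, kernel launches become synchronous so that an error is reported at the call that caused it, but the set of kernels compiled into the binary does not change. If you still want to use it, the variable generally needs to be in the environment before CUDA is initialized. A small sketch, assuming the script is launched from a wrapper (the helper name run_with_blocking_launches is hypothetical), that sets it for a child Python process:

```python
import os
import subprocess
import sys


def run_with_blocking_launches(code):
    """Run a Python snippet in a child process with CUDA_LAUNCH_BLOCKING=1."""
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


# The child sees the variable from its very first line, before any import.
print(run_with_blocking_launches(
    "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"
))
```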

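2. Downgrading PyTorch

Since the Jupyter Notebook warning says that the current PyTorch no longer supports our GPU, we can try downgrading PyTorch.

However, after downgrading from 1.10.1 to 1.9.1, 1.9.0, and 1.8.0, Python still reported the same error and the GPU could not be used. Even after adding a conda mirror channel, downloading PyTorch with conda install remained very slow (at least 4 hours, with installs that could be interrupted partway; the days this cost are omitted here), so we switched to installing PyTorch offline.

Offline packages for cudatoolkit, pytorch, torchvision, and torchaudio can be downloaded from the Tsinghua mirror site. Install them with conda install --use-local, run from the directory where the .tar.bz2 files were downloaded: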

    conda install --use-local pytorch-1.7.1-py3.9_cuda110_cudnn8_0.tar.bz2
    conda install --use-local cudatoolkit-11.0.221-h74a9793_0.tar.bz2
    conda install --use-local torchvision-0.8.2-py39_cu110.tar.bz2
    # Some packages, such as cudatoolkit-8.0 and cudnn, are not in conda's default channels; adding -c anaconda to conda install finds them
    conda install -c anaconda torchaudio==0.7.2
    # torchvision's version had changed to 0.2.2, so reinstall it
    conda install --use-local torchvision-0.8.2-py39_cu110.tar.bz2
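
The file names above encode which Python, CUDA, and cuDNN versions a package was built for, so it is worth checking them before installing. A rough sketch that splits a name of the form used in this post (the regular expression and the function name parse_pytorch_package are assumptions based only on these examples, not a conda API):

```python
import re

# Pattern inferred from names like
# "pytorch-1.7.1-py3.9_cuda110_cudnn8_0.tar.bz2" (an assumption, not a spec).
_PKG_RE = re.compile(
    r"^pytorch-(?P<version>[\d.]+)"
    r"-py(?P<python>[\d.]+)"
    r"_cuda(?P<cuda>\d+)"
    r"_cudnn(?P<cudnn>[\d.]+)"
    r"_(?P<build>\d+)\.tar\.bz2$"
)


def parse_pytorch_package(filename):
    """Return a dict of the fields in a pytorch conda package file name."""
    match = _PKG_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized package name: {filename}")
    return match.groupdict()


info = parse_pytorch_package("pytorch-1.6.0-py3.7_cuda102_cudnn7_0.tar.bz2")
print(info)
```

Here a cuda field of 110 stands for CUDA 11.0, which should agree with the cudatoolkit package installed next to it (cudatoolkit-11.0.221 in the commands above).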

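Because some projects do not yet support the latest Python 3.9, we created a new environment based on Python 3.7 (also convenient for later use), installed the matching CUDA and cuDNN versions, and kept downgrading PyTorch.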

    conda install --use-local pytorch-1.6.0-py3.7_cuda102_cudnn7_0.tar.bz2
    conda install --use-local cudatoolkit-10.2.89-h74a9793_1.tar.bz2
    conda install --use-local torchaudio-0.6.0-py37.tar.bz2
    conda install --use-local torchvision-0.7.0-py37_cu102.tar.bz2

PyTorch 1.5.1 does not require torchaudio:

    conda install --use-local pytorch-1.5.1-py3.7_cuda92_cudnn7_0.tar.bz2
    conda install --use-local cudatoolkit-9.2-0.tar.bz2
    conda install --use-local torchvision-0.6.1-py37_cu92.tar.bz2

At this point Jupyter Notebook no longer warned that the GPU was too old for PyTorch. However, Python still reported the same error, and the GPU could not be used for computation.

    a = torch.Tensor(5,3)
    print(a)
    a.cuda()

    # tensor([[1.0194e-38, 9.6429e-39, 9.2755e-39],
    #         [9.1837e-39, 9.3674e-39, 1.0745e-38],
    #         [1.0653e-38, 9.5510e-39, 1.0561e-38],
    #         [1.0194e-38, 1.1112e-38, 1.0561e-38],
    #         [9.9184e-39, 1.0653e-38, 4.1327e-39]])

    RuntimeError: CUDA error: no kernel image is available for execution on the device
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

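3. Choosing a PyTorch version based on the GPU's compute capability

According to discussions in several blog posts, RuntimeError: CUDA error: no kernel image is available for execution on the device may be caused by the GPU's compute capability being below 3.5. So we looked into which GPU compute capabilities each PyTorch build supports: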

| PyTorch | Python | CUDA | cuDNN | Architectures |
|---|---|---|---|---|
| pytorch-1.0.0 | py3.7 | cuda10.0.130 | cudnn7.4.1_1 | sm_30, sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.0.0 | py3.7 | cuda8.0.61 | cudnn7.1.2_1 | sm_20, sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61 |
| pytorch-1.0.0 | py3.7 | cuda9.0.176 | cudnn7.4.1_1 | sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_70 |
| pytorch-1.0.1 | py3.7 | cuda10.0.130 | cudnn7.4.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.0.1 | py3.7 | cuda10.0.130 | cudnn7.4.2_2 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.0.1 | | cuda8.0.61 | cudnn7.1.2_0 | sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61 |
| pytorch-1.0.1 | py3.7 | cuda8.0.61 | cudnn7.1.2_2 | sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61 |
| pytorch-1.0.1 | py3.7 | cuda9.0.176 | cudnn7.4.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.0.1 | py3.7 | cuda9.0.176 | cudnn7.4.2_2 | sm_35, sm_50, sm_60, sm_70 |
| pytorch-1.1.0 | py3.7 | cuda10.0.130 | cudnn7.5.1_0 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.1.0 | py3.7 | cuda9.0.176 | cudnn7.5.1_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.2.0 | py3.7 | cuda9.2.148 | cudnn7.6.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.2.0 | py3.7 | cuda10.0.130 | cudnn7.6.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.2.0 | py3.7 | cuda9.2.148 | cudnn7.6.2_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.3.0 | py3.7 | cuda10.0.130 | cudnn7.6.3_0 | sm_30, sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.3.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_30, sm_35, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.3.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.3.1 | py3.7 | cuda10.0.130 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.3.1 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.3.1 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.4.0 | py3.7 | cuda10.0.130 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.4.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.4.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.5.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.5.0 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.5.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.5.1 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.5.1 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.5.1 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.6.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.6.0 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.6.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.7.0 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.7.0 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.7.0 | py3.7 | cuda11.0.221 | cudnn8.0.3_0 | sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80 |
| pytorch-1.7.0 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.7.1 | py3.7 | cuda10.1.243 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.7.1 | py3.7 | cuda10.2.89 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.7.1 | py3.7 | cuda11.0.221 | cudnn8.0.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80 |
| pytorch-1.7.1 | py3.7 | cuda9.2.148 | cudnn7.6.3_0 | sm_37, sm_50, sm_60, sm_61, sm_70 |
| pytorch-1.8.0 | py3.7 | cuda10.1 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.8.0 | py3.7 | cuda10.2 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.8.0 | py3.7 | cuda11.1 | cudnn8.0.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80, sm_86 |
| pytorch-1.8.1 | py3.7 | cuda10.1 | cudnn7.6.3_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.8.1 | py3.7 | cuda10.2 | cudnn7.6.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75 |
| pytorch-1.8.1 | py3.7 | cuda11.1 | cudnn8.0.5_0 | sm_35, sm_37, sm_50, sm_60, sm_61, sm_70, sm_75, sm_80, sm_86 |

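Reference: "pytorch error: RuntimeError: CUDA error: no kernel image is available for execution on the device"

My GPU (GeForce GT 730, 2 GB of memory) has compute capability 3.5, so according to the table it should work with the vast majority of PyTorch versions, yet it could not run GPU computations.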
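
Reading the table for one GPU can be done mechanically. A minimal sketch that filters a few rows copied from the table above for a given architecture (only a subset of rows is included, and the tuple layout and function name are just for illustration):

```python
# (pytorch, cuda, architectures) for a few rows copied from the table above
BUILDS = [
    ("1.7.0", "10.2.89",
     ["sm_35", "sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75"]),
    ("1.7.0", "11.0.221",
     ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75", "sm_80"]),
    ("1.7.1", "9.2.148",
     ["sm_37", "sm_50", "sm_60", "sm_61", "sm_70"]),
    ("1.8.1", "11.1",
     ["sm_35", "sm_37", "sm_50", "sm_60", "sm_61", "sm_70", "sm_75",
      "sm_80", "sm_86"]),
]


def builds_listing(arch, builds=BUILDS):
    """Return (pytorch, cuda) pairs whose architecture list contains arch."""
    return [(pt, cuda) for pt, cuda, archs in builds if arch in archs]


print(builds_listing("sm_35"))
# [('1.7.0', '10.2.89'), ('1.8.1', '11.1')]
```

As the attempts in section 2 show, a build that lists sm_35 did not necessarily work on this machine, so the table is best treated as a first filter only.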

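Therefore, we needed to look for a solution elsewhere.

4. Installing PyTorch with pip install --pre

While searching, we found that some users abroad had run into the same problem and solved it with pip install --pre: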

    pip uninstall torch
    pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu110/torch_nightly.html

Reference: Cuda error: no kernel image is available for execution on the device #31285

Here cu110 corresponds to CUDA 11.0. If your CUDA version is 11.x, replace it with cu11x.
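
The tag is simply the CUDA major and minor version with the dot removed. A small sketch that builds the tag and fills it into the URL pattern from the command above (the function names are hypothetical, and whether a nightly index actually exists for a given tag is not checked):

```python
def cuda_tag(cuda_version):
    """Turn a CUDA version such as "11.0" or "11.0.221" into a tag like "cu110"."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def nightly_index_url(cuda_version):
    """Fill the tag into the URL pattern from the pip command above."""
    return (
        "https://download.pytorch.org/whl/nightly/"
        f"{cuda_tag(cuda_version)}/torch_nightly.html"
    )


print(cuda_tag("11.0"))  # cu110
print(nightly_index_url("11.1"))
```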

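This approach might work, but because the download server is far from mainland China, pip install was extremely slow: the command above downloaded at only about 2 KB/s (how I envy users abroad). It looked like we would have to give up on this approach.

The pip output shows that pip install --pre torch torchvision downloads a specific version of torch and torchvision from a website, one package at a time. So we tried downloading that version of the PyTorch wheel from a mirror inside China and installing it locally:

    pip install C:\Users\Lenovo\Downloads\torch-1.10.1-cp37-cp37m-win_amd64.whl

The installed PyTorch turned out to be the CPU build. Out of options, we had to give up on this approach.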
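
One way to tell which kind of build ended up installed is to look at torch.version.cuda, which is None when the build has no CUDA support. A minimal sketch (the helper describe_build is hypothetical; the torch import is guarded so the snippet also runs where PyTorch is missing):

```python
def describe_build(torch_version, cuda_version):
    """Summarize a build from torch.__version__ and torch.version.cuda."""
    if cuda_version is None:
        return f"torch {torch_version}: no CUDA support"
    return f"torch {torch_version}: built with CUDA {cuda_version}"


try:
    import torch
except ImportError:
    print("torch is not installed")
else:
    print(describe_build(torch.__version__, torch.version.cuda))
```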

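5. Building PyTorch from source

After going through a lot of material, we found an approach that should work: building PyTorch from source.

Even with a very old GPU, this makes it possible to use a relatively recent PyTorch version for GPU computation.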

We will have to:

  1. adjust TORCH_CUDA_ARCH_LIST to match the desired archs
  2. choose the docker image that matches your nvidia-drivers, e.g. the script below uses the latest docker image as of this writing, and I had to update my nvidia kernel driver to 465 and cuda to 11.3. If your drivers are older than that, use an earlier image. See the comment notes for details.
  3. I map the folder ~/github/00pytorch/pytorch/docker on my fs to docker's /tmp/output, where I copy the wheels. So make sure to edit it to the real folder on your system

It will take a few hours to build if you have a relatively strong machine, but otherwise it’s very painless to do.

    ####################
    ### source build ###
    ####################

    # one time prep: install the NVIDIA Container Toolkit, see
    # https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

    # then
    docker run --runtime=nvidia --rm nvidia/cuda:11.0-base nvidia-smi

    # find the latest container at https://ngc.nvidia.com/catalog/containers/nvidia:pytorch (use Tags tab),
    # but also check that the driver version isn't too high in release notes here:
    # https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html
    # then pull it:
    docker pull nvcr.io/nvidia/pytorch:21.04-py3

    #docker run --gpus all --ipc=host --rm -it nvcr.io/nvidia/pytorch:21.04-py3
    # to mount some host system dir inside the docker -v src:tgt
    docker run --gpus all --ipc=host --rm -it -v ~/github/00pytorch/pytorch/docker:/tmp/output nvcr.io/nvidia/pytorch:21.04-py3

    # once docker is running:
    conda create -n pytorch-dev python=3.8 -y
    bash
    conda init bash
    conda activate pytorch-dev
    conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses -y
    # adjust cuda113 below to whatever cuda version the image is for (cuda110, etc.)
    conda install -c pytorch magma-cuda113 -y

    git clone --recursive https://github.com/pytorch/pytorch
    cd pytorch
    # if you are updating an existing checkout
    #git submodule sync
    #git submodule update --init --recursive
    #git pull

    # to build a wheel
    unset PYTORCH_BUILD_VERSION
    unset PYTORCH_VERSION
    TORCH_CUDA_ARCH_LIST="6.1 8.6" \
    CUDA_HOME="/usr/local/cuda" \
    CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" \
    USE_SYSTEM_NCCL=1 \
    NCCL_INCLUDE_DIR="/usr/include/" \
    NCCL_LIB_DIR="/usr/lib/" \
    python setup.py bdist_wheel 2>&1 | tee build.log
    pip install dist/*whl
    # make a copy of the wheel outside the docker
    cp dist/*whl /tmp/output

    # adjust TORCH_CUDA_ARCH_LIST if needed, the full list is:
    # TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX"

    # had to install nccl .so objects on the target system

    # could also add:
    # USE_OPENCV=1 \
    # but need to have matching .so objects on the target system

    # NEXT: build torchvision - since many packages depend on it

    cd ..
    git clone https://github.com/pytorch/vision
    cd vision
    # if you are updating an existing checkout
    #git pull

    # to build a wheel
    TORCH_CUDA_ARCH_LIST="6.1 8.6" \
    python setup.py bdist_wheel
    pip install dist/*whl
    # make a copy of the wheel outside the docker
    cp dist/*whl /tmp/output

    cd ..
    git clone --recursive https://github.com/pytorch/audio
    cd audio
    # if you are updating an existing checkout
    #git submodule sync
    #git submodule update --init --recursive
    #git pull

    # to build a wheel
    TORCH_CUDA_ARCH_LIST="6.1 8.6" \
    BUILD_SOX=1 python setup.py bdist_wheel
    pip install dist/*whl
    # make a copy of the wheel outside the docker
    cp dist/*whl /tmp/output
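
The script above builds for 6.1 and 8.6, the architectures chosen in the quoted instructions. For a different card, TORCH_CUDA_ARCH_LIST would need that card's compute capability instead, for example 3.5 for the GeForce GT 730 discussed here, assuming the CUDA toolkit in the chosen image can still compile for it. A small sketch of producing the value (the helper name arch_list_value is hypothetical):

```python
def arch_list_value(capabilities, ptx_for_last=False):
    """Format (major, minor) pairs as a TORCH_CUDA_ARCH_LIST value."""
    parts = [f"{major}.{minor}" for major, minor in sorted(set(capabilities))]
    if ptx_for_last and parts:
        # "+PTX" uses the same suffix as the full list shown in the script
        parts[-1] += "+PTX"
    return " ".join(parts)


print(arch_list_value([(3, 5)]))                             # 3.5
print(arch_list_value([(8, 6), (6, 1)], ptx_for_last=True))  # 6.1 8.6+PTX
```

With PyTorch available, the pair for the local GPU can be read from torch.cuda.get_device_capability().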

References:

  1. Cuda error: no kernel image is available for execution on the device #31285
  2. Building PyTorch from source on Windows to work with an old GPU
  3. How to install pytorch FROM SOURCE (with cuda enabled for a deprecated CUDA cc 3.5 of an old gpu) using anaconda prompt on Windows 10?
  4. How to Compile the Latest Pytorch from Source in Windows with CUDA Support
  5. Building Pytorch from source with cuda support on WSL2(Ubuntu 20.04, cuda11.4, Windows11)
  6. Installing PyTorch by building from source, and its many pitfalls

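This would probably also take a lot of time, so we tried another approach first.

6. Downgrading PyTorch further

Searching for PyTorch installation notes for the same GPU model (GeForce GT 730), we found that one user had installed PyTorch 1.0.0 and used it without problems. That version is very old, though, and many newer features would presumably be unavailable.

All we could do was keep downgrading PyTorch. When we finally got down to version 1.2.0, it worked!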

    import torch

    print(torch.__version__)
    print(torch.cuda.is_available())

    # 1.2.0
    # True

    a = torch.Tensor(5,3)
    print(a)
    print(a.cuda())

    # tensor([[7.5305e+16, 6.3619e-43, 7.5305e+16],
    #         [6.3619e-43, 7.5296e+16, 6.3619e-43],
    #         [7.5296e+16, 6.3619e-43, 7.5305e+16],
    #         [6.3619e-43, 7.5305e+16, 6.3619e-43],
    #         [7.5291e+16, 6.3619e-43, 7.5291e+16]])
    # tensor([[7.5305e+16, 6.3619e-43, 7.5305e+16],
    #         [6.3619e-43, 7.5296e+16, 6.3619e-43],
    #         [7.5296e+16, 6.3619e-43, 7.5305e+16],
    #         [6.3619e-43, 7.5305e+16, 6.3619e-43],
    #         [7.5291e+16, 6.3619e-43, 7.5291e+16]], device='cuda:0')
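
Since torch.cuda.is_available() returned True even for the builds that failed, a check that actually launches a kernel is more telling. A minimal sketch (the function name gpu_smoke_test and the returned strings are illustrative):

```python
def gpu_smoke_test():
    """Try a tiny computation on the GPU and report what happened."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "CUDA is not available"
    try:
        # The multiplication launches a kernel; .item() copies the result
        # back, which forces any asynchronous error to surface here.
        total = (torch.ones(2, 2, device="cuda") * 2).sum().item()
    except RuntimeError as err:
        return f"GPU computation failed: {err}"
    return "ok" if total == 8.0 else f"unexpected result: {total}"


print(gpu_smoke_test())
```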