stderr: nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unk
1. Error message
Starting the container with docker-compose fails with the following error:
ERROR: for huangs_devel_env
Cannot start service huangs_devel_env:
failed to create shim:
OCI runtime create failed:
container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook
#1:: error running hook: exit status 1,
stdout: ,
stderr: nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown
ERROR: Encountered errors while bringing up the project.
The message makes it clear that the problem is on the NVIDIA side.
2. Cause of the error
According to GitHub issue NVIDIA/nvidia-docker #1451, "Unable to start an interactive container with nvidia-docker due to driver/library version mismatch":
- The issue notes that the image is probably not the cause, because running nvidia-smi directly on the server reports a similar error:
  user@CNBM:$ nvidia-smi
  Failed to initialize NVML: Driver/library version mismatch
- Running nvidia-container-cli -k -d /dev/tty info shows the error in detail:
  user@C4P:/$ nvidia-container-cli -k -d /dev/tty info
  -- WARNING, the following logs are for debugging purposes only --
  I0422 15:02:53.662348 3372214 nvc.c:376] initializing library context (version=1.9.0, build=5e135c17d6dbae861ec343e9a8d3a0d2af758a4f)
  I0422 15:02:53.662441 3372214 nvc.c:350] using root /
  I0422 15:02:53.662454 3372214 nvc.c:351] using ldcache /etc/ld.so.cache
  I0422 15:02:53.662470 3372214 nvc.c:352] using unprivileged user 1000:1000
  I0422 15:02:53.662556 3372214 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
  I0422 15:02:53.662941 3372214 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
  W0422 15:02:53.754123 3372215 nvc.c:273] failed to set inheritable capabilities
  W0422 15:02:53.754230 3372215 nvc.c:274] skipping kernel modules load due to failure
  I0422 15:02:53.754782 3372216 rpc.c:71] starting driver rpc service
  I0422 15:02:53.770292 3372214 rpc.c:135] driver rpc service terminated with signal 15
  nvidia-container-cli: initialization error: nvml error: driver/library version mismatch
  I0422 15:02:53.770340 3372214 nvc.c:430] shutting down library context
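- All three errors therefore agree: the GPU driver loaded in the kernel and the NVIDIA user-space libraries are at different versions.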
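The mismatch can be checked directly. The following is only a rough sketch and not something from the issue thread: it assumes the loaded module exposes its version at /sys/module/nvidia/version and that modinfo can find the nvidia module installed on disk. The on-disk module version is used here as a proxy for the updated user-space libraries, which is also an assumption. The function names and the NVIDIA_SYSFS_VERSION override are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare the NVIDIA kernel module loaded in the running kernel
# with the module installed on disk. The sysfs path and the use of modinfo
# are assumptions about a typical Linux host, not taken from the post.

# Version of the module currently loaded (empty if it is not loaded).
# NVIDIA_SYSFS_VERSION is a hypothetical override, handy for testing.
loaded_version() {
  cat "${NVIDIA_SYSFS_VERSION:-/sys/module/nvidia/version}" 2>/dev/null
}

# Version of the module file installed on disk (empty if not found).
installed_version() {
  modinfo -F version nvidia 2>/dev/null
}

# Print a verdict for two version strings.
# Returns 0 on match, 1 on mismatch, 2 if either version is unknown.
compare_versions() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "unknown: loaded='$1' installed='$2'"
    return 2
  fi
  if [ "$1" = "$2" ]; then
    echo "match: $1"
    return 0
  fi
  echo "mismatch: loaded=$1 installed=$2"
  return 1
}

compare_versions "$(loaded_version)" "$(installed_version)" || echo "exit status: $?"
```

If the two versions differ, the driver packages were most likely upgraded while the old module stayed loaded, which fits the errors above.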
References:
- CSDN Q&A: Why does nvidia-docker run fail when a plain docker run succeeds?
- GitHub, NVIDIA/nvidia-docker: Unable to start an interactive container with nvidia-docker due to driver/library version mismatch #1451
- CSDN blog: Fixing Docker startup failures on a GPU server
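3. Fixing the error
The NVIDIA driver packages appear to update themselves automatically, and there seems to be no way to turn this off. After such an update the libraries on disk are newer than the driver still loaded in the running kernel, so the versions stay mismatched until the machine is rebooted.
Reboot the server: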
sudo reboot now
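After the reboot it is worth confirming that the mismatch is gone before starting the container again. A minimal sketch, assuming nvidia-smi is on the PATH; the function names are hypothetical, and the pattern accepts the message both with a space and with a hyphen before "mismatch":

```shell
#!/bin/sh
# Return 0 when the given text contains the NVML mismatch message.
is_version_mismatch() {
  printf '%s\n' "$1" | grep -qiE 'driver/library version[ -]mismatch'
}

# Run nvidia-smi once and report whether the mismatch persists.
# Returns 0 if no mismatch is reported, 1 if it persists,
# 2 if nvidia-smi is not available.
check_gpu_stack() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found"
    return 2
  fi
  smi_output=$(nvidia-smi 2>&1) || true
  if is_version_mismatch "$smi_output"; then
    echo "still mismatched after reboot"
    return 1
  fi
  echo "no driver/library mismatch reported"
}

check_gpu_stack || echo "exit status: $?"
```

If the check reports no mismatch, the docker-compose command from section 1 can be run again.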