CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid

小能豆

I'm trying to use CuPy in my Docker container. I use two containers: one for CUDA and cuDNN, and the other for CuPy.

I tried this code.

import cupy as cp

cupy_array = cp.array([1, 2, 3])
cupy_result = cupy_array + 5 
print("CuPy Result:", cupy_result)

The full error log is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cupy/_core/core.pyx", line 1191, in cupy._core.core.ndarray.__add__
  File "cupy/_core/core.pyx", line 1591, in cupy._core.core.ndarray.__array_ufunc__
  File "cupy/_core/_kernel.pyx", line 1292, in cupy._core._kernel.ufunc.__call__
  File "cupy/_core/_kernel.pyx", line 1319, in cupy._core._kernel.ufunc._get_ufunc_kernel
  File "cupy/_core/_kernel.pyx", line 1025, in cupy._core._kernel._get_ufunc_kernel
  File "cupy/_core/_kernel.pyx", line 72, in cupy._core._kernel._get_simple_elementwise_kernel
  File "cupy/_core/core.pyx", line 2141, in cupy._core.core.compile_with_cache
  File "/usr/local/lib/python3.8/dist-packages/cupy/cuda/compiler.py", line 492, in _compile_module_with_cache
    return _compile_with_cache_cuda(
  File "/usr/local/lib/python3.8/dist-packages/cupy/cuda/compiler.py", line 614, in _compile_with_cache_cuda
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 264, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 266, in cupy.cuda.function.Module.load
  File "cupy_backends/cuda/api/driver.pyx", line 210, in cupy_backends.cuda.api.driver.moduleLoadData
  File "cupy_backends/cuda/api/driver.pyx", line 60, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid

The output of nvidia-smi:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4080        Off | 00000000:01:00.0  On |                  N/A |
|  0%   32C    P8               6W / 320W |    483MiB / 16376MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

The output of nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_Mar__8_18:18:20_PST_2022
Cuda compilation tools, release 11.6, V11.6.124
Build cuda_11.6.r11.6/compiler.31057947_0

The output of cat /usr/include/cudnn_version.h | grep CUDNN_MAJOR -A 2:

#define CUDNN_MAJOR 8
#define CUDNN_MINOR 4
#define CUDNN_PATCHLEVEL 0
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#endif /* CUDNN_VERSION_H */

The output of pip3 freeze | grep cupy is cupy-cuda116==10.6.0.

All of the outputs above were taken inside the CuPy Docker container.

I started the CUDA and cuDNN container with:

sudo docker run --name cuda11.6.1-cudnn8 --gpus all --runtime=nvidia -it \
    --privileged --env="DISPLAY=:0:0" -v=/tmp/.X11-unix:/tmp/.X11-unix:ro \
    -v=/home/youngjoo/Documents/Elevation_ws:/home/youngjoo/Documents/Elevation_ws \
    -v=/dev:/dev -w=/home/youngjoo/Documents/Elevation_ws \
    nvidia/cuda:11.6.1-cudnn8-devel-ubuntu20.04

My OS is Ubuntu 20.04.

Docker version is 24.0.7, build afdd53b.

How can I resolve this?

I deleted all Docker containers and started over, but the result was the same.

2023-12-24

1 answer

小能豆

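The error you are encountering, CUDA_ERROR_INVALID_SOURCE: device kernel image is invalid, means the CUDA driver rejected a kernel that CuPy had just compiled: the traceback ends in Module.load and moduleLoadData, right after CuPy compiles the elementwise kernel for cupy_array + 5. This usually points to a mismatch between the compiled kernel and your GPU, driver, or toolkit. Here are a few steps you can take to resolve the issue: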

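Ensure Compatibility: Make sure that the version of CuPy you are using (cupy-cuda116==10.6.0) is compatible with your CUDA toolkit, driver, and GPU. Note that the "CUDA Version: 12.2" shown by nvidia-smi is the newest CUDA version your 535.129.03 driver supports, not a toolkit installed in the container; inside the container, nvcc reports CUDA 11.6, which does match cupy-cuda116. However, the GeForce RTX 4080 has compute capability 8.9, which CUDA toolkits older than 11.8 do not support, so kernels compiled with CUDA 11.6 are a likely cause of the invalid-image error. Try upgrading CuPy together with a CUDA toolkit that supports your GPU, for example a CUDA 12.x image (up to 12.2, which your driver supports). For CuPy v12 and later, the wheel for CUDA 12.x is installed with: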

pip install cupy-cuda12x
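To pick the right wheel, a small helper can map a toolkit version to a package name. This is a minimal sketch assuming the naming used by CuPy v12 and v13 (cupy-cuda11x for CUDA 11.2 to 11.8, cupy-cuda12x for CUDA 12.x); the function name cupy_wheel_for is hypothetical, and older CuPy releases such as v10 used per-minor names like cupy-cuda116 instead.

```python
def cupy_wheel_for(cuda_major: int, cuda_minor: int) -> str:
    """Return the CuPy wheel name for a CUDA toolkit version.

    Assumes the CuPy v12/v13 naming scheme (cupy-cuda11x / cupy-cuda12x).
    Toolkits outside 11.2-12.x are deliberately not covered by this sketch.
    """
    if cuda_major == 12:
        return "cupy-cuda12x"
    if cuda_major == 11 and 2 <= cuda_minor <= 8:
        return "cupy-cuda11x"
    raise ValueError(f"CUDA {cuda_major}.{cuda_minor} is not covered by this sketch")
```

For example, cupy_wheel_for(12, 2) gives cupy-cuda12x, the package to pair with a CUDA 12.2 container image.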

Clean and Reinstall: Problems can also come from a broken installation or stale cached files. Remove the installed package first (it is named cupy-cuda116, so pip uninstall cupy would not remove it), then install the new one:

pip uninstall cupy-cuda116
pip install cupy-cuda12x

This ensures a fresh installation.
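Reinstalling the package does not touch the kernels CuPy has already compiled and cached on disk. The sketch below clears that cache, assuming CuPy's documented behaviour of honouring the CUPY_CACHE_DIR environment variable and otherwise using ~/.cupy/kernel_cache; the helper names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def cupy_kernel_cache_dir() -> Path:
    """Return the directory where CuPy caches compiled kernels.

    Uses CUPY_CACHE_DIR if set, otherwise the default ~/.cupy/kernel_cache.
    """
    default = Path.home() / ".cupy" / "kernel_cache"
    return Path(os.environ.get("CUPY_CACHE_DIR", default)).expanduser()


def clear_cupy_kernel_cache() -> bool:
    """Delete the kernel cache directory; return True if anything was removed."""
    cache = cupy_kernel_cache_dir()
    if cache.is_dir():
        shutil.rmtree(cache)
        return True
    return False
```

Calling clear_cupy_kernel_cache() before re-running the snippet forces CuPy to recompile its kernels with the newly installed toolkit.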

Check GPU Drivers: Make sure your GPU driver is recent enough for the toolkit you choose. Driver 535.129.03 supports CUDA versions up to 12.2, which covers CUDA 11.x and 12.0 through 12.2; newer toolkits may require a driver update.

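The driver's limit can be read straight from the nvidia-smi banner. This sketch parses the "CUDA Version" field and applies a conservative rule of thumb (the toolkit should not be newer than the driver's maximum); it deliberately ignores CUDA's minor-version compatibility, under which some newer 12.x toolkits can run on a 12.2 driver with limitations. Both function names are hypothetical.

```python
import re
from typing import Tuple


def driver_max_cuda(smi_banner: str) -> Tuple[int, int]:
    """Extract 'CUDA Version: X.Y' from nvidia-smi output.

    This is the newest CUDA version the installed driver supports,
    not the version of any toolkit installed in a container.
    """
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_banner)
    if m is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(m.group(1)), int(m.group(2))


def driver_supports(toolkit: Tuple[int, int], driver_max: Tuple[int, int]) -> bool:
    # Conservative check: the toolkit is not newer than the driver's maximum.
    return toolkit <= driver_max
```

With the banner from the question, driver_max_cuda returns (12, 2), so both the current CUDA 11.6 toolkit and a CUDA 12.2 toolkit pass this check.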
Check CUDA Toolkit Installation: Make sure that the CUDA toolkit is correctly installed in your Docker container. You can check the version of the CUDA toolkit by running:

nvcc --version

Ensure that it matches the CUDA version your CuPy package was built for (for example, cupy-cuda116 targets CUDA 11.6).
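That comparison can be automated from the outputs of nvcc --version and pip3 freeze. This is a sketch with hypothetical helper names; it only understands wheel names of the form cupy-cudaXY# (per-minor, e.g. cupy-cuda116) and cupy-cudaXYx (per-major, e.g. cupy-cuda12x). Note that a match here only shows the package and toolkit agree; it does not prove the toolkit supports the GPU, and in the question they already agree (11.6).

```python
import re
from typing import Optional, Tuple


def nvcc_release(nvcc_output: str) -> Tuple[int, int]:
    """Parse 'release X.Y' from the output of nvcc --version."""
    m = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if m is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(m.group(1)), int(m.group(2))


def cupy_package_cuda(freeze_line: str) -> Optional[Tuple[int, Optional[int]]]:
    """Map a pip freeze line like 'cupy-cuda116==10.6.0' to its target CUDA version.

    Returns (11, 6) for per-minor names, (12, None) for names like
    cupy-cuda12x, and None for lines that are not CuPy CUDA wheels.
    """
    m = re.match(r"cupy-cuda(\d{2})(\d|x)\b", freeze_line.strip())
    if m is None:
        return None
    minor = None if m.group(2) == "x" else int(m.group(2))
    return int(m.group(1)), minor


def package_matches_toolkit(package: Tuple[int, Optional[int]], toolkit: Tuple[int, int]) -> bool:
    major, minor = package
    # A per-major wheel (minor is None) accepts any minor version of that major.
    return major == toolkit[0] and (minor is None or minor == toolkit[1])
```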

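Check Docker Configuration: Ensure that your Docker container is configured correctly to use the GPU. You start the CUDA container with --gpus all and --runtime=nvidia, which should expose the GPU, and nvidia-smi works inside the container where you run CuPy, so GPU access itself appears to be set up correctly.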

After performing these steps, try running your CuPy code again. If the issue persists, there might be an underlying compatibility or configuration problem that requires further investigation.
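When re-testing, it helps to print what CuPy itself sees alongside the original operation. The sketch below uses cupy.cuda.runtime.runtimeGetVersion(), cupy.cuda.runtime.driverGetVersion() and cupy.cuda.Device(0).compute_capability; the helper names decode_cuda_version and report are hypothetical, and cupy is imported inside report() so the version decoding can be used without a GPU. cupy.show_config() prints a broader summary if you need more detail.

```python
from typing import Tuple


def decode_cuda_version(v: int) -> Tuple[int, int]:
    """CUDA encodes versions as 1000 * major + 10 * minor (e.g. 12020 is 12.2)."""
    return v // 1000, (v % 1000) // 10


def report() -> None:
    import cupy as cp  # imported lazily; requires a working GPU setup

    print("CUDA runtime:", decode_cuda_version(cp.cuda.runtime.runtimeGetVersion()))
    print("CUDA driver:", decode_cuda_version(cp.cuda.runtime.driverGetVersion()))
    print("compute capability:", cp.cuda.Device(0).compute_capability)
    # The same kind of elementwise kernel that failed in the question.
    print("result:", cp.array([1, 2, 3]) + 5)
```

Call report() inside the CuPy container; if the last line prints without raising CUDADriverError, the kernel now compiles and loads on the RTX 4080.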

2023-12-24