TensorFlow only sees the CPU

小能豆

I have tried almost everything, but TensorFlow does not see the GPU.

>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2020-10-08 16:57:31.356377: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-10-08 16:57:31.408641: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2299965000 Hz
2020-10-08 16:57:31.409379: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5d51170 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-08 16:57:31.409459: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-10-08 16:57:31.425795: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-10-08 16:57:32.550621: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-10-08 16:57:32.550734: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (dmitry-pc): /proc/driver/nvidia/version does not exist
False
>>> tf.python.client.device_lib.list_local_devices()
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 15855895153430362166
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 2500413154884527026
physical_device_desc: "device: XLA_CPU device"
]

Ubuntu 20.04

GPU-> GeForce 940MX

CUDA-> 10.1

cudnn-> 7.6

tensorflow-gpu-> 2.3.1

2024-12-24

1 answer

小能豆

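The error CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected, together with the device list that contains no GPU, means TensorFlow did not detect your GPU. Your log also reports that /proc/driver/nvidia/version does not exist, which points to the NVIDIA kernel driver not being loaded. This is usually caused by one of the following: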

Possible causes and fixes:

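1. The NVIDIA driver, CUDA, or cuDNN is missing, incorrectly installed, or incompatible

Since you are using TensorFlow 2.3.1 with CUDA 10.1, the GPU driver, CUDA, and cuDNN versions all need to be compatible with one another.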

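Fix:
- Check that the NVIDIA driver is installed and loaded:
  ```bash
  nvidia-smi
  ```
  This shows the state of the NVIDIA driver and the GPU. If the command is missing or does not list your card, the driver is not installed or not loaded.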

 - **Install or update the NVIDIA driver:**
   On Ubuntu, you can install or update the driver with:
   ```bash
   sudo apt update
   sudo apt install nvidia-driver-450  # choose a version that supports your GPU
   sudo reboot
   ```
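The driver check above can also be scripted. The following is a minimal sketch that only looks at two things: whether `/proc/driver/nvidia/version` exists (the path comes from the log in the question) and whether `nvidia-smi` is on the `PATH`. The function names are hypothetical, and the wording of the returned messages is an interpretation, not output from any NVIDIA tool.

```python
import shutil
from pathlib import Path

# Path taken from the log line in the question; it is expected to exist
# only while the NVIDIA kernel driver is loaded.
NVIDIA_PROC_VERSION = "/proc/driver/nvidia/version"


def nvidia_driver_loaded(proc_path: str = NVIDIA_PROC_VERSION) -> bool:
    """Return True if the driver's version file exists."""
    return Path(proc_path).is_file()


def describe_driver_state(proc_path: str = NVIDIA_PROC_VERSION) -> str:
    """Summarize what can be seen without touching the GPU itself."""
    if nvidia_driver_loaded(proc_path):
        lines = Path(proc_path).read_text().splitlines()
        first_line = lines[0] if lines else ""
        return f"driver loaded: {first_line}"
    if shutil.which("nvidia-smi") is None:
        return "driver not loaded and nvidia-smi not on PATH (driver is probably not installed)"
    return "nvidia-smi is installed but the kernel driver is not loaded"


print(describe_driver_state())
```

If this reports that the driver is not loaded, installing CUDA or reinstalling TensorFlow is unlikely to help until the driver itself is fixed.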

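Verify the CUDA and cuDNN versions:
- CUDA 10.1 with cuDNN 7.6: make sure both are installed in these versions. You can check the pairing against the tested build configurations in the official TensorFlow installation guide.
- If cuDNN is missing, download and install cuDNN 7.6 for CUDA 10.1.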

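Check the installed CUDA toolkit version. This only reports the toolkit; it does not prove that the driver works:
```bash
nvcc --version  # shows the installed CUDA toolkit version
```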
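If you want to read the toolkit version from a script, a sketch like the one below runs `nvcc --version` and extracts the release number. It assumes the output contains a phrase of the form `release 10.1`, as in `Cuda compilation tools, release 10.1, V10.1.243`; the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(output: str) -> Optional[str]:
    """Extract the 'release X.Y' version from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_toolkit() -> Optional[str]:
    """Run nvcc if it is on PATH; return None when it is missing."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


print("CUDA toolkit release:", installed_cuda_toolkit())
```

A result of `None` means `nvcc` was not found on the `PATH` or its output did not match the assumed pattern; it does not by itself mean CUDA is absent.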

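2. The TensorFlow version does not match the CUDA and cuDNN versions

TensorFlow 2.3.1 is built against CUDA 10.1 and cuDNN 7.6. Even when the versions match, a broken or partial installation can still prevent the libraries from loading.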

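Fix:
- If you are not sure the installation succeeded, reinstall TensorFlow:
  ```bash
  pip uninstall tensorflow-gpu
  pip install tensorflow-gpu==2.3.1
  ```

If the CUDA version does not match, either change the CUDA version or upgrade TensorFlow. For example, if you can move to CUDA 11.0, you can upgrade to TensorFlow 2.4, which is built against CUDA 11.0 and cuDNN 8.0.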
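The version pairing can be kept as a small lookup table. The sketch below lists only a few releases, taken from the tested build configurations in the TensorFlow installation guide; the table is deliberately incomplete and the function name is hypothetical.

```python
from typing import Optional, Tuple

# (CUDA, cuDNN) versions the official GPU wheels were built against.
# Partial list; extend it from the TensorFlow installation guide as needed.
TESTED_CONFIGS = {
    "2.1": ("10.1", "7.6"),
    "2.2": ("10.1", "7.6"),
    "2.3": ("10.1", "7.6"),
    "2.4": ("11.0", "8.0"),
}


def expected_cuda_cudnn(tf_version: str) -> Optional[Tuple[str, str]]:
    """Map a version such as '2.3.1' to its (CUDA, cuDNN) pair, if listed."""
    major_minor = ".".join(tf_version.split(".")[:2])
    return TESTED_CONFIGS.get(major_minor)


print(expected_cuda_cudnn("2.3.1"))
```

For the setup in the question, the lookup confirms that CUDA 10.1 and cuDNN 7.6 are the expected pair, so a version mismatch is unlikely to be the cause here.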

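3. Check whether TensorFlow detects the GPU

Fix:
Use the following code to check whether TensorFlow can see the GPU:

```python
import tensorflow as tf

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
```

This prints the number of GPUs TensorFlow found. If it is still 0, the driver or the CUDA libraries are not being loaded correctly.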

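4. The NVIDIA kernel module may not be loaded

If the driver is installed on Ubuntu but its kernel module is not loaded, try loading it manually. If the command reports that the module cannot be found, the driver is not installed for the running kernel:

```bash
sudo modprobe nvidia
```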
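To see whether the module is loaded after running the command, you can inspect `/proc/modules`, where each line starts with a module name. The sketch below filters for names beginning with `nvidia`; the helper names are hypothetical.

```python
from pathlib import Path
from typing import List


def loaded_nvidia_modules(modules_text: str) -> List[str]:
    """Return module names starting with 'nvidia' from /proc/modules text."""
    names = [line.split()[0] for line in modules_text.splitlines() if line.strip()]
    return [name for name in names if name.startswith("nvidia")]


def read_proc_modules(path: str = "/proc/modules") -> str:
    """Read the module list; return an empty string if the file is absent."""
    proc_file = Path(path)
    return proc_file.read_text() if proc_file.is_file() else ""


print("loaded NVIDIA modules:", loaded_nvidia_modules(read_proc_modules()))
```

An empty list means no NVIDIA kernel module is currently loaded, which matches the "kernel driver does not appear to be running" message in the question's log.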

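5. Python environment problems

If you are using a virtualenv or conda environment, make sure tensorflow-gpu is installed in the environment you are actually running, not only in the system Python.

Fix:
Install the package in the active environment:

```bash
pip install tensorflow-gpu==2.3.1
```

Do not install nvidia-pyindex and nvidia-tensorflow alongside it: nvidia-tensorflow is NVIDIA's separate build of TensorFlow 1.x and would conflict with tensorflow-gpu 2.3.1.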
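To confirm which interpreter is running and which TensorFlow distributions it can see, a sketch along these lines may help. It assumes Python 3.8 or newer, because it relies on `importlib.metadata`; the function name is hypothetical.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The interpreter path shows which environment is actually active.
print("interpreter:", sys.executable)
for name in ("tensorflow", "tensorflow-gpu"):
    print(name, "->", installed_version(name))
```

If the interpreter path points outside your virtualenv or conda environment, or both distributions show `None`, the package was installed somewhere other than the environment you are running.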

6. Check that the CUDA paths are configured

Make sure the CUDA paths are set. Add them to ~/.bashrc or ~/.zshrc:

```bash
export PATH=/usr/local/cuda-10.1/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH
```

Then reload the configuration:

```bash
source ~/.bashrc  # or ~/.zshrc if you use zsh
```
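After reloading the shell, you can check whether the directories in `LD_LIBRARY_PATH` actually contain the libraries TensorFlow 2.3 tries to open. The sketch below looks for `libcudart.so.10.1` and `libcudnn.so.7`. It only inspects `LD_LIBRARY_PATH`, so it is an approximation: the dynamic loader also searches system directories and the `ldconfig` cache. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, List

# Library file names TensorFlow 2.3 opens for CUDA 10.1 and cuDNN 7.
REQUIRED_LIBS = ("libcudart.so.10.1", "libcudnn.so.7")


def search_dirs_from_env(value: str) -> List[str]:
    """Split a PATH-style variable into its non-empty directories."""
    return [entry for entry in value.split(os.pathsep) if entry]


def missing_libs(lib_names: Iterable[str], search_dirs: Iterable[str]) -> List[str]:
    """Return the library names not found in any of the given directories."""
    dirs = list(search_dirs)
    return [
        name
        for name in lib_names
        if not any((Path(directory) / name).exists() for directory in dirs)
    ]


dirs = search_dirs_from_env(os.environ.get("LD_LIBRARY_PATH", ""))
print("not found via LD_LIBRARY_PATH:", missing_libs(REQUIRED_LIBS, dirs))
```

A library listed as not found here may still be loadable from a system directory, so treat the result as a hint about where to look rather than a final verdict.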

7. Other checks

Check the TensorFlow GPU configuration: tf.config.list_physical_devices('GPU') confirms whether TensorFlow detects a GPU:

```python
import tensorflow as tf

physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    print(f"Found {len(physical_devices)} GPU(s)")
else:
    print("No GPU detected")
```

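Summary:

- Make sure the NVIDIA driver and CUDA are installed and configured correctly.
- Make sure TensorFlow is compatible with the installed CUDA and cuDNN versions.
- Make sure TensorFlow can detect and use the GPU.

If the problem persists after these steps, check that the TensorFlow installation is complete, or try reinstalling the related packages.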

2024-12-24