Failed CUDA Toolkit Install? Ubuntu 18.04 stuck on boot of Gnome Display Manager?

I have been attempting to get TensorFlow GPU running on Ubuntu 18.04.

The system requirements look simple: see the TensorFlow GPU NVIDIA requirements. However, I discovered that the CUDA toolkit depends on specific versions of the Nvidia drivers, and a mismatch can leave you with failed package installations such as:
Errors were encountered while processing:
/tmp/apt-dpkg-install-5dMwo8/100-nvidia-390_390.30-0ubuntu1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
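
If you want to see which packages the failed install left half-done, the state flags in the first column of `dpkg -l` can help. Here is a minimal sketch, assuming the usual `dpkg -l` column layout; `broken_packages` is a helper name I made up, not part of dpkg or apt:

```shell
# Print the names of packages that dpkg has left in a partially installed state.
# Reads `dpkg -l` output on stdin.
# broken_packages is a hypothetical helper name, not a dpkg or apt command.
broken_packages() {
    # Column 1 holds the state flags: desired action, then current status.
    # Status letters matched here: U (unpacked), F (half-configured),
    # H (half-installed), W and t (trigger states).
    awk '$1 ~ /^[uirph][UFHWt]/ { print $2 }'
}

# Usage on the affected machine:
#   dpkg -l | broken_packages
```

Running `sudo dpkg --audit` should give a similar report straight from dpkg itself.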

What's worse, if you reboot, the computer gets stuck during boot and cannot progress past loading the Gnome Display Manager.

Is my computer completely messed up?


If you are stuck in the boot sequence, open a tty terminal by pressing Ctrl + Alt + F2. You will be prompted to log in. Do it.

Next, type:
sudo apt-get install -f
The system will list the dependencies that were not installed correctly. In my case, the installed Nvidia 390.46 drivers were not the Nvidia 390.30 drivers that CUDA toolkit 9.0 wanted.
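
To see the version mismatch for yourself, you can list the Nvidia and CUDA packages that dpkg knows about, together with their versions. This is only a sketch: `gpu_packages` is a hypothetical helper, and the name prefixes it matches are an assumption based on the package names that appear in this post.

```shell
# Keep only the Nvidia driver and CUDA packages from a "name version" listing.
# Reads one "package version" pair per line on stdin.
# gpu_packages is a hypothetical helper; the prefixes are assumed from the
# package names seen in this post (nvidia-390, libcuda1-390, cuda-9-0, ...).
gpu_packages() {
    grep -E '^(nvidia-|libcuda|cuda-)'
}

# Usage on the affected machine:
#   dpkg-query -W -f='${Package} ${Version}\n' | gpu_packages
```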

To roll back the install, you will need to iteratively build a list of all the installed CUDA 9.0 packages. To do this, type:
sudo apt-get remove --dry-run cuda-9-0
The --dry-run flag simulates the removal without changing anything. Its output explains why the removal would fail and which dependent packages need to be removed at the same time for CUDA toolkit 9.0 to be removed.

Do this several times, appending each newly reported dependent package to the growing list. The result will be a command like the ones below. Once the dry run completes without errors, run the same command without --dry-run to remove all the packages at once:

CUDA toolkit 9.0

sudo apt-get remove --dry-run cuda-9-0 libcuda1-390 nvidia-390-dev nvidia-opencl-icd-390 cuda-drivers cuda-runtime-9-0

CUDA toolkit 9.1

sudo apt-get remove cuda-9-1 cuda-drivers libcuda1-390 nvidia-390-dev nvidia-opencl-icd-390 cuda-runtime-9-1 cuda-demo-suite-9-1

CUDA toolkit 9.2

sudo apt-get remove --dry-run nvidia-opencl-icd-396 cuda-drivers libcuda1-396 cuda-runtime-9-2 cuda-demo-suite-9-2 cuda-9-2
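
When a dry run finally goes through, apt-get prints one `Remv` line for each package it would remove. A small filter makes it easier to double-check the list before running the real thing. This is a sketch under that assumption; `packages_to_remove` is a name I picked for illustration.

```shell
# Print the package names that a simulated apt-get run would remove.
# Reads `apt-get remove --dry-run ...` output on stdin.
# packages_to_remove is a hypothetical helper, not an apt command.
packages_to_remove() {
    # Simulated removals are reported as: Remv <package> [<version>]
    awk '$1 == "Remv" { print $2 }'
}

# Usage with the CUDA toolkit 9.0 list from above:
#   sudo apt-get remove --dry-run cuda-9-0 libcuda1-390 nvidia-390-dev \
#       nvidia-opencl-icd-390 cuda-drivers cuda-runtime-9-0 | packages_to_remove
```

If the printed list contains anything you did not expect, stop and look at why before dropping --dry-run.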

Almost Done...

Perform the following commands to clean up the rest of the dependencies:
sudo apt autoclean
sudo apt autoremove

Perform this command to verify there are no unmet dependencies:
sudo apt-get install -f

The output should be:
Reading package lists... Done
Building dependency tree
Reading state information... Done
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
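
If you would rather not eyeball the output, the summary line can be checked with a short filter. This is a rough sketch that assumes English apt output; `apt_summary_is_clean` is a hypothetical helper name, and the count of packages that are "not upgraded" may differ on your machine.

```shell
# Succeed only if apt-get's summary line reports nothing to install or remove.
# Reads apt-get output on stdin.
# apt_summary_is_clean is a hypothetical helper, not an apt command.
apt_summary_is_clean() {
    # The "not upgraded" count is allowed to be any number.
    grep -Eq '^0 upgraded, 0 newly installed, 0 to remove and [0-9]+ not upgraded\.$'
}

# Usage (the dry run avoids a confirmation prompt hidden by the pipe):
#   LC_ALL=C sudo apt-get install -f --dry-run | apt_summary_is_clean && echo "clean"
```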

Reboot your computer and you are up and running again. Now you must make a decision. Either follow these exact steps to get CUDA installed correctly:
https://www.kinmanlam.com/2018/06/install-cuda-90-toolkit-on-ubuntu-1804.html

Or what I recommend doing is using Docker containers instead. See:
https://www.kinmanlam.com/2018/06/ubuntu-nvidia-docker-tensorflow-gpu.html

Comments

  1. Your notes are a big help to a newbie like myself. Thanks!
    My situation is slightly different - not there yet with TensorFlow at your level but Nvidia seems to do something that is out of my league (since I have GTX250). I can recover by using lightdm but not gdm3. Any suggestions?
    Regards.

  2. Excellent write up. It's unbelievable how completely messy the whole {Nvidia+Ubuntu} situation has always been.

    You would think the Nvidia guys would realize their biggest potential fans are the power users looking to hack things and build things using a command line and that they would meet us halfway by keeping the bugs under control... but I guess we can just spin the issue as just being a rite of passage into CUDAland, idk. lol

    Keep up the great work.
