Failed CUDA Toolkit Install? Ubuntu 18.04 stuck on boot of Gnome Display Manager?
I have been attempting to get TensorFlow GPU running on Ubuntu 18.04.
The system requirements look simple: see the TensorFlow GPU NVIDIA requirements. But I discovered that the CUDA toolkit depends on specific versions of the Nvidia drivers, and a mismatch leaves you with failed package installations such as:
Errors were encountered while processing:
/tmp/apt-dpkg-install-5dMwo8/100-nvidia-390_390.30-0ubuntu1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
What's worse, if you reboot, the computer gets stuck booting and cannot progress past loading the Gnome Display Manager.
Is my computer completely messed up?
If you are stuck in the boot sequence, access a tty terminal by pressing Ctrl + Alt + F2. You will be prompted to log in. Do it.
Next, type:
sudo apt-get install -f
The system will list the dependencies that were not installed correctly. In my case, my installed Nvidia 390.46 drivers were not the Nvidia 390.30 that CUDA tool-kit 9.0 wanted.
To roll back the install, you need to iteratively build a list of all the installed CUDA 9.0 packages. To do this, type:
sudo apt-get remove --dry-run cuda-9-0
The --dry-run flag simulates the removal, so it shows why the removal would fail and which dependent packages must be removed at the same time for CUDA tool-kit 9.0 to be removed. Do this several times and append each new dependent package to the growing list. The result will be a command like one of the following. Once the dry run completes without errors, run the same command without --dry-run to remove all the packages at once:
CUDA tool-kit 9.0
sudo apt-get remove --dry-run cuda-9-0 libcuda1-390 nvidia-390-dev nvidia-opencl-icd-390 cuda-drivers cuda-runtime-9-0
CUDA tool-kit 9.1
sudo apt-get remove cuda-9-1 cuda-drivers libcuda1-390 nvidia-390-dev nvidia-opencl-icd-390 cuda-runtime-9-1 cuda-demo-suite-9-1
CUDA tool-kit 9.2
Build the list with the same dry-run procedure, starting from cuda-9-2.
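If it helps to see which CUDA and Nvidia packages dpkg currently knows about before building the removal list by hand, here is a minimal sketch. The function names are hypothetical, and the prefix pattern is only an assumption based on the package names that show up in this post, so it may miss packages on your system or match ones you want to keep.

```shell
# Keep only CUDA- and Nvidia-related names from a package list on stdin.
# ASSUMPTION: the prefixes below are guessed from the package names seen
# in this post (cuda-*, libcuda*, nvidia-*); adjust them for your system.
filter_gpu_packages() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '^(cuda|libcuda|nvidia)' || true
}

# Print the GPU-related packages that dpkg knows about, one per line.
# dpkg-query -W lists every package in the dpkg database, which includes
# half-installed packages and packages removed but not purged.
list_gpu_packages() {
    dpkg-query -W -f='${Package}\n' | filter_gpu_packages
}
```

Running list_gpu_packages prints candidate names only. Review the list and still confirm each addition with the --dry-run removal before removing anything for real.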
Almost Done...
Perform the following commands to clean up the rest of the dependencies:
sudo apt autoremove
sudo apt autoclean
Run this command again to verify there are no unmet dependencies:
sudo apt-get install -f
The output should be:
Reading package lists...
0 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
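To check that summary line in a script rather than by eye, a small sketch follows. The function name is hypothetical, and it assumes apt prints its messages in English and that a clean system reports zero upgraded, newly installed and removed packages, as in the output above.

```shell
# Succeed (exit status 0) when the apt output on stdin contains a summary
# line with nothing upgraded, nothing newly installed and nothing to remove.
# The "not upgraded" count may be any number.
# ASSUMPTION: English apt messages, in the format shown in this post.
apt_summary_is_clean() {
    grep -E '^0 upgraded, 0 newly installed, 0 to remove and [0-9]+ not upgraded\.$' >/dev/null
}
```

One way to use it is to pipe a simulated run into it, for example: apt-get --dry-run install -f | apt_summary_is_clean && echo "no pending fixes". A non-zero exit status only means the summary line did not match, so read the full apt output before acting on it.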
Reboot your computer and you are up and running again. Now you must make a decision: either follow these exact steps to get CUDA installed correctly:
https://www.kinmanlam.com/2018/06/install-cuda-90-toolkit-on-ubuntu-1804.html
Or, as I recommend, use Docker containers instead. See:
https://www.kinmanlam.com/2018/06/ubuntu-nvidia-docker-tensorflow-gpu.html
Your notes are a big help to a newbie like myself. Thanks!
My situation is slightly different - not there yet with TensorFlow at your level but Nvidia seems to do something that is out of my league (since I have GTX250). I can recover by using lightdm but not gdm3. Any suggestions?
Regards.
Excellent write up. It's unbelievable how completely messy the whole {Nvidia+Ubuntu} situation has always been.
You would think the Nvidia guys would realize their biggest potential fans are the power users looking to hack things and build things using a command line and that they would meet us halfway by keeping the bugs under control... but I guess we can just spin the issue as just being a rite of passage into CUDAland, idk. lol
Keep up the great work.