Tensorflow on HPC¶
Introduction¶
This manual will help you set up a tensorflow environment with GPU support on the MI/CI HPC servers. We will use Miniconda to install a python environment with the tensorflow package.
Follow this link for an overview of the HPC servers available.
Currently this manual has only been tested on the interactive HPC servers. These servers share your home-directory, so you only need to follow these instructions once to set up the environment in your home-directory and use it on all servers.
Follow this link for information on how to access these servers.
Setting up CUDA¶
On servers that have an Nvidia card with CUDA support, you can use GPU support in tensorflow, provided the card's compute capability is sufficient. Currently this means compute capability 3.5 or higher [16 March 2020]. Check the table above, or check these websites for the latest information (a quick way to identify the installed card is shown after this list):
- https://developer.nvidia.com/cuda-gpus#compute and
- https://www.tensorflow.org/install/gpu#hardware_requirements
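If you are not sure which card a particular server has, a quick way to check is to query the driver with nvidia-smi and look the model up in the tables linked above. This is only a sketch and assumes the Nvidia driver (and thus nvidia-smi) is installed on that server:
# print the model name of every Nvidia GPU in this server
$ nvidia-smi --query-gpu=name --format=csv,noheader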
Warning
The GTX 690 in hpc18 is not supported by tensorflow.
The CUDA and cuDNN libraries must also be supported by tensorflow to be able to use GPU support. Currently, with tensorflow version 2.1.0 or higher, this should be CUDA version 10.1. Check this website for the latest information.
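To see which CUDA and cuDNN versions are installed on a server, you can list the available modules. This assumes the module command is set up in your shell, as it is in the commands used below:
# list the cuda and cudnn modules installed on this server
$ module avail cuda
$ module avail cudnn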
To use the CUDA library, load the following modules, which will set the right environment variables:
$ module load cuda/10.1
$ module load cudnn
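To verify that the modules were loaded, you can list them and, assuming the cuda module puts the CUDA toolkit on your PATH (which is typically the case), check the compiler version:
# show the currently loaded modules
$ module list
# check the CUDA toolkit version (assumes the module provides nvcc)
$ nvcc --version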
Tip
This only needs to be done once after opening a new terminal to the server or starting a new login-shell. This process can be automated by creating a .modules file in your home-directory and adding the line:
`module load cuda/10.1 cudnn`
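For example, assuming you do not already have a .modules file (so nothing is overwritten), you can create it with:
# append the module line to ~/.modules, creating the file if needed
$ echo "module load cuda/10.1 cudnn" >> ~/.modules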
Setting up Tensorflow in Miniconda¶
Tensorflow is a python package and must be installed alongside python to be used. You can see here which python version is needed.
Note
All steps below have to be performed on the HPC server, in your home-directory.
First follow the instructions in the Miniconda manual if you do not yet have a miniconda-environment in your home-directory.
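As a quick sanity check before continuing, verify that the conda command is available in your shell:
# verify that miniconda is set up correctly
$ conda --version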
Decide whether you would like to use the CPU-version of tensorflow or the GPU-version, and follow the appropriate instructions below. It is also possible to install both environments next to each other.
Setting up → CPU-version¶
Execute the following commands to set up a new python virtual environment named tf and install the tensorflow package from the conda-forge channel:
# create new python virtual environment named tf
$ conda create -n tf
$ conda activate tf
$ conda config --add channels conda-forge
$ conda install tensorflow
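As a quick sanity check (not part of the installation itself), you can print the installed tensorflow version from within the activated environment:
# import tensorflow and print its version
$ python -c "import tensorflow as tf; print(tf.__version__)"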
Setting up → GPU-version¶
Execute the following commands to set up a new python virtual environment named tfgpu and install the tensorflow-gpu package from the conda-forge channel:
# create new python virtual environment named tfgpu
$ conda create -n tfgpu
$ conda activate tfgpu
$ conda config --add channels conda-forge
$ conda install tensorflow-gpu
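As a quick sanity check, you can ask tensorflow whether this build was compiled with CUDA support; note that actually detecting a GPU additionally requires the cuda and cudnn modules to be loaded, see the Usage section below:
# check that the installed build was compiled with CUDA support
$ python -c "import tensorflow as tf; print(tf.test.is_built_with_cuda())"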
Usage¶
Every time you log in to a server you need to activate the virtual environment you just created.
Usage → CPU-version¶
$ conda activate tf
Usage → GPU-version¶
###
# the following line is only needed on the HPC-cluster nodes.
# Do not use this on the AWI-cluster or the stand-alone nodes!
$ module use /opt/insy/modulefiles/
###
$ module load cuda cudnn
$ conda activate tfgpu
###
# Testing tensorflow with GPU
$ python
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
<SNIP>
'/device:GPU:0'
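With tensorflow 2.x you can also list the visible GPUs through the tf.config API (available since tensorflow 2.1); it returns one PhysicalDevice entry per GPU that tensorflow can use:
>>> tf.config.list_physical_devices('GPU')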
Good luck!