Tensorflow on HPC

Introduction

This manual will help you set up a TensorFlow environment on the MI/CI HPC servers with GPU support. We will use Miniconda to install a Python environment with the tensorflow package.

Follow this link for an overview of the HPC servers available.

Currently this manual has only been tested on the interactive HPC servers. These servers share your home directory, so you only need to follow these instructions once: the environment you set up in your home directory can be used on all servers.

Follow this link for information on how to access these servers.

Setting up CUDA

On servers that have an NVIDIA card with CUDA support, you can use GPU support in TensorFlow provided the card's compute capability is sufficient. Currently this must be 3.5 or higher [16 March 2020]. Check the table above or these websites for the latest information:

Warning

The GTX 690 in hpc18 is not supported by TensorFlow.

The CUDA and cuDNN library versions must also be supported by TensorFlow to enable GPU support. Currently, TensorFlow version 2.1.0 or higher requires CUDA version 10.1. Check this website for the latest information:
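The requirements above can be sketched as a small check. This is illustrative only: the numbers are the ones stated in this manual (compute capability 3.5 or higher, CUDA 10.1 for TensorFlow 2.1.0+), and the names `MIN_COMPUTE_CAPABILITY`, `REQUIRED_CUDA` and `gpu_supported` are made up for this sketch, not part of any library. Always verify against the current TensorFlow documentation.

```python
# Illustrative check of the GPU requirements described above.
# The thresholds are the ones stated in this manual [16 March 2020].
MIN_COMPUTE_CAPABILITY = 3.5
REQUIRED_CUDA = "10.1"

def gpu_supported(compute_capability, cuda_version):
    """Return True if a card/CUDA combination meets the stated requirements."""
    return compute_capability >= MIN_COMPUTE_CAPABILITY and cuda_version == REQUIRED_CUDA

# The GTX 690 mentioned in the warning above has compute capability 3.0,
# which is why TensorFlow does not support it.
print(gpu_supported(3.0, "10.1"))  # False
print(gpu_supported(7.0, "10.1"))  # True
```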

To use the CUDA library, run the following commands, which set the required environment variables:

$ module load cuda/10.1
$ module load cudnn

Tip

This only needs to be done once after opening a new terminal to the server or starting a new login shell. You can automate this by creating a .modules file in your home directory and adding the line:

`module load cuda/10.1 cudnn`

Setting up Tensorflow in Miniconda

TensorFlow is a Python package and must be installed alongside Python to be used. You can see here which Python version is required:

Note

All steps below must be carried out on the HPC server, in your home directory.

First follow the instructions in the Miniconda manual if you do not yet have a Miniconda environment in your home directory.

Decide whether you would like to use the CPU version of tensorflow or the GPU version, and follow the appropriate instructions below. It is also possible to install both environments next to each other.

Setting up → CPU-version

Execute the following commands to set up a new Python virtual environment named tf and install the tensorflow package from the conda-forge channel:

# create new python virtual environment named tf
$ conda create -n tf
$ conda activate tf
$ conda config --add channels conda-forge
$ conda install tensorflow
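After the installation finishes, you can quickly verify from inside the activated environment that the package is visible to Python. This is a minimal sketch; it works the same way for the GPU environment set up below.

```python
# Check whether the tensorflow package can be found by the active
# Python interpreter; run this inside the activated conda environment.
import importlib.util

spec = importlib.util.find_spec("tensorflow")
print("tensorflow installed:", spec is not None)
```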

Setting up → GPU-version

Execute the following commands to set up a new Python virtual environment named tfgpu and install the tensorflow-gpu package from the conda-forge channel:

# create new python virtual environment named tfgpu
$ conda create -n tfgpu
$ conda activate tfgpu
$ conda config --add channels conda-forge
$ conda install tensorflow-gpu

Usage

Every time you log in to a server, you need to activate the virtual environment you created.

Usage → CPU-version

$ conda activate tf

Usage → GPU-version

# The following line is only needed on the HPC-cluster nodes.
# Do not use this on the AWI-cluster or the stand-alone nodes!
$ module use /opt/insy/modulefiles/

$ module load cuda cudnn
$ conda activate tfgpu

# Testing tensorflow with GPU
$ python
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
<SNIP>
'/device:GPU:0'

Good luck!