Using Python on a supercomputer

There are three ways to use Python on the OzSTAR supercomputers:

Python module and virtual environments (recommended for beginners)
conda package manager (recommended if you need a precise version of Python and packages)
In a container using Apptainer (recommended for optimising Lustre I/O)

Note

Although the operating system provides Python at /usr/bin/python, we do not recommend using it. Choosing one of the three recommended methods ensures your Python environments are managed in a sensible way.

Python modules

The modules system provides several versions of Python. These can be examined by:

module spider python

The -bare versions provide a minimal installation of Python, while the standard versions include commonly used packages. Calling module spider on a specific version, e.g.

module spider python/3.10.4

will show all of the included packages and their version numbers.

If you need to install additional packages, you may do so using the pip package manager. To do so, you must first create a virtual environment, since users do not have write permissions to the central Python installation, unlike on a personal computer. A virtual environment can be created and then activated by:

python -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate

Once you are in a virtual environment, you can use pip to install packages.
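
Before installing anything, it can be useful to confirm from inside Python that the interpreter really belongs to a virtual environment. The following is a minimal sketch using only the standard library; the function name in_virtualenv is our own choice for illustration, not part of any OzSTAR tooling.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory, while
    # sys.base_prefix still points at the installation the venv was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("sys.prefix:", sys.prefix)
```

If this reports False after you have sourced the activate script, the python on your PATH is probably not the one from the environment.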

Note

By default, users cannot install packages outside of a virtual environment (i.e. into their ~/.local directory). Installations into ~/.local can be enabled by setting the environment variable PIP_REQUIRE_VIRTUALENV=false. This is very likely to lead to conflicts between your environments, and we do not provide support for such usage.

Example

Here is a step-by-step example on NT. In this example, we create a virtual environment called foo, where we want to use the SciPy library. The SciPy library is provided by the python-scientific/3.10.4-foss-2022a toolchain (the inclusion of SciPy can be verified using ml spider python-scientific/3.10.4-foss-2022a). We install an additional Python package (the fictional my_extra_package).

# load python modules
ml python-scientific/3.10.4-foss-2022a

# see what modules are loaded
module list

# create a new python virtual environment
python -m venv --system-site-packages ~/foo

# activate the venv
. ~/foo/bin/activate

# you now see a (foo) prompt indicating you are in a venv

# install your extras into the venv
pip install my_extra_package

# start IPython (which is provided by the toolchain)
ipython

# you now see an In [1]: prompt indicating you are in IPython
import scipy
import my_extra_package

# do your prototyping work here, i.e. write Python commands.

# use ^D to exit IPython

# you may also run your short python scripts
python my_script.py

# deactivate the venv with
deactivate

When you log out of the node and log in again, you must first load all of the modules above before you can use this venv again. Then run:

. ~/foo/bin/activate
ipython

Note

You must load all of the required modules before activating your Python virtual environment.

Also note that we used the option --system-site-packages when creating the venv. This guarantees that any dependencies of module-loaded python packages are still accessible from inside the environment. If you are not using any module-loaded python packages on top of your venv, then it is safe to omit this option.
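
To check whether an existing environment was created with --system-site-packages, you can inspect the pyvenv.cfg file that the venv module writes at the top of the environment directory. The sketch below assumes the standard "include-system-site-packages = true/false" entry written by python -m venv; the helper name includes_system_site_packages and the ~/foo path (taken from the example above) are illustrative only.

```python
from pathlib import Path


def includes_system_site_packages(venv_dir) -> bool:
    """Report whether a venv was created with --system-site-packages."""
    cfg = Path(venv_dir) / "pyvenv.cfg"
    for line in cfg.read_text().splitlines():
        # Each line of pyvenv.cfg has the form "key = value".
        key, sep, value = line.partition("=")
        if sep and key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    # Key not present: treat the environment as isolated.
    return False


if __name__ == "__main__":
    venv = Path.home() / "foo"  # the example environment; adjust as needed
    if (venv / "pyvenv.cfg").exists():
        print("system site packages visible:", includes_system_site_packages(venv))
```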

Conda

Conda is an open-source package manager typically used for (though not limited to) Python packages. It was originally developed by Anaconda Inc. to distribute their Python environment "Anaconda". It can be considered as a replacement for the pip package manager.

On the OzSTAR supercomputers, Conda can be used by loading the conda module.

Note

The conda module is actually an alias for Mamba, a reimplementation of conda in C++. The interface is the same, so users will not notice any difference. mamba install benefits from considerably improved performance when installing packages, whereas conda install still uses the old (slower) solver.

You may have also heard of Miniconda, Miniforge, Mambaforge and Micromamba. You can find a quick summary of the differences here: "What’s the difference between Anaconda, conda, Miniconda, mamba, Mambaforge, micromamba?", but from a user perspective they can all be considered "the same".

See the Conda documentation for instructions on how to create and manage environments.

Note

The default channel is set to conda-forge. To use the channels that would normally come with conda, use

mamba install -c defaults <package name>

If you require an environment with the Anaconda distribution of packages, use

mamba install -c defaults anaconda

Conda and home directory quota

By default, Conda places environments in the home directory in ~/.conda. As you create new environments, the home directory disk quota will be exhausted very quickly. To resolve this issue, we recommend changing where conda environments are created:

conda config --env --prepend envs_dirs /path/to/my/project/on/fred/.conda/envs
conda config --env --prepend pkgs_dirs /path/to/my/project/on/fred/.conda/pkgs

Alternatively, you can move your .conda directory into your project storage and then create a symlink from there, so that Conda still "sees" it in the home directory:

mv ~/.conda /fred/oz000/username/.conda
ln -s /fred/oz000/username/.conda ~/.conda
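
After moving the directory, you may want to confirm that ~/.conda is now a symlink and see where it resolves to. This is a small sketch using only the standard library; describe_conda_dir is a hypothetical helper name, and the default path is simply the one used in the commands above.

```python
import os


def describe_conda_dir(path="~/.conda"):
    """Return where a (possibly symlinked) directory really lives."""
    expanded = os.path.expanduser(path)
    return {
        "path": expanded,
        # True only if the path itself is a symbolic link.
        "is_symlink": os.path.islink(expanded),
        # The final location after following all symlinks.
        "resolves_to": os.path.realpath(expanded),
    }


if __name__ == "__main__":
    for key, value in describe_conda_dir().items():
        print(f"{key}: {value}")
```

If is_symlink is True and resolves_to points into your project storage, Conda will keep reading and writing there while still "seeing" ~/.conda.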

Note

The backups for the home directory do not follow symlinks, so your .conda directory will no longer be backed up. To create a "backup" of the environment, you can export a YAML file specifying all the packages and versions in the environment:

conda env export > environment.yml

This YAML file can be stored in the home directory. To re-create the environment:

conda env create -f environment.yml

Using MPI libraries

The MPI libraries provided by the module system are optimised for high performance on the OzSTAR and NT hardware. Packages from conda with MPI dependencies will install MPI binaries built by conda-forge. These binaries may run with reduced performance, or not work at all. This can be solved by installing a "dummy" MPI library in your conda environment so that the target package links with the system's MPI library, while dependencies are still resolved correctly:

conda install "openmpi=x.y.z=external_*"

For more details, see:

Using CUDA/GPU enabled packages

Some packages that are CUDA enabled (e.g. TensorFlow) will only install the CUDA/GPU enabled version if conda detects a display driver. This is controlled by the __cuda virtual package, which corresponds to the maximum version of CUDA supported by the display driver. You can list what virtual packages are detected by conda with:

conda info

The NT login nodes tooarrana1/2, unlike the old farnarkle1/2 login nodes, do not have GPUs so there is no display driver detected. In order to install the CUDA/GPU enabled version of a package, you can either build your environment on one of the farnarkle login nodes, or you can override the virtual package manually using the CONDA_OVERRIDE_CUDA environment variable. You should set this to the maximum version of CUDA supported by the display driver, which you can determine by running nvidia-smi on a GPU node.

As of writing, the maximum version of CUDA supported by the display driver is 12.4.

$ nvidia-smi
Thu Oct 10 11:59:17 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla P100-PCIE-12GB           On  |   00000000:D8:00.0 Off |                    0 |
| N/A   39C    P0             29W /  250W |     603MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

So, to install the CUDA/GPU enabled version of e.g. TensorFlow on tooarrana1/2 you would run:

CONDA_OVERRIDE_CUDA=12.4 mamba install tensorflow
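
If you would rather not read the version off the nvidia-smi banner by eye, it can be extracted with a regular expression. The sketch below assumes the banner contains a "CUDA Version: X.Y" field, as in the output shown above; cuda_version_from_smi is a name we made up for illustration. It must be run on a node where nvidia-smi is available (i.e. a GPU node) to produce a value.

```python
import re
import subprocess


def cuda_version_from_smi(output: str):
    """Extract the 'CUDA Version' field from nvidia-smi banner text."""
    match = re.search(r"CUDA Version:\s*(\d+(?:\.\d+)?)", output)
    return match.group(1) if match else None


if __name__ == "__main__":
    try:
        text = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        text = ""  # no driver on this node, e.g. a login node without GPUs
    version = cuda_version_from_smi(text)
    if version:
        print(f"CONDA_OVERRIDE_CUDA={version}")
    else:
        print("nvidia-smi did not report a CUDA version on this node")
```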

For more information, see:

Apptainer

Note

See the Apptainer on OzSTAR page for getting started with Apptainer.

In the context of Python environments, Apptainer has two main benefits:

Ensuring reproducibility and portability across different systems
Optimising Lustre I/O

Large, complex Python environments on Lustre can often be slow to load/create and, at worst, may even cause a loss of filesystem performance for ALL users on the cluster. You can mitigate this by containerising your Python environment. This way, the Lustre filesystem sees only a single large file (your container, which is just a read-only SquashFS), even though underneath you may be dealing with a large number of small files.
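
To get a feel for how many small files an environment contains, you can count them. The following is a rough sketch with the standard library; count_files is a hypothetical helper, and the directory to inspect is passed on the command line. Note that walking a large tree is itself a metadata-heavy operation on Lustre, so run it sparingly.

```python
import os
import sys


def count_files(root):
    """Return (number of files, total size in bytes) under root."""
    n_files = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            n_files += 1
            # lstat does not follow symlinks, so a link counts as one small entry.
            total_bytes += os.lstat(full).st_size
    return n_files, total_bytes


if __name__ == "__main__":
    if len(sys.argv) > 1:
        files, size = count_files(sys.argv[1])
        print(f"{files} files, {size / 1e6:.1f} MB")
```

An environment with tens of thousands of files is a reasonable candidate for containerisation, since inside a container all of those files appear to Lustre as a single image.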

Python Apptainer Example

A simple definition file my_container.def might look like this:

BootStrap: docker
From: python:3.12.7-bookworm

%post
    pip install wheel tensorflow[and-cuda] tensorflow-datasets pandas

This uses the official Python 3.12.7 image from DockerHub as a base, into which it installs the TensorFlow library with CUDA support, and a few other packages, using pip.

To build the container image my_container.sif from the definition file, run:

apptainer build my_container.sif my_container.def

Then you can run a TensorFlow script using the Python environment within the container:

apptainer run --nv my_container.sif python my_tensorflow_script.py

Note that my_tensorflow_script.py does not exist in the container, but is assumed to be in the current directory, which is automatically mounted. We specify the --nv flag to enable GPU support in the container.

For a similar example, but instead using Micromamba in the container, see Building a containerised conda environment.

Note

Remember, the .sif container is an immutable SquashFS (i.e. read-only). Once you have built your containerised environment, you cannot modify it -- you must rebuild it to make changes.

Warning

You may be fooled into thinking that you can write to your container. For example, the following command may return without error: apptainer run my_container.sif pip install xyz

However, looking at the output carefully you will notice the following warning: "Defaulting to user installation because normal site-packages is not writeable".

In this case, the xyz package was installed into your ~/.local, and not into the container. Note that this is in your actual home directory on the host, since it is implicitly bind mounted at runtime (but not at build time).

This is a trap for the unwary and will almost certainly lead to confusion and conflicts. Avoid it at all costs.
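
If you suspect that a package is being picked up from ~/.local rather than from the container, you can check where Python imported it from. This sketch compares a module's file path with the per-user site-packages directory reported by the standard site module; is_in_user_site is an illustrative name, and json is used only as a stand-in for the package you want to check.

```python
import os
import site


def is_in_user_site(module_file, user_site=None) -> bool:
    """Return True if module_file lives under the per-user site-packages directory."""
    if user_site is None:
        # Typically ~/.local/lib/pythonX.Y/site-packages on Linux.
        user_site = site.getusersitepackages()
    module_file = os.path.realpath(module_file)
    user_site = os.path.realpath(user_site)
    return os.path.commonpath([module_file, user_site]) == user_site


if __name__ == "__main__":
    import json  # stand-in: replace with the package you want to check

    print(json.__file__)
    print("loaded from user site:", is_in_user_site(json.__file__))
```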