Using Python on a supercomputer
There are two ways to use Python on the OzSTAR supercomputers:
Python module and virtual environments (recommended for beginners)
conda package manager (recommended if you need a precise version of Python and packages)
Note
You may also use Apptainer to manage your Python environment as part of a more complex workflow. This approach, however, is not recommended for the typical user.
Note
Although the operating system provides Python at /usr/bin/python, we do not recommend using it. Choosing one of the two recommended methods ensures your Python environments are managed in a sensible way.
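If you are unsure which interpreter you are currently using, a quick check (the path shown will depend on which module or environment you have loaded):
which python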
Python modules
The modules system provides several versions of Python, which can be listed with:
module spider python
The -bare versions provide a minimal installation of Python, while the standard versions include commonly used packages. Calling module spider on a specific version, e.g.
module spider python/3.10.4
will show all of the included packages and their version numbers.
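Once you have picked a version, load it with module load (or the ml shorthand), for example:
module load python/3.10.4
python --version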
If you need to install additional packages, you can do so with the pip package manager. Unlike on a personal computer, users do not have write permissions to the central Python installation, so you must first create a virtual environment. A virtual environment can be created and then activated with:
python -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate
Once you are in a virtual environment, you can use pip to install packages.
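For example (the package name here is purely illustrative):
# install a package into the active virtual environment
pip install requests
# list the packages now visible in the environment
pip list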
Note
By default, users cannot install packages outside of a virtual environment (i.e. into their ~/.local directory). Installations into ~/.local can be enabled by setting the environment variable PIP_REQUIRE_VIRTUALENV=false. This is very likely to lead to conflicts between your environments, and we do not provide support for such usage.
Example
Here is a step-by-step example on NT. In this example, we create a virtual environment called foo, where we want to use the SciPy library. The SciPy library is provided by the python-scientific/3.10.4-foss-2022a toolchain (the inclusion of SciPy can be verified using ml spider python-scientific/3.10.4-foss-2022a). We install an additional Python package (the fictional my_extra_package).
# load python modules
ml python-scientific/3.10.4-foss-2022a
# see what modules are loaded
module list
# create a new python virtual environment
python -m venv --system-site-packages ~/foo
# activate the venv
. ~/foo/bin/activate
# you now see a (foo) prompt indicating you are in a venv
# install your extras into the venv
pip install my_extra_package
# start IPython (which is provided by the toolchain)
ipython
# you now see an In [1]: prompt indicating you are in IPython
import scipy
import my_extra_package
# do your prototyping work here, i.e. write python commands
# use ^D to exit IPython
# you may also run your short python scripts
python my_script.py
# deactivate the venv with
deactivate
When you log out of the node and log back in and want to use this venv again, you must first load all of the modules above, and then simply:
. ~/foo/bin/activate
ipython
Note
You must load all of the required modules before activating your Python virtual environment.
Also note that we used the option --system-site-packages when creating the venv. This guarantees that any dependencies of module-loaded python packages are still accessible from inside the environment. If you are not using any module-loaded python packages on top of your venv, it is safe to omit this option. A complete re-login sequence, combining these steps, is sketched below.
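A minimal sketch, using the same toolchain and venv names as the example above:
# re-load the toolchain the venv was built against
ml python-scientific/3.10.4-foss-2022a
# re-activate the existing venv
. ~/foo/bin/activate
# run your work as usual
python my_script.py
# deactivate when finished
deactivate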
Conda
Conda is an open-source package manager typically used for (though not limited to) Python packages. It was originally developed by Anaconda Inc. to distribute their "Anaconda" Python distribution. It can be considered a replacement for the pip package manager.
On the OzSTAR supercomputers, Conda can be used by loading the conda module.
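For example:
# load the conda module (provides the conda and mamba commands)
module load conda
# confirm the package manager is available
mamba --version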
Note
The conda module is actually an alias for Mamba, a reimplementation of conda in C++. The interface is the same, so users will not notice any difference. mamba install benefits from considerably improved performance when installing packages, whereas conda install still uses the old (slower) solver.
You may be familiar with the Anaconda distribution of Python, which contains a specific version of Python bundled with a large set of data science packages. In contrast, the Conda module provides only the package manager, giving you the freedom to create your own environment with the exact versions of Python and packages that you need.
See the Conda documentation for instructions on how to create and manage environments.
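As a brief sketch (the environment name, Python version, and packages here are illustrative; this assumes conda activate works in your shell after loading the module):
# create an environment with a chosen Python version and some packages
mamba create -n myenv python=3.11 numpy scipy
# activate and use the environment
conda activate myenv
# list your environments
conda env list
# deactivate when finished
conda deactivate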
Note
The default channel is set to conda-forge. To use the channels that would normally come with conda, use
mamba install -c defaults <package name>
If you require an environment with the Anaconda distribution of packages (https://docs.anaconda.com/free/anaconda/), run:
mamba install -c defaults anaconda
Conda and home directory quota
By default, Conda places environments in the home directory, in ~/.conda. As you create new environments, the home directory disk quota will be exhausted very quickly. To resolve this issue, we recommend changing where conda environments are created:
conda config --env --prepend envs_dirs /path/to/my/project/on/fred/.conda/envs
conda config --env --prepend pkgs_dirs /path/to/my/project/on/fred/.conda/pkgs
Alternatively, you can move your .conda directory into your project storage and then create a symlink from there, so that Conda still "sees" it in the home directory:
mv ~/.conda /fred/oz000/username/.conda
ln -s /fred/oz000/username/.conda ~/.conda
Note
The backups for the home directory do not follow symlinks, so your .conda directory will no longer be backed up. To create a "backup" of the environment, you can export a YAML file specifying all the packages and versions in the environment:
conda env export > environment.yml
This YAML file can be stored in the home directory. To re-create the environment:
conda env create -f environment.yml
Using MPI libraries
The MPI libraries provided by the module system are optimised for high performance on the OzSTAR and NT hardware. Conda packages with MPI dependencies will instead install MPI binaries built by conda-forge, which may run with reduced performance or not work at all. This can be solved by installing a "dummy" MPI library from conda so that the target package links against the system's MPI library, while dependencies are still resolved correctly:
conda install "openmpi=x.y.z=external_*"
For more details, see: https://conda-forge.org/docs/user/tipsandtricks.html#using-external-message-passing-interface-mpi-libraries