GPUs¶
Available Devices¶
Currently, the Cheaha cluster has 18 nodes dedicated to GPU use under the `pascalnodes` partition family. Each node contains 4 individual NVIDIA P100 GPUs. These GPUs have the following specifications:
Specification | Value |
---|---|
GPU Architecture | NVIDIA Pascal |
NVIDIA CUDA Cores | 3584 |
GPU Memory | 16 GB CoWoS HBM2 at 732 GB/s |
Double-Precision Performance | 4.7 TeraFLOPS |
Single-Precision Performance | 9.3 TeraFLOPS |
Compute APIs | CUDA, DirectCompute, OpenCL, OpenACC |
For more information on these nodes, see Detailed Hardware Information.
Scheduling GPUs¶
To successfully request access to GPUs, you will need to set the partition to one of the `pascalnodes` family of partitions, depending on how much time you need for the job.
Partition | Time Limit |
---|---|
pascalnodes | 12 hours |
pascalnodes-medium | 50 hours |
Additionally, when requesting a job using `sbatch`, you will need to include the Slurm directive `--gres=gpu:#`, where `#` is the number of GPUs you need.
Note
We suggest requesting at least 2 CPUs for every GPU to begin with. Monitor your jobs and adjust the number of cores on subsequent submissions if necessary. See Managing Jobs for more information.
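Putting the partition, the `--gres` directive, and the CPU suggestion together, a batch script might look like the following minimal sketch. The job name and the 2 hour time request are illustrative assumptions, and the final lines only report which GPUs Slurm exposed to the job.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-example    # hypothetical job name
#SBATCH --partition=pascalnodes   # 12 hour limit; pascalnodes-medium allows up to 50 hours
#SBATCH --gres=gpu:1              # request one GPU
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2         # suggested starting point: 2 CPUs per GPU
#SBATCH --time=02:00:00           # assumed run time; must fit within the partition limit

# Slurm typically exports CUDA_VISIBLE_DEVICES for jobs that request GPUs.
# Outside of a GPU job the variable is unset, so fall back to "none".
gpu_list="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: ${gpu_list}"
```

Saved under a name such as `gpu_job.sh` (a hypothetical filename), the script would be submitted with `sbatch gpu_job.sh`.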
Open OnDemand¶
When requesting an interactive job through Open OnDemand, selecting one of the `pascalnodes` partitions will automatically request access to one GPU as well. There is currently no way to change the number of GPUs for OOD interactive jobs.
MATLAB¶
To use GPUs with our Open OnDemand MATLAB, you'll need to take a slightly different route than usual.
- Determine which CUDA Toolkits are compatible with your required version of MATLAB using the table at the MathWorks Site. The column `Pascal (cc6.x)` is relevant for our system.
- Start an HPC Interactive Desktop Job with appropriate resources. Be sure to use one of the `pascalnodes*` partitions.
- Open a terminal.
- Load the appropriate CUDA Toolkit module.
- Load the appropriate MATLAB module.
- Start MATLAB by entering the command `matlab`.
- When MATLAB loads, enter the command `gpuDevice` in the MATLAB Command Window to verify it can identify the GPU.
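The terminal portion of the steps above can be sketched as follows. Both module names are placeholders, not real module names on Cheaha, so substitute the CUDA Toolkit and MATLAB modules you identified as compatible. The check for the `module` command is there only so the snippet fails gently when run outside the cluster.

```shell
# Placeholder module names (assumptions): substitute real names from the module system.
cuda_module="<cuda-toolkit-module>"
matlab_module="<matlab-module>"

if type module >/dev/null 2>&1; then
    module load "$cuda_module"     # load the CUDA Toolkit first
    module load "$matlab_module"   # then load MATLAB
    matlab                         # start MATLAB, then run gpuDevice in its Command Window
else
    echo "module command not found; run these steps inside the interactive desktop job"
fi
```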
For more information and official MATLAB documentation please see this page: https://www.mathworks.com/help/parallel-computing/gpu-computing-requirements.html.
CUDA Toolkit¶
You will need to load a CUDA toolkit module for relevant commands to access the GPUs. Depending on which version of TensorFlow, PyTorch, or other similar software you are using, a different version of the CUDA toolkit may be required. For instance, TensorFlow version 2.5.0 requires CUDA toolkit version 11.2.
Several CUDA toolkit versions have been installed as modules on Cheaha. To see which CUDA toolkits are available, search the module system for CUDA modules.
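A minimal sketch of that lookup, assuming the cluster provides the Lmod `module` command and that the toolkit module names contain "cuda" (exact names and matching behavior may differ):

```shell
if type module >/dev/null 2>&1; then
    # Search all installed modules for names containing "cuda".
    module spider cuda
    module_cmd_found=yes
else
    echo "module command not found; run this on Cheaha"
    module_cmd_found=no
fi
```

Once a suitable version appears in the listing, it can be loaded with `module load` followed by the full module name as shown in the output.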
If a specific version of the CUDA toolkit is needed but not installed, send an install request to support@listserv.uab.edu.
TensorFlow Compatibility¶
To check which CUDA Toolkit module version is required for your version of TensorFlow, see the toolkit requirements chart at https://www.tensorflow.org/install/source#gpu.
PyTorch Compatibility¶
PyTorch does not maintain a simple compatibility table for CUDA Toolkit versions. Instead, please manually check their "get started" page for the latest PyTorch version compatibility, and their "previous versions" page for older PyTorch version compatibility. Assume that a CUDA Toolkit version is not compatible if it is not listed for a specific PyTorch version.
To use GPUs with PyTorch versions prior to 1.13, you must select a `cudatoolkit` version from the pytorch channel when you install PyTorch using Anaconda. This is how PyTorch knows to install a GPU-compatible flavor as opposed to the CPU-only flavor. See below for templates of CPU and GPU installs for PyTorch versions prior to 1.13. Be sure to check the compatibility links above for your selected version. Note that `torchaudio` is also available for signal processing.
- CPU Version: `conda install pytorch==... torchvision==... -c pytorch`
- GPU Version: `conda install pytorch==... torchvision==... cudatoolkit=... -c pytorch`
For versions of PyTorch 1.13 and newer, use the following template instead.
- CPU Version: `conda install pytorch==... torchvision==... cpuonly -c pytorch`
- GPU Version: `conda install pytorch==... torchvision==... pytorch-cuda=... -c pytorch -c nvidia`
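After installing, it can be worth confirming that the GPU flavor was actually selected. The sketch below is one way to do that from a terminal on a GPU node, assuming the conda environment containing PyTorch is already activated. It prints the CUDA version PyTorch was built against and whether a GPU is usable.

```shell
# Run inside the conda environment where PyTorch was installed.
if python -c "import torch" >/dev/null 2>&1; then
    # torch.version.cuda is None for CPU-only builds;
    # torch.cuda.is_available() is True only when a usable GPU is detected.
    python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
    torch_found=yes
else
    echo "PyTorch is not importable in this environment"
    torch_found=no
fi
```

A CUDA version paired with `False` usually means the build is GPU-capable but no GPU is visible, for example because the session is not running on one of the `pascalnodes` partitions.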
Reviewing GPU Jobs¶
As with all jobs, use `sacct` to review GPU jobs. The number of GPUs requested and allocated may be reviewed using the `reqtres` and `alloctres` fields.
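As an illustration, the sketch below asks `sacct` for those two fields alongside a few common ones. The job ID is a hypothetical placeholder, and the `%40` suffix widens the column so the GPU entry is not truncated.

```shell
job_id=123456   # hypothetical job ID; substitute your own

if command -v sacct >/dev/null 2>&1; then
    # ReqTRES shows what was requested, AllocTRES what was actually allocated.
    sacct -j "$job_id" --format=JobID,JobName,Partition,ReqTRES%40,AllocTRES%40,State,Elapsed
else
    echo "sacct not found; run this on the cluster"
fi
```

For a GPU job, the GPU count typically appears in these columns as an entry such as `gres/gpu=1`.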