Currently, the Cheaha cluster has 18 nodes dedicated to GPU use under the
pascalnodes partition family. Each node contains 4 individual NVIDIA P100 GPUs. These GPUs have the following specifications:
| Specification | Value |
|---|---|
| GPU Architecture | NVIDIA Pascal |
| NVIDIA CUDA Cores | 3584 |
| GPU Memory | 16 GB CoWoS HBM2 at 732 GB/s |
| Double-Precision Performance | 4.7 TeraFLOPS |
| Single-Precision Performance | 9.3 TeraFLOPS |
| Compute APIs | CUDA, DirectCompute, OpenCL, OpenACC |
For more information on these nodes, see
Detailed Hardware Information.
To successfully request access to GPUs, you will need to set the partition to one of the
pascalnodes family of partitions depending on how much time you need for the job.
Additionally, when requesting a job using
sbatch, you will need to include the SLURM directive
--gres=gpu:#, where # is the number of GPUs you need.
We suggest requesting at least 2 CPUs for every GPU to begin with. Monitor the job's core usage and adjust the number of cores on subsequent submissions if necessary. See Managing Jobs for more information.
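The request described above can be sketched as a batch script. This is a minimal illustration, not a recommended configuration: the job name, memory, and time limit are placeholder assumptions, and `pascalnodes` is assumed to be the name of one partition in the pascalnodes family.

```shell
#!/bin/bash
# Hypothetical job name, chosen for illustration only.
#SBATCH --job-name=gpu-example
# Assumed partition name; pick the pascalnodes partition that fits your time needs.
#SBATCH --partition=pascalnodes
# Request one GPU; change the number after the colon to request more.
#SBATCH --gres=gpu:1
# Start with 2 CPUs per requested GPU.
#SBATCH --cpus-per-task=2
# Memory and time limit are placeholder assumptions.
#SBATCH --mem=8G
#SBATCH --time=01:00:00

# Slurm typically exports CUDA_VISIBLE_DEVICES for GPU jobs; record it in the job log.
GPU_LIST="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: ${GPU_LIST}"

# List the allocated GPUs if the NVIDIA driver tools are on the PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --list-gpus || echo "nvidia-smi could not query the driver"
fi
```

Saved under a hypothetical file name such as `gpu_job.sh`, the script would be submitted with `sbatch gpu_job.sh`.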
When requesting an interactive job through
Open OnDemand, selecting the
pascalnodes partitions will automatically request access to one GPU as well. There is currently no way to change the number of GPUs for OOD interactive jobs.
To use GPUs with our Open OnDemand MATLAB, you'll need to take a slightly different route than usual.
- Determine which CUDA Toolkits are compatible with your required version of MATLAB using the table at the MathWorks Site. The column
Pascal (cc6.x) is relevant for our system.
- Start an HPC Interactive Desktop Job with appropriate resources. Be sure to use one of the pascalnodes partitions.
- Open a terminal.
- Load the appropriate CUDA Toolkit Module.
- Load the appropriate MATLAB Module.
- Start MATLAB by entering the command
- When MATLAB loads, enter the command
gpuDevice in the MATLAB Command Window to verify it can identify the GPU.
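The terminal steps in the list above can be sketched as a small shell function. This is an illustration under assumptions: `start_gpu_matlab` is a hypothetical helper name, both module names must be looked up by you, and `matlab` is assumed to be the launcher command that the MATLAB module provides.

```shell
# Load a CUDA toolkit module, then a MATLAB module, then start MATLAB.
# Arguments: $1 = CUDA toolkit module name, $2 = MATLAB module name.
# No specific Cheaha module names are assumed here.
start_gpu_matlab() {
    if [ "$#" -ne 2 ]; then
        echo "usage: start_gpu_matlab CUDA_MODULE MATLAB_MODULE" >&2
        return 2
    fi
    # Load the CUDA toolkit first so MATLAB can find it at startup.
    module load "$1" || return 1
    module load "$2" || return 1
    # Assumed launcher name; adjust if the module provides a different command.
    matlab
}
```

In a terminal inside the interactive desktop job, you would call it with the two module names you selected, then run gpuDevice in the MATLAB Command Window as described above.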
For more information and official MATLAB documentation please see this page: https://www.mathworks.com/help/parallel-computing/gpu-computing-requirements.html.
You will need to load a CUDA toolkit module for relevant commands to access the GPUs. Depending on which version of TensorFlow, PyTorch, or other similar software you are using, a different version of the CUDA toolkit may be required. For instance, TensorFlow version 2.5.0 requires CUDA toolkit version 11.2.
Several CUDA toolkit versions have been installed as modules on Cheaha. To see which CUDA toolkits are available, use:
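A minimal sketch of such a query, assuming the standard `module avail` subcommand and that the toolkit module names contain the string `cuda` (both are assumptions, not confirmed by this page). `list_cuda_modules` is a hypothetical helper name.

```shell
# Print available modules whose names match a pattern (default: "cuda").
list_cuda_modules() {
    pattern="${1:-cuda}"
    if command -v module >/dev/null 2>&1; then
        # Many module systems print listings to stderr, so merge it into stdout.
        module avail "$pattern" 2>&1
    else
        echo "module command not found; run this on a cluster login or compute node"
    fi
}

list_cuda_modules
```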
If a specific version of the CUDA toolkit is needed but not installed, send an install request to [firstname.lastname@example.org].
To check which CUDA Toolkit module version is required for your version of TensorFlow, see the toolkit requirements chart at https://www.tensorflow.org/install/source#gpu.
PyTorch does not maintain a simple compatibility table for CUDA Toolkit versions. Instead, please manually check their "get started" page for the latest PyTorch version compatibility, and their "previous versions" page for older PyTorch version compatibility. Assume that a CUDA Toolkit version is not compatible if it is not listed for a specific PyTorch version.
To use GPUs with PyTorch versions prior to 1.13, you must select a
cudatoolkit version from the pytorch channel when you install PyTorch using Anaconda. This selection is what causes a GPU-compatible flavor to be installed, as opposed to the CPU-only flavor. See below for templates of CPU and GPU installs for PyTorch versions prior to 1.13. Be sure to check the compatibility links above for your selected version. Note that
torchaudio is also available for signal processing.
- CPU Version:
conda install pytorch==... torchvision==... -c pytorch
- GPU Version:
conda install pytorch==... torchvision==... cudatoolkit=... -c pytorch
For versions of PyTorch 1.13 and newer, use the following template instead.
- CPU Version:
conda install pytorch==... torchvision==... cpuonly -c pytorch
- GPU Version:
conda install pytorch==... torchvision==... pytorch-cuda=... -c pytorch -c nvidia
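After installing from one of the templates above, one way to check which flavor you ended up with is to ask PyTorch directly. The sketch below assumes the target conda environment is active and that `python` resolves to its interpreter. `check_torch_gpu` is a hypothetical helper name.

```shell
# Report the installed PyTorch version, the CUDA version it was built
# against (None for CPU-only builds), and whether a GPU is usable right now.
check_torch_gpu() {
    python - <<'EOF'
import torch

print("PyTorch version:", torch.__version__)
print("Built against CUDA:", torch.version.cuda)
print("GPU visible:", torch.cuda.is_available())
EOF
}

check_torch_gpu || echo "PyTorch is not importable in this environment"
```

A GPU build run on a node without a GPU, such as a login node, would still report that no GPU is visible, so run the check inside a job on a pascalnodes partition.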
Reviewing GPU Jobs
As with all jobs, use
sacct to review GPU jobs. Quantity of GPUs may be reviewed using the