Leonardo uses SLURM to schedule jobs across its GPU nodes. You submit a job, either as an sbatch script (fire & forget) or as an salloc interactive allocation, and SLURM queues it until resources are free. The conventions below are the GLADIA defaults: they won't win awards, but they keep jobs cheap and scripts reproducible.
Skip straight to the template generator below if you just need a .slurm file.
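For orientation, this is what the two submission modes look like from a login node. A minimal sketch only: the script name and the salloc resource flags are placeholders, not a recommendation (see the ratios further down).

```bash
# Fire & forget: queue a batch script and log out whenever you like
sbatch train.slurm

# Interactive: hold an allocation, then open a shell on the compute node
salloc --nodes=1 --gres=gpu:1 --cpus-per-task=8 --time=01:00:00
srun --pty bash
```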
In short:

- Use uv for fast, reproducible venvs, and stay on CUDA ≤ 12.6 to match the node modules.
- Train multi-GPU with ddp, bf16-mixed, and sync_batchnorm: true, and set num_nodes = ceil(#gpus / 4); a short sketch of the arithmetic follows this list.
- Launch with srun python main.py, even inside sbatch, so SLURM maps one task per GPU correctly.
- Leave interactive sessions with exit, and check squeue --me before closing your laptop.
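To make the last two points concrete, here is a hedged sketch of how the node count falls out of the 4-GPUs-per-node shape and why the launch line is srun rather than plain python. The GPU count and script name are illustrative placeholders.

```bash
#!/bin/bash
# Example: 8 GPUs total -> num_nodes = ceil(8 / 4) = 2, with 4 GPUs on each node.
#SBATCH --nodes=2
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4   # one task per GPU on every node

# srun (not bare python) spawns one rank per task, so each process
# gets its own GPU and DDP sees 8 ranks across 2 nodes.
srun python main.py
```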
Every SLURM script should begin by loading Leonardo's module stack. Two combinations cover almost every use case; pick the one that matches the kind of code you're running.

Generic CUDA stack. Plain CUDA toolkit plus the NVHPC compilers. Use this when you build your own environment from scratch (for example, a uv venv with your own wheels).
Cineca-AI stack. Cineca's curated AI image, with torch, transformers, and, critically, a pre-built flash-attn already in place. This saves you the painful flash-attn compile and is the fastest path to a working baseline.
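A hedged sketch of the two combinations. The module names and versions below are illustrative assumptions; run module avail on a login node to see what is actually installed.

```bash
# Option A: generic CUDA stack - bring your own environment (e.g. a uv venv)
module load cuda nvhpc

# Option B: Cineca-AI stack - curated torch/transformers with a pre-built flash-attn
module load profile/deeplrn
module load cineca-ai
```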
Keep in mind that each storage area adds its own I/O overhead to every job that touches it. This matters most for large sweeps of short jobs, where slow storage can account for a big share of the compute you burn (a staging sketch follows the table below).
| Path | Scope | Quota | Use for |
|---|---|---|---|
| $HOME | Permanent · personal | 50 GB | Configs, small code, dotfiles |
| $WORK | Permanent · per project | 1 TB | Shared project data & code |
| $FAST | Permanent · per project | 1 TB | Hot data, fastest IO tier |
| $SCRATCH | Temporary (40 d) · personal | — | Checkpoints, intermediate outputs |
| $TMPDIR | Temporary · per job | — | Job-local scratch (wiped at end) |
| $PUBLIC | Permanent · personal · world-readable | 50 GB | Sharing with other users |
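A common pattern, sketched here for lines inside a job script, is to stage hot data into the job-local scratch and copy results somewhere persistent before the job ends. The dataset name, output directory, and the script's --data/--out flags are hypothetical.

```bash
# Stage the dataset onto job-local scratch (wiped when the job ends)
cp -r "$FAST/my_dataset" "$TMPDIR/"

# Train against the local copy, writing checkpoints locally too
srun python main.py --data "$TMPDIR/my_dataset" --out "$TMPDIR/ckpts"

# Persist what matters before $TMPDIR disappears ($SCRATCH keeps files ~40 days)
mkdir -p "$SCRATCH/runs/$SLURM_JOB_ID"
cp -r "$TMPDIR/ckpts" "$SCRATCH/runs/$SLURM_JOB_ID/"
```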
Cineca bills whichever requested resource claims the largest fraction of the node, scaled to the whole node: ask for all 32 CPUs and you pay for a full node even with a single GPU; ask for 1 GPU and you pay for a quarter node even with a single CPU. Under-allocating CPU and RAM therefore saves nothing, so always request the full per-GPU share.
A Leonardo Booster node has 4 GPUs · 32 CPUs · 512 GB RAM. The sweet spot per GPU is therefore 8 CPUs and 128 GB RAM; that's the ratio the generator below enforces.
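Put into a job header, a hedged single-GPU sketch at that ratio looks like the following. The job name, partition, account, and time limit are assumptions to replace with your own.

```bash
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --partition=boost_usr_prod   # Booster production partition (check yours)
#SBATCH --account=<your_account>
#SBATCH --nodes=1
#SBATCH --gres=gpu:1
#SBATCH --ntasks-per-node=1          # one task per GPU
#SBATCH --cpus-per-task=8            # 32 CPUs / 4 GPUs
#SBATCH --mem=128G                   # 512 GB / 4 GPUs
#SBATCH --time=04:00:00

module load cuda                     # or the Cineca-AI stack, see above
srun python main.py
```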