Code:SLURM

User environment
Zeroth, set up your own system (for Linux) by adding a host definition for circe to your $HOME/.ssh/config file. Then you don't have to keep typing circe's full hostname and your username every time you run ssh.

Host rc
    User your_username   # replace with your USF username
    Hostname rcslurm.rc.usf.edu
    ServerAliveInterval 30
    ServerAliveCountMax 120
    ForwardX11 yes
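With that entry in place, connecting is just (the rc alias comes from the Host line above):

ssh rc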

You may need:

mkdir -p $HOME/.ssh && chmod 700 $HOME/.ssh
vi $HOME/.ssh/config   # ':q' to exit
chmod 600 $HOME/.ssh/config

The slurm job status command is squeue. A helpful alias to monitor your own jobs is:

alias myq="squeue -u $USER"

Queue
Here's a basic template for queuing a job using MPI (here, 2 whole nodes and a 6-hour maximum run-time).
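A minimal sketch (the job name, tasks per node, and program name are placeholders; adjust them for your code and the cores per node on your partition):

#!/bin/bash
#SBATCH --job-name=mpi_test        # placeholder job name
#SBATCH --nodes=2                  # 2 whole nodes
#SBATCH --ntasks-per-node=8        # placeholder; match your partition's cores per node
#SBATCH --time=06:00:00            # 6h max run-time

srun ./my_mpi_program              # placeholder for your MPI binary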

By default, slurm jobs start in the same directory from which sbatch was invoked.

A few more useful options are:
 * #SBATCH -o output_log_name.log
 * #SBATCH --mem=2000

The first option specifies the name of the output log file instead of the default auto-generated name (based on the job number, e.g. slurm-22425.out); the second requests 2000 MB of memory per node. By default, the log file includes both standard output and standard error from the job.
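If you want standard error in its own file, sbatch also accepts a separate error-log option; for example:

#SBATCH -o output_log_name.log
#SBATCH -e error_log_name.log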

Submit the script with sbatch; for example (job.sh is a placeholder for your batch script):
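sbatch -p saturn job.sh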

The -p saturn option selects the default queue and can be left out. Other values select different node partitions (an example follows the list):
 * "jupiter": 444 cores; preemptable by "deadline" QOS (currently inactive).
 * "saturn": 280 cores; default; preemptable by "deadline" QOS (currently inactive).
 * "neptune": 168 cores; preemptable by "deadline" QOS (currently inactive).
 * "hii_broad": 80 cores; testing "contributor" hardware pool; preemptable by "hii_broad" QOS (active).
 * "titan": 16 cores; no preemption; 128 GB RAM for large memory jobs.
 * "pluto": 8 cores; no preemption.

To check execution status, use squeue or the myq alias defined above (add it to your $HOME/.bashrc to make it permanent).

For more info, see LLNL's Slurm Quickstart Guide.

Here's a run-down of some of the environment variables available while the job runs (useful for scripting); see sbatch's man page for more. A short demo script follows the list:
 * SLURM_JOB_NAME - Name of the job.
 * SLURM_JOB_ID - The ID of the job allocation.
 * SLURM_CPUS_ON_NODE - Number of CPUS on the allocated node.
 * SLURM_JOB_NODELIST - List of nodes allocated to the job in a compressed format.
 * SLURM_JOB_NUM_NODES - Total number of nodes in the job’s resource allocation.
 * SLURM_JOB_CPUS_PER_NODE - Count of processors available to the job on this node.
 * SLURM_SUBMIT_DIR - The directory from which sbatch was invoked.
 * SLURM_JOB_PARTITION - Name of the partition in which the job is running.
 * SLURM_LOCALID - Node local task ID for the process within a job.
 * SLURM_GTIDS - Global task IDs running on this node, zero-origin and comma-separated.
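A minimal sketch that echoes a few of these from inside a job (the job name and time limit are placeholders):

#!/bin/bash
#SBATCH --job-name=env_demo
#SBATCH --time=00:05:00

echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) in partition $SLURM_JOB_PARTITION"
echo "Nodes: $SLURM_JOB_NODELIST ($SLURM_JOB_NUM_NODES nodes)"
echo "Submitted from: $SLURM_SUBMIT_DIR"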