Best Practices


This page collects best use practices and expected timings for codes we use regularly. If you have additional timing info, please share the wealth!

Gromacs

On CIRCE, USF's Linux x86_64 SLURM cluster, we have several GPU systems available. Using them requires passing extra resource-request flags to SLURM.

Here's a submit script:

#!/bin/bash
#SBATCH -J run
#SBATCH -o run_job.log
#SBATCH --nodes=1 -p cuda --exclusive --cpus-per-task=16 -t 24:00:00 --gres=gpu:2 --constraint gpu_K20 --constraint avx


The -p option is not required, but selects a special partition that gives higher priority to GPU-using jobs. The --gres=gpu:2 option requests 2 GPUs per node, and --constraint gpu_K20 requests Kepler K20 GPUs (based on the GK110 chipset) with CUDA compute capability 3.5. Note that although this card has lots of double-precision floating-point units, Gromacs won't use them, since it prefers single precision for memory throughput anyway.

Also, note that this command is specialized to our hardware, since our dual-GPU nodes have exactly 16 cores. --exclusive requests the whole machine; it should be the default, but for some reason it is not.
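To see which GPU nodes, feature tags, and core counts are actually available before submitting, SLURM's sinfo can list them. A quick sketch (the output format string is just one reasonable choice; the cuda partition name matches the -p option above):

# list nodes in the cuda partition with their GPUs (gres), feature tags, and CPU counts
sinfo -p cuda -o "%N %G %f %c"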

The correct launch command is almost as complicated.

mpirun -bysocket -bind-to-socket -report-bindings --npernode 2 mdrun_mpi -ntomp 8 -deffnm run


The first 4 options are all sent to mpirun, asking for 2 processes to be started per node, and each process to be bound to a single socket. Each socket is a physical processor, containing 8 cores. Without the binding options, mpirun sets up Gromacs (mdrun_mpi) to run on a single core, and will only use 2 cores out of the total 16!
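To double-check the two-socket, eight-cores-per-socket layout assumed here, something like lscpu on a compute node gives a quick summary (a generic sanity check, not part of the job script). The -report-bindings flag above also makes mpirun print the actual bindings to stderr, so you can confirm each rank landed on its own socket.

# show the socket/core/thread layout of the current node
lscpu | grep -E 'Socket|Core|Thread'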

To continue a run that was terminated before finishing, use

mpirun -bysocket -bind-to-socket --npernode 2 mdrun_mpi -ntomp 8 -deffnm run -cpi run -append

-append is required because our NFS filesystem doesn't support file locking, so you have to override the default behavior explicitly.
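Putting the pieces together, a complete continuation job might look like the sketch below, which just assembles the submit header and launch line from above. The module load line is the one used for compilation in the next section; adjust it to match whatever environment your mdrun_mpi build actually needs.

#!/bin/bash
#SBATCH -J run
#SBATCH -o run_job.log
#SBATCH --nodes=1 -p cuda --exclusive --cpus-per-task=16 -t 24:00:00 --gres=gpu:2 --constraint gpu_K20 --constraint avx

# runtime environment (same modules as the build)
module load compilers/intel/14.0.1 mpi/openmpi/1.6.1 apps/cuda/6.5.14

# restart from run.cpt and append to the existing run output files
mpirun -bysocket -bind-to-socket --npernode 2 mdrun_mpi -ntomp 8 -deffnm run -cpi run -append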

Compilation Specifics

We compiled Gromacs 5.0.5 for the Intel(R) Xeon(R) CPU E5-2650 and GPU acceleration using the following script:

module load compilers/intel/14.0.1 mpi/openmpi/1.6.1 apps/cmake/2.8.12.2 apps/cuda/6.5.14

cmake -DCMAKE_INSTALL_PREFIX=/shares/rogers \
      -DFFTW_INCLUDE_DIR=/shares/rogers/include \
      -DFFTW_LIBRARY=/shares/rogers/lib \
      -DGMX_SIMD=AVX_256 \
      -DGMX_FFT_LIBRARY=fftw3 \
      -DGMX_GPU=on \
      -DGMX_MPI=ON \
      ..

make -j8
make install
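The FFTW paths in the cmake line point at a copy of FFTW installed under /shares/rogers. That build isn't recorded here; for reference, a single-precision FFTW 3.x build that Gromacs can link against typically looks something like the following (standard FFTW configure flags, not the exact options used on CIRCE):

# build single-precision (fftw3f) FFTW with SIMD kernels and install under /shares/rogers
./configure --prefix=/shares/rogers --enable-float --enable-sse2 --enable-avx --enable-shared
make -j8
make install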

AVX2_256 fails to run on this machine, terminating with the error "Program received signal 4, Illegal instruction." This generally happens when a program tries to execute an instruction the CPU running it doesn't understand. Here, the build compiled for AVX2_256 contained an AVX2 instruction, which the older processor choked on. You'll have to watch out for issues like this on CIRCE, which contains a mix of older and newer Intel and AMD processors.
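A quick way to check what a given node actually supports is to inspect the CPU flags, e.g. from an interactive job on that node:

# list the AVX-related CPU flags; if avx2 is missing, AVX2_256 builds will die with an illegal instruction
grep -m1 flags /proc/cpuinfo | tr ' ' '\n' | grep -E '^avx'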

Also, note that this version of Gromacs, compiled with CUDA support, is flexible. It can run efficiently on machines with or without GPU accelerators.
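For example, on a CPU-only node you can drop the GPU-related SBATCH options and launch the same binary, optionally forcing the non-bonded kernels onto the CPU with mdrun's -nb option:

# same launch line on a node without GPUs; -nb cpu keeps mdrun from looking for an accelerator
mpirun -bysocket -bind-to-socket --npernode 2 mdrun_mpi -ntomp 8 -nb cpu -deffnm run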