Difference between revisions of "Best Practices"
Revision as of 10:07, 18 June 2015
This page collects best use practices and expected timings for codes we use regularly. If you have additional timing information, please share the wealth!
Gromacs
On CIRCE, USF's Linux x86_64 SLURM cluster, we have several GPU systems available. Using them requires passing extra request flags to SLURM.
Here's a submit script:
#!/bin/bash
#SBATCH -J run
#SBATCH -o run_job.log
#SBATCH --nodes=1 -p cuda --exclusive --cpus-per-task=16 -t 24:00:00 --gres=gpu:2 --constraint gpu_K20 --constraint avx
The -p option is not required, but selects a special partition that gives higher priority to GPU-using jobs. The --gres option requests nodes with 2 GPUs, and the gpu_K20 constraint requests Kepler K20 GPUs (based on the GK110 chip) with CUDA compute capability 3.5. Note that although this card has many double-precision floating-point units, Gromacs won't use them, since it prefers single precision for its better memory throughput anyway.
Also, note that this script is specific to our hardware, since our dual-GPU nodes have exactly 16 cores. --exclusive requests the whole machine; it should be the default, but for some reason is not.
The correct launch command is almost as complicated.
mpirun -bysocket -bind-to-socket -report-bindings --npernode 2 mdrun_mpi -ntomp 8 -deffnm run
The first 4 options all go to mpirun. They start 2 processes per node and bind each process to a single socket. Each socket is a physical processor containing 8 cores. Without the binding options, mpirun confines each Gromacs (mdrun_mpi) process to a single core, so the job uses only 2 of the 16 cores!
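The arithmetic behind -ntomp 8 is simply cores per node divided by MPI processes per node. As a minimal sketch, the helpers below (threads_per_rank and launch_line are hypothetical names, not part of Gromacs or Open MPI) compute that value and print the launch line instead of running it, so the result can be checked before submitting:

```bash
#!/bin/bash
# Hypothetical helper: OpenMP threads per MPI process =
# cores per node / MPI processes per node.
threads_per_rank() {
    local cores=$1 ranks=$2
    if [ "$ranks" -le 0 ] || [ $((cores % ranks)) -ne 0 ]; then
        echo "cores ($cores) must divide evenly among ranks ($ranks)" >&2
        return 1
    fi
    echo $((cores / ranks))
}

# Hypothetical helper: print (do not run) the launch line for a node layout.
# $1 = cores per node, $2 = MPI processes per node, $3 = name for -deffnm
launch_line() {
    local cores=$1 ranks=$2 name=$3 ntomp
    ntomp=$(threads_per_rank "$cores" "$ranks") || return 1
    echo "mpirun -bysocket -bind-to-socket -report-bindings --npernode $ranks mdrun_mpi -ntomp $ntomp -deffnm $name"
}

# Our dual-GPU nodes: 16 cores, one MPI process per socket.
launch_line 16 2 run
```

For the 16-core, 2-socket nodes described above, this prints the same command given earlier. On a node with a different core count, only the first argument should need to change, assuming one process per socket is still the layout you want.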
To continue a run that was terminated before finishing, use
mpirun -bysocket -bind-to-socket --npernode 2 mdrun_mpi -ntomp 8 -deffnm run -cpi run -append
-append is required because our NFS filesystem doesn't support locking, and you have to override the default.
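A job script can choose between the fresh-start and continuation commands automatically. The sketch below assumes the checkpoint file is named after the -deffnm value with a .cpt extension (run.cpt here); mdrun_args is a hypothetical helper, and the final line only prints the command rather than running it:

```bash
#!/bin/bash
# Hypothetical helper: print mdrun arguments for a fresh start or a restart.
# $1 = run name passed to -deffnm
# $2 = directory to search for the checkpoint (defaults to the current one)
mdrun_args() {
    local name=$1 dir=${2:-.}
    local args="-ntomp 8 -deffnm $name"
    # Assumption: a previous run left its checkpoint at <name>.cpt
    if [ -f "$dir/$name.cpt" ]; then
        args="$args -cpi $name -append"
    fi
    echo "$args"
}

# Print the full launch line; drop the echo to actually launch.
echo "mpirun -bysocket -bind-to-socket --npernode 2 mdrun_mpi $(mdrun_args run)"
```

With this in the submit script, resubmitting the same job after a timeout should pick up from the checkpoint without editing the script.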
Compilation Specifics
We compiled Gromacs 5.0.5 for the Intel(R) Xeon(R) CPU E5-2650 and GPU acceleration using the following script:
<source lang="bash">
module load compilers/intel/14.0.1 mpi/openmpi/1.6.1 apps/cmake/2.8.12.2 apps/cuda/6.5.14

cmake -DCMAKE_INSTALL_PREFIX=/shares/rogers \
      -DFFTW_INCLUDE_DIR=/shares/rogers/include \
      -DFFTW_LIBRARY=/shares/rogers/lib \
      -DGMX_SIMD=AVX_256 \
      -DGMX_FFT_LIBRARY=fftw3 \
      -DGMX_GPU=on \
      -DGMX_MPI=ON \
      ..

make -j8
make install
</source>
A build with AVX2_256 fails to run on this machine, terminating with the error "Program received signal 4, Illegal instruction." This generally happens when a program tries to execute an instruction that the CPU running it doesn't support. Here, the binary compiled for AVX2_256 contained an AVX2 instruction, which the older processor does not implement. You'll have to watch out for issues like this on CIRCE, which contains a mix of old and new Intel and AMD processors.
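One way to catch this before building is to inspect the CPU flags on the node that will run the job. The sketch below maps a flag list to a GMX_SIMD value; simd_level is a hypothetical helper, it covers only the Intel-style levels mentioned here plus the SSE fallbacks, and it ignores AMD-specific options such as AVX_128_FMA:

```bash
#!/bin/bash
# Hypothetical helper: map a space-separated CPU flag list to a GMX_SIMD level.
# Padding with spaces makes "avx" match only the whole word, not "avx2".
simd_level() {
    case " $1 " in
        *" avx2 "*)   echo AVX2_256 ;;
        *" avx "*)    echo AVX_256 ;;
        *" sse4_1 "*) echo SSE4.1 ;;
        *)            echo SSE2 ;;
    esac
}

# On Linux x86, the flag list is the "flags" line of /proc/cpuinfo.
if [ -r /proc/cpuinfo ] && flags=$(grep -m1 '^flags' /proc/cpuinfo); then
    # Strip everything up to the colon, leaving only the flag names.
    echo "This node supports GMX_SIMD=$(simd_level "${flags#*:}")"
fi
```

Run it on a compute node rather than the login node, since the two may have different processors. Building for the lowest level reported across the nodes you plan to use should avoid the illegal-instruction error.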
Also, note that this version of Gromacs, compiled with CUDA support, is flexible. It can run efficiently on machines with or without GPU accelerators.