Running Large Calculations

Even on small crystalline systems, phonon calculations typically require many times the CPU resources of a ground-state calculation. A DFPT calculation of phonon dispersion computes dynamical matrices at a set of phonon wavevectors, each of which involves calculations of several perturbations, and each perturbation typically requires a large k-point set because the perturbation breaks the symmetry of the crystal. If the supercell method is used instead, a converged calculation typically requires a system of a few hundred atoms and many perturbations, although the k-point set is correspondingly smaller. Consequently, calculations on systems of scientific interest frequently require departmental, university or national-level supercomputing facilities, usually parallel cluster-class machines.

Much of the advice for effective use of cluster- or supercomputer-class resources is the same as for ground-state or other types of CASTEP calculation, but there are a few special considerations for phonon calculations, set out below. Among the particularly relevant general items is the choice of memory/speed trade-off: usually the best approach is to select the highest-speed option, OPT_STRATEGY_BIAS : 3, which retains wavefunction coefficients in memory rather than paging them to disk. Only if the memory requirement exceeds the installed memory per node should this be lowered, to 0 or even -3, which pages wavefunctions to disk. In that case it is vital to ensure that the temporary scratch files are written to high-speed disk (either local or on a high-performance filesystem). This is usually controlled by setting the environment variable TMPDIR to point to an appropriate filespace, but the exact behaviour is compiler-dependent.
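For example, if the wavefunctions do not fit in memory and paging to disk is unavoidable, the .param file would contain

opt_strategy_bias : -3

and the batch job script would direct the scratch files to fast storage with something like (in a Bourne-style shell; the path is purely illustrative, so substitute whatever high-speed filespace your site provides)

export TMPDIR=/scratch/$USER/castep-scratch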

Another consideration is that the memory required to store the wavefunctions may be larger than can be accommodated on a small number of nodes. In that case the processor count requested should be increased to distribute the wavefunction arrays across a larger set of processors, reducing the memory requirement per processor.
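As a purely illustrative example: if the memory estimate printed near the start of the .castep file reported 48 GB for the wavefunction arrays and each node provided 16 GB, at least three nodes would be needed before the all-in-memory OPT_STRATEGY_BIAS : 3 setting became viable; any larger processor count reduces the per-node share further.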

CASTEP implements a parallel strategy based on a hierarchical distribution of wavefunction data across processors, by k-point and plane-wave (and, in future releases, band). In a phonon calculation this parallelism speeds up the execution within each perturbation and q-point, which are still executed serially in sequence. An efficient distribution is normally chosen automatically, provided that the data_distribution parameter is not changed from its default value, mixed.
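Should the automatic choice ever need to be overridden, the distribution can be restricted to a single level by setting the parameter explicitly in the .param file, for example

data_distribution : kpoint

(the accepted values are believed to include kpoint and gvector as well as mixed; consult the parameter documentation for your CASTEP version).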

To best exploit the k-point component of the parallel distribution, the total number of processors requested should be a multiple of the number of electronic k-points, or at least share a large common divisor with it. The parallel distribution is printed to the .castep file; in this example four k-points are used:

Calculation parallelised over 32 nodes.

K-points are distributed over 4 groups, each containing 8 nodes.

For most types of CASTEP calculation it is sufficient to choose a processor count which is a multiple of N_k, the number of k-points, and to ensure that the degree of plane-wave parallelism is not so large that efficiency is lost.

However, the choice of processor count in a phonon calculation is complicated by the fact that the number of electronic k-points in the irreducible Brillouin zone changes during the run, as the perturbations and phonon q-points have different symmetries. It is not convenient to determine the count for each perturbation individually without a detailed analysis, so some compromise across all of the perturbations must be chosen. To assist in this choice a utility program, phonon_kpoints, is provided. It reads the configuration of the proposed calculation from the .cell file and is invoked simply as

phonon_kpoints seedname

It then determines and prints the k-point counts, and provides a “figure of merit” for a range of possible processor counts. On most parallel architectures the efficiency of the plane-wave parallelism becomes unacceptable if there are fewer than around 200 plane-waves per node. It is usually possible to choose a processor count which allows a highly parallel run while keeping the number of plane-waves per node considerably higher than this.
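As a purely illustrative example with hypothetical numbers: for a perturbation with 4 k-points in its irreducible set and roughly 3000 plane-waves, a 32-processor run would form 4 k-point groups of 8 processors each, giving 3000/8 ≈ 375 plane-waves per node, comfortably above the threshold; the same run on 128 processors would fall to about 94 plane-waves per node, and parallel efficiency would be expected to suffer.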

Even with a parallel computer, it is frequently the case that a calculation cannot be completed in a single run. Many machines impose a maximum time limit on a batch queue which may be too short. On desktop machines, run time may be limited by reliability and uptime considerations. CASTEP is capable of periodically writing “checkpoint” files containing a complete record of the state of the calculation, and of restarting and completing a calculation from such a checkpoint file. In particular, dynamical matrices from completed q-points, and partial dynamical matrices from each completed perturbation, are saved and can be used in a restart. To enable the writing of periodic checkpoint files, set the parameter

backup_interval : 3600

which will write a checkpoint file named seedname.check every hour (the interval is specified in seconds), or on completion of the next perturbation thereafter. To restart a calculation, set the parameter

continuation : default

in the .param file before resubmitting the job. This will attempt to read seedname.check and restart the calculation from that point. Alternatively, the full filename of a checkpoint file may be given as the argument to the continuation keyword, to read an explicitly named file.
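For example (the filename here is hypothetical), to restart from a checkpoint written by an earlier run and saved under a different name:

continuation : run1-backup.check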

At the end of the calculation a checkpoint file seedname.check is always written. As with the intermediate checkpoint files, this contains a (now complete) record of the dynamical matrices or force-constant matrix resulting from the phonon calculation. It may be analysed in a post-processing phase using the phonons utility.
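In most distributions this utility is invoked in the same manner as phonon_kpoints above, i.e.

phonons seedname

though this invocation is an assumption based on the pattern of the other tools; consult the documentation supplied with your CASTEP release for the options it accepts.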