It is most straightforward to implement a code to perform these
calculations on serial or vector processors. These offer good
performance; for example, the Hitachi S3600 has a theoretical peak
performance of 2 GFLOPS (billion floating-point operations per
second). However, although the approximations introduced above greatly
reduce the cost of *ab initio* electronic structure calculations,
this performance is not sufficient for the largest calculations. The
only way to achieve significantly higher performance is to use
parallel architectures, in which many high-performance
microprocessors are connected by a high-performance network. Such
machines can offer much higher performance (for example, 64
processors of a Hitachi SR2201 have a theoretical peak of 19.2 GFLOPS),
but the techniques needed to write efficient code for them are much
more complex.

The data must be partitioned among the processors, and
communication must occur whenever data resident on one processor is
needed by another.
The grid on which the charge densities are represented and
the reciprocal-space sphere of plane waves are distributed as columns
among the processors. Each processor can then operate on the portion
of the data held locally. In particular, one-dimensional fast Fourier
transforms can be performed along these columns, although the data
must be reordered (in effect, a global transpose) before the Fourier
transforms along the other axes can be performed. This reordering is
the dominant communication cost of this implementation.
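The column decomposition and the reordering step can be illustrated with a small serial sketch in Python/NumPy. It simulates `nproc` processors as a list of arrays, each holding a contiguous block of one-dimensional columns. Each pass performs local 1-D FFTs along the columns and then redistributes the data so that columns along the next axis become local. The helper names (`distribute_columns`, `owner_map`, `distributed_fft3d`) are hypothetical, the even contiguous split of columns is an assumption rather than the layout used by any particular code, and the `moved` counter only tallies how many grid values change owner at each reordering, as a rough proxy for communication volume.

```python
import numpy as np


def distribute_columns(grid, axis, nproc):
    """Split `grid` into `nproc` contiguous blocks of 1-D columns along `axis`.

    Returns one (ncols_p, n_axis) array per simulated processor.
    """
    cols = np.moveaxis(grid, axis, -1).reshape(-1, grid.shape[axis])
    return np.array_split(cols, nproc)


def gather_columns(blocks, axis, shape):
    """Inverse of distribute_columns: rebuild the full grid from the blocks."""
    cols = np.concatenate(blocks)
    moved_shape = [s for i, s in enumerate(shape) if i != axis] + [shape[axis]]
    return np.moveaxis(cols.reshape(moved_shape), -1, axis)


def owner_map(shape, axis, nproc):
    """Processor id that owns each grid point under a column layout along `axis`."""
    ncols = int(np.prod(shape)) // shape[axis]
    sizes = [len(b) for b in np.array_split(np.arange(ncols), nproc)]
    col_owner = np.repeat(np.arange(nproc), sizes)
    other = [s for i, s in enumerate(shape) if i != axis]
    owners = np.broadcast_to(col_owner.reshape(other)[..., None], other + [shape[axis]])
    return np.moveaxis(owners, -1, axis)


def distributed_fft3d(grid, nproc):
    """3-D FFT as three passes of local 1-D FFTs separated by reorderings.

    Returns the transformed grid and the number of values that changed owner.
    """
    shape = grid.shape
    blocks = distribute_columns(grid, 2, nproc)  # start with columns along axis 2
    moved = 0
    prev = 2
    for axis in (2, 1, 0):
        if axis != prev:
            # Count values whose owning processor changes in this reordering.
            moved += np.count_nonzero(
                owner_map(shape, prev, nproc) != owner_map(shape, axis, nproc))
            # A real parallel code would use an all-to-all exchange here
            # (e.g. MPI_Alltoall); this sketch gathers and re-splits instead.
            blocks = distribute_columns(gather_columns(blocks, prev, shape), axis, nproc)
            prev = axis
        # Purely local work: 1-D FFTs along the columns each processor holds.
        blocks = [np.fft.fft(b, axis=-1) for b in blocks]
    return gather_columns(blocks, prev, shape), moved


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.standard_normal((8, 6, 4)) + 1j * rng.standard_normal((8, 6, 4))
    result, moved = distributed_fft3d(g, nproc=4)
    print(np.allclose(result, np.fft.fftn(g)), moved, g.size)
```

Because the 3-D FFT is separable, the three passes of 1-D transforms reproduce `numpy.fft.fftn` on the whole grid. Only the two reorderings between passes move data across processor boundaries, which is consistent with their being the dominant communication cost.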

For the foreseeable future, parallel architectures offer the only way to achieve the computational performance needed to address significant questions in biology. As parallel machines become more widespread, the price:performance ratio is likely to continue to fall rapidly, making such calculations cost-effective.

Wed Sep 24 12:24:18 BST 1997