Localised discrete Fourier transform

We now present an accurate, linear-scaling method for calculating the kinetic energy matrix elements $T_{\alpha \beta}$ of equation (3). We use a mixed space Fourier transform approach that is applicable to any Bravais lattice symmetry. Fourier transformation is a natural method to adopt for this task since in a total-energy calculation one computes other terms, such as the electron density and the Hartree energy, using reciprocal space techniques. This implicitly defines the basis set that we use to be plane-waves and for consistency we should calculate the kinetic energy using the same basis set, i.e. using Fourier transform methods. Thus we calculate the $\nabla^{2} \phi$ term in reciprocal space, where the Laplacian operator is easy to apply, then transform the result back to real space and obtain the matrix elements $T_{\alpha \beta}$ by summation over grid points (4). One way to achieve this would be to perform a discrete FFT on each support function $\phi$ , using the periodicity of the entire simulation cell. However, unlike the FD algorithm, the FFT is not a local operation and the cost of applying the Laplacian to all the support functions in this way would be proportional to $N_{s}^{2}\log N_{s}$ , which clearly does not scale linearly with system size.

It is possible to overcome this undesirable scaling without compromising accuracy by performing the FFT over a restricted region of the simulation cell, which we call the `FFT box' (figure 1). Before defining the FFT box, there are two points that should be noted. Firstly, the operator $\hat{T}$ must be Hermitian. This will ensure that the kinetic energy matrix elements, $T_{\alpha \beta}$ are Hermitian, and hence the eigenvalues real. Secondly, when calculating two matrix elements such as $T_{\alpha \beta}$ and $T_{\gamma \beta}$ , we require the quantity $\hat{T} \phi_{\beta}$ in both cases. To be consistent, our method for calculating the matrix elements must be such that $\hat{T} \phi_{\beta}$ is the same in both cases, i.e. we require $\hat{T} \phi_{\beta}$ to have a unique and consistent representation throughout the calculation. It is important that both these conditions are satisfied when it comes to optimisation of the support functions during a total-energy calculation, and we shall return to this point later.

In order to fulfil the above requirements, it can be seen that for a given calculation the FFT box must be universal in shape and dimensions. As a result, it must be large enough to enclose any pair of overlapping support functions within the simulation cell. To define a suitable FFT box, we first consider a box with the same unit lattice vectors as the simulation cell, but of dimensions such that it exactly circumscribes the largest support region present in the simulation cell. We then define a box that is commensurate with this, but with sides that are twice as long (and hence a volume eight times as large). This we define to be the FFT box. It is clear that this FFT box is large enough to enclose any pair of support functions exhibiting any degree of overlap.

**Figure 1:** The simulation cell and an FFT box for a pair of overlapping support regions.
$\scalebox{0.45}[0.45]{\includegraphics*[0.0cm,0cm][29cm,17cm]{skylaris_FFTbox.eps} }$

To calculate a particular matrix element $T_{\alpha \beta}$ for two overlapping support functions $\phi_{\alpha}$ and $\phi_{\beta}$ , we imagine them as being enclosed within the FFT box defined above and we treat this region of real space as a miniature simulation cell. We Fourier transform $\phi_{\beta}$ using the periodicity of the FFT box and apply the Laplacian at each reciprocal lattice point using standard plane-wave techniques [1]. It is then a simple matter of using one more FFT to back-transform $\nabla^{2}\phi_{\beta}$ to real space and subsequently calculate $T_{\alpha \beta}$ by summation over the grid points of the FFT box, according to equation (4).

The result obtained by this process is equivalent to performing a Fourier transform of $\phi_{\beta}$ over the whole simulation cell, applying the Laplacian and then interpolating to a coarse, but still regular, reciprocal space grid with only $N_{box}$ points, $N_{box}$ being the number of grid points in the FFT box, before back-transforming to real space. This coarse sampling in reciprocal space has a negligible influence on the result because each support function is strictly localised in real space and therefore smooth in reciprocal space.

It is worth noting the implicit approximation that we make in calculating the kinetic energy in the way prescribed above. In general, $\nabla^{2}\phi_{\beta}$ is nonzero outside the support region of $\phi_{\beta}$ itself, and it is essential to take this into account in the calculations. By construction, we neglect contributions to the kinetic energy from support functions whose support regions do not overlap as we expect them to be negligibly small. This approximation may be controlled via a single parameter, the FFT box size, with respect to which the calculation may be converged if necessary. The same approximation is of course present in the FD method as well.

We expect certain advantages to the FFT box algorithm over FD based methods. Firstly, the FFT box method should be more accurate than any FD scheme since it takes into account information from every single point of the support function and not only locally. However, it is still perfectly local as far as parallelisation is concerned since we only deal with the points within a single FFT box each time, and this constitutes a very small region of the simulation cell. The parallelisation strategy in this case would still consist of partitioning the real space grid of the simulation cell into subregions of equal size and distributing them amongst PEs. Then, FFTs local to each PE are performed on FFT boxes enclosing pairs of overlapping support regions belonging completely to the simulation cell subregion of the given PE. For pairs of overlapping support regions containing grid points common to the subregion of more than one PE, the pair would have to be attributed to one PE and copied as a whole to it for the local FFT to proceed. This would involve some communication overhead, as in the FD case for pairs of overlapping support functions with points in more than one subregion. Another important advantage of the FFT box method is that it is applicable, without any modification, to regular grids of any Bravais lattice symmetry. This is not true of FD methods.

The number of grid points in a cubic FFT box is $N_{box}$ (which is related to $N_{\mathrm{reg}}$ by $N_{box}= 8 \times 6 N_{\mathrm{reg}}/\pi \simeq 15.3 N_{\mathrm{reg}}$ ). Therefore the computational cost of applying the FFT method to a single support function in such an FFT box is $2 N_{box} \log N_{box}$ , and thus for all support functions the cost is proportional to $2 N_s N_{box} \log N_{box}$ , where $N_{box}$ is independent of

. In other words the cost scales linearly with the number of atoms in the system.