The traditional formulation of density-functional theory  (DFT) in terms of a set of extended single-particle wave functions has led to the development of schemes for performing total-energy calculations which require a computational effort which scales as the cube of the system-size (i.e. the number of atoms, electrons or volume of the system). This scaling results from the cost of diagonalizing the Hamiltonian or orthogonalizing the wave functions. Methods based upon the single-particle density-matrix (DM), which is free from orthogonality constraints and short-ranged in real-space, offer the prospect of electronic structure calculations at a cost which scales only linearly with system-size. We investigate one scheme that has been proposed for achieving this goal, which uses a penalty-functional to impose the idempotency constraint on the DM. We show that the form of penalty-functional which must be chosen to obtain a variational principle for the total energy precludes the use of efficient minimization algorithms commonly used in electronic structure calculations. We apply the method to crystalline silicon and show that the desired minimum cannot be found and therefore that the variational principle cannot be used in practice.
In terms of a set of orthonormal orbitals
occupation numbers , the DM is written as
The energy functional is defined by
Exploiting the short-ranged behaviour of the DM, i.e. that exponentially  as , by imposing some spatial cut-off (such that for ) results in a linear-scaling method. The most significant hurdle to overcome is the imposition of the idempotency constraint. This can be achieved implicitly using a purifying transformation  which has been implemented in several tight-binding and DFT schemes .
An alternative approach to imposing the idempotency constraint
has been proposed by Kohn , who suggested minimizing
the functional defined by
Rather than minimizing the non-interacting energy, as proposed by
Kohn, we can instead minimize the interacting energy
self-consistently. Using the square of the DM to calculate
in (5) has the advantage that it
guarantees that the charge density is positive-definite. However, in
order to simplify the analysis here we consider the functional
This is illustrated schematically in Fig. 1 for the case of a single occupation number corresponding to a state above the chemical potential. For the minimum value of is obtained when . This outlines the variational principle established by Kohn, but we note that the functional is minimal only in the sense that it takes its minimum value at the ground-state, but not in the sense that its derivatives vanish at the ground-state, since they are undefined at that point.
The penalty-functional has a branch point at its minimum, due to the square-root form employed (6). However, this square-root is crucial to establishing the variational principle. In Fig. 2, the effect of using an analytic penalty-functional (the square of ) is plotted and it is clear that the minimum now occurs for for all values of . The total energy calculated in this case will no longer be a variational upper bound to the ground-state energy. We have recently introduced a method for obtaining accurate estimates of the true ground-state energy from such nearly-idempotent DMs, and details can be found elsewhere . Nevertheless, the non-analytic form of the penalty-functional (6) must be employed if we wish to obtain a variational principle for the energy.
Several schemes exist for directly minimizing functions of many variables. The simplest of these is the method of steepest descents in which the gradient of the function is used as a search direction in the multidimensional space. The minimum value of the function along this direction is found and the process iterated to convergence. In Fig. 3a, the results of applying this method to an exactly quadratic function are plotted, starting from the point (5,1). Successive search directions are always orthogonal, and the method is not guaranteed to find the minimum in a finite number of steps. The method is clearly inefficient, since all of the information from previous function and gradient evaluations is ignored when calculating new search directions. In contrast, the conjugate gradients method  uses this information to construct independent search directions. Each successive step effectively eliminates one dimension of the space to be searched, so that the minimum is found in a number of steps no greater than the dimensionality of the space. The results for this method are plotted in Fig. 3b for the same quadratic function plotted in Fig. 3a. This method has been successfully implemented in traditional electronic structure calculations .
However, the conjugate gradients method relies on the accuracy of a quadratic approximation of the function around the minimum. As observed, the functional has a branch point at its minimum which arises from the square-root that appears in the penalty-functional, and is therefore non-analytic at the minimum. No multidimensional Taylor expansion for the functional exists about the minimum, and so the local information (functional values and gradients at points) used to construct the conjugate directions gives a misleading picture of the global behaviour of the functional. In Fig. 4a the results of a steepest descents minimization of the function is plotted, and exactly the same sequence of points is generated as in Fig. 3a. However, the results obtained using the conjugate gradients method, plotted in Fig. 4b, are now worse than for steepest descents.
In Fig. 4, the exact line minimum is found in each case. In practice, however, the position of the line minimum is usually estimated, by making a parabolic interpolation of the functional along the search direction using the initial value and first derivative of the functional, and its value at a trial step. In this case, the line minimum estimate will also be wrong, further reducing the efficiency in both cases.
These problems are not confined to the conjugate gradients method alone, but apply equally to any method which attempts to use gradients to build up an estimate of the Hessian of the functional e.g. the Fletcher-Powell or Broyden-Fletcher-Goldfarb-Shanno algorithms . The Car-Parrinello scheme  would also fail in this case since the derivative of the penalty-functional would appear in the equations of motion for the molecular-dynamics Lagrangian. Simulated annealing methods  such as the Metropolis algorithm  will successfully minimize functions of this kind, since they do not use the gradients, but the number of iterations required in this case would increase with system-size as the number of dimensions to be searched increased, thus spoiling the linear scaling of the method.
We can attempt to find a set of conjugate directions for the
non-quadratic functions encountered here. For a function
with a quadratic minimum, a set of conjugate directions
are defined by
When using the conjugate gradients method to minimize functions which are not exactly quadratic in form, it is common practice to reset the conjugate directions after a certain number of iterations by taking a steepest descent step. For functions of the same form as , the method becomes rapidly less efficient as the number of successive conjugate gradients steps is increased for the reasons discussed above. Indeed, for functions of many variables the method may completely fail to find the true minimum. Instead, the method appears to converge to a value which is not the minimum. This trend can be seen in Fig. 4b, in which the conjugate directions become orthogonal to the gradient. Therefore, the method fails not as a result of false minima, but because the minimization becomes so slow as to be indistinguishable from true convergence.
In order to see whether such behaviour appears in a genuine DFT calculation, we have implemented this scheme to perform total energy calculations, and have applied it to crystalline silicon. The density-matrix was expanded in separable form in terms of a sparse Hermitian matrix and a set of localized functions, as in other schemes , and the localized functions themselves were centred on the ions and expanded in terms of a localized spherical-wave basis-set  using an energy cut-off of 200 eV and angular momentum components up to . No attempt was made to converge the calculation with respect to the density-matrix cut-off. The variational principle proved by Kohn states that for , at the minimum. In Fig. 5, we show the contribution of the penalty-functional to the total minimized functional . As is increased, this contribution does not vanish above some critical value (estimated from (10) and by Kohn's limit to be of the order of 50 eV for silicon) but rather decreases slowly. Also plotted in Fig. 5 is the corresponding root-mean-square error in the occupation numbers, which also decreases as increases. Thus the total energy will approach the true ground-state value, but no variational principle can be invoked, since this only holds for exactly. This failure to observe Kohn's result in practice is due to the inability of the minimization procedure to locate the true minimum of the functional, due to its non-analytic form.
In conclusion, we have implemented a method based upon the variational principle derived by Kohn  and demonstrated a fundamental difficulty in using this method in a computational scheme which is due to the non-analyticity of the required penalty-functional. Calculations on crystalline silicon confirm the trends observed from considering simple model functions. We have shown that the variational principle cannot be exploited in practice because the nature of the functional makes it unsuitable for use with current minimization techniques. Other schemes based upon the purifying transformation, or which use an analytic penalty-functional to approximately impose idempotency (and subsequently correct for the error introduced due to the lack of idempotency) will therefore be more efficient.
We acknowledge helpful discussions with E. Sandré. PDH was supported by an EPSRC studentship.