The traditional formulation of density-functional theory [1] (DFT) in terms of a set of extended single-particle wave functions has led to the development of schemes for performing total-energy calculations whose computational effort scales as the cube of the system-size (i.e. the number of atoms, electrons or volume of the system). This scaling results from the cost of diagonalizing the Hamiltonian or orthogonalizing the wave functions. Methods based upon the single-particle density-matrix (DM), which is free from orthogonality constraints and short-ranged in real-space, offer the prospect of electronic structure calculations at a cost which scales only linearly with system-size. We investigate one scheme that has been proposed for achieving this goal, which uses a penalty-functional to impose the idempotency constraint on the DM. We show that the form of penalty-functional which must be chosen to obtain a variational principle for the total energy precludes the use of efficient minimization algorithms commonly used in electronic structure calculations. We apply the method to crystalline silicon and show that the desired minimum cannot be found, and therefore that the variational principle cannot be used in practice.
In terms of a set of orthonormal orbitals {φ_i(r)} and occupation numbers {f_i}, the DM is written as

ρ(r, r′) = Σ_i f_i φ_i(r) φ_i*(r′).    (1)

The electronic density is given (including a factor of two for spin degeneracy) by the diagonal elements,

n(r) = 2 ρ(r, r),    (2)

and the number of electrons by

N = ∫ n(r) d³r = 2 Tr ρ.    (3)
The energy functional is defined by

E[ρ] = 2 Tr(ρĤ),    (4)

where Ĥ is the Hamiltonian.
Exploiting the short-ranged behaviour of the DM, i.e. that ρ(r, r′) → 0 exponentially [2] as |r − r′| → ∞, by imposing some spatial cutoff r_c (such that ρ(r, r′) = 0 for |r − r′| > r_c) results in a linear-scaling method. The most significant hurdle to overcome is the imposition of the idempotency constraint (ρ² = ρ, i.e. all occupation numbers equal to 0 or 1). This can be achieved implicitly using a purifying transformation [3], which has been implemented in several tight-binding and DFT schemes [4].
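The effect of the purifying transformation can be sketched in a few lines. The illustration below (a small dense model DM with arbitrarily perturbed occupation numbers, not the sparse linear-scaling implementation of the schemes in [3,4]) applies McWeeny's iteration ρ → 3ρ² − 2ρ³, which drives each occupation number towards 0 or 1:

```python
import numpy as np

def mcweeny_purify(rho, steps=30):
    """Iterate McWeeny's transformation rho -> 3 rho^2 - 2 rho^3,
    which drives the occupation numbers (eigenvalues of rho)
    towards 0 or 1, restoring idempotency."""
    for _ in range(steps):
        rho2 = rho @ rho
        rho = 3.0 * rho2 - 2.0 * rho2 @ rho
    return rho

# Build a nearly idempotent trial DM: occupations perturbed away from 0 and 1.
rng = np.random.default_rng(0)
H = rng.standard_normal((6, 6)); H = 0.5 * (H + H.T)
_, V = np.linalg.eigh(H)                     # arbitrary orthonormal orbitals
f = np.array([0.95, 0.9, 1.05, 0.1, -0.02, 0.05])  # perturbed occupations
rho = V @ np.diag(f) @ V.T

rho_pure = mcweeny_purify(rho)
occ = np.linalg.eigvalsh(rho_pure)           # now 0 or 1 to machine precision
```

Each iteration costs only a pair of matrix multiplications, which preserves linear scaling when ρ is truncated in real-space and stored as a sparse matrix.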
An alternative approach to imposing the idempotency constraint has been proposed by Kohn [5], who suggested minimizing the functional defined by

E_Q[ρ; μ, α] = E_NI[ρ] − μ N[ρ] + α P[ρ],    (5)

where μ is the chemical potential, α is a positive parameter and the penalty-functional

P[ρ] = {Tr[ρ²(1 − ρ)²]}^(1/2)    (6)

vanishes if and only if ρ is idempotent, i.e. all of its occupation numbers equal 0 or 1.
Rather than minimizing the non-interacting energy, as proposed by Kohn, we can instead minimize the interacting energy self-consistently. Using the square of the DM to calculate the charge density in (5) has the advantage that it guarantees that the charge density is positive-definite. However, in order to simplify the analysis here we consider the functional Q defined by

Q[ρ; α] = Tr[(Ĥ − μ)ρ] + α P[ρ]    (7)

for a fixed Hamiltonian Ĥ. In the representation which diagonalizes Ĥ, with eigenvalues {ε_i}, this becomes

Q = Σ_i (ε_i − μ) f_i + α [Σ_i f_i²(1 − f_i)²]^(1/2),    (8)

which for a single occupation number f reduces to

Q(f) = (ε − μ) f + α |f(1 − f)|.    (9)
This is illustrated schematically in Fig. 1 for the case of a single occupation number f corresponding to a state of energy ε above the chemical potential μ. For

α > ε − μ,    (10)

the minimum value of Q(f) is obtained when f = 0. This illustrates the variational principle established by Kohn, but we note that the functional is minimal only in the sense that it takes its minimum value at the ground-state, not in the sense that its derivatives vanish at the ground-state, since they are undefined at that point.
The penalty-functional has a branch point at its minimum, due to the square-root form employed in (6). However, this square-root is crucial to establishing the variational principle. In Fig. 2, the effect of using an analytic penalty-functional (the square of P[ρ]) is plotted, and it is clear that the minimum now occurs at f ≠ 0 for all values of α. The total energy calculated in this case will no longer be a variational upper bound to the ground-state energy. We have recently introduced a method for obtaining accurate estimates of the true ground-state energy from such nearly-idempotent DMs, and details can be found elsewhere [7]. Nevertheless, the non-analytic form of the penalty-functional (6) must be employed if we wish to obtain a variational principle for the energy.
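The contrast between the two penalty forms can be checked numerically for a single occupation number. The sketch below uses illustrative values ε − μ = 1 and α = 2 (chosen only so that α > ε − μ; they are not taken from any calculation here) and locates the minimum of Q(f) with the square-root penalty and with the analytic squared penalty:

```python
import numpy as np

# Single occupation number f for a state above the chemical potential.
# Illustrative parameters: eps_mu = eps - mu, with alpha > eps_mu.
eps_mu, alpha = 1.0, 2.0

def Q_sqrt(f):
    """Kohn-style functional: the square-root penalty |f(1-f)| has a kink
    (branch point) at the idempotent values f = 0 and f = 1."""
    return eps_mu * f + alpha * np.abs(f * (1.0 - f))

def Q_square(f):
    """Analytic alternative: the squared penalty [f(1-f)]^2 is smooth."""
    return eps_mu * f + alpha * (f * (1.0 - f)) ** 2

f = np.linspace(-0.5, 1.5, 200001)
f_min_sqrt = f[np.argmin(Q_sqrt(f))]      # minimum sits exactly at f = 0
f_min_square = f[np.argmin(Q_square(f))]  # minimum pushed away from f = 0
```

With the square-root penalty the minimum lies exactly at the idempotent value f = 0, while the analytic penalty shifts it to f ≠ 0, so the resulting DM is never exactly idempotent, in line with Figs. 1 and 2.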

Several schemes exist for directly minimizing functions of many variables. The simplest of these is the method of steepest descents, in which the gradient of the function is used as a search direction in the multidimensional space. The minimum value of the function along this direction is found and the process iterated to convergence. In Fig. 3a, the results of applying this method to an exactly quadratic function are plotted, starting from the point (5,1). Successive search directions are always orthogonal, and the method is not guaranteed to find the minimum in a finite number of steps. The method is clearly inefficient, since all of the information from previous function and gradient evaluations is ignored when calculating new search directions. In contrast, the conjugate gradients method [8] uses this information to construct a set of mutually conjugate search directions. Each successive step effectively eliminates one dimension of the space to be searched, so that the minimum is found in a number of steps no greater than the dimensionality of the space. The results for this method are plotted in Fig. 3b for the same quadratic function plotted in Fig. 3a. This method has been successfully implemented in traditional electronic structure calculations [9].
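The behaviour shown in Fig. 3 is easily reproduced. The following sketch (a 2×2 model Hessian chosen purely for illustration, starting from the point (5,1) as in the figure) minimizes a quadratic by steepest descents and by Fletcher–Reeves conjugate gradients, with exact line minimization in both cases:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # illustrative SPD Hessian
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)           # exact minimum of f(x) = x.A.x/2 - b.x

def grad(x):
    return A @ x - b

def exact_line_min(x, d):
    """For a quadratic, the minimizing step length along d is closed-form."""
    return -(grad(x) @ d) / (d @ A @ d)

# Steepest descents: each new direction is simply the negative gradient,
# so successive directions are orthogonal and convergence is only gradual.
x = np.array([5.0, 1.0])
for _ in range(5):
    d = -grad(x)
    x = x + exact_line_min(x, d) * d
sd_err = np.linalg.norm(x - x_star)

# Conjugate gradients (Fletcher-Reeves): exact in at most 2 steps in 2-D.
x = np.array([5.0, 1.0])
g = grad(x)
d = -g
for _ in range(2):
    x = x + exact_line_min(x, d) * d
    g_new = grad(x)
    beta = (g_new @ g_new) / (g @ g)
    d = -g_new + beta * d
    g = g_new
cg_err = np.linalg.norm(x - x_star)
```

After two conjugate gradients steps the error is at machine precision, whereas after five steepest descents steps a finite error remains: the zigzag of orthogonal directions never terminates exactly.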

However, the conjugate gradients method relies on the accuracy of a quadratic approximation of the function around the minimum. As observed, the functional has a branch point at its minimum, which arises from the square-root that appears in the penalty-functional, and is therefore non-analytic at the minimum. No multidimensional Taylor expansion for the functional exists about the minimum, and so the local information (functional values and gradients at points) used to construct the conjugate directions gives a misleading picture of the global behaviour of the functional. In Fig. 4a the results of a steepest descents minimization of the corresponding square-root function are plotted, and exactly the same sequence of points is generated as in Fig. 3a. However, the results obtained using the conjugate gradients method, plotted in Fig. 4b, are now worse than for steepest descents.
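This failure can be made concrete with the same kind of model. Taking the square-root of a quadratic (here a 2×2 model, starting from (5,1) as in the figures; all numerical values are illustrative) introduces a branch point at the minimum while leaving the line minima along any direction unchanged. Steepest descents therefore generates its usual sequence of points, while the Fletcher–Reeves conjugate gradients update is misled:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # illustrative SPD matrix
x_star = np.array([0.2, 0.4])            # known minimum of the model

def h(x):
    """Square-root of a quadratic: non-analytic (branch point) at x_star."""
    e = x - x_star
    return np.sqrt(0.5 * e @ A @ e)

def grad_h(x):
    e = x - x_star
    return (A @ e) / (2.0 * h(x))

def line_min(x, d):
    """Exact line minimum of h along d; since sqrt is monotonic it is the
    same as for the underlying quadratic (closed form used for brevity)."""
    e = x - x_star
    return -(d @ A @ e) / (d @ A @ d)

def run(n_steps, conjugate):
    x = np.array([5.0, 1.0])
    g = grad_h(x)
    d = -g
    for _ in range(n_steps):
        x = x + line_min(x, d) * d
        g_new = grad_h(x)
        # Fletcher-Reeves update; beta = 0 recovers steepest descents.
        beta = (g_new @ g_new) / (g @ g) if conjugate else 0.0
        d = -g_new + beta * d
        g = g_new
    return np.linalg.norm(x - x_star)

sd_err = run(2, conjugate=False)   # same points as for the quadratic
cg_err = run(2, conjugate=True)    # two steps no longer reach the minimum
```

The two steps that solve the quadratic exactly now leave the conjugate gradients iterate further from the minimum than plain steepest descents, because the β coefficient computed from the branch-point gradients overweights the old direction and destroys conjugacy.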

In Fig. 4, the exact line minimum is found in each case. In practice, however, the position of the line minimum is usually estimated by making a parabolic interpolation of the functional along the search direction, using the initial value and first derivative of the functional and its value at a trial step. In this case, the estimate of the line minimum will also be wrong, further reducing the efficiency of both methods.
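The effect on the line search can be seen in one dimension. The fit below follows the procedure just described: a parabola is matched to the value and slope at the start of the line and the value at a trial step (a trial step of 1.5, chosen for illustration). For a smooth parabola the estimate is exact, while for a kinked function such as |t − 1|, a one-dimensional analogue of the penalty's branch point, it overshoots the true minimum:

```python
def parabolic_step(f0, df0, fs, s):
    """Estimate the line minimum from the value f0 and slope df0 at t = 0
    and the value fs at a trial step t = s, by fitting p(t) = a t^2 + b t + c."""
    a = (fs - f0 - df0 * s) / s**2
    return -df0 / (2.0 * a)

# Smooth quadratic: the parabolic estimate recovers the minimum exactly.
q = lambda t: (t - 1.0) ** 2
t_smooth = parabolic_step(q(0.0), -2.0, q(1.5), 1.5)   # = 1.0, the true minimum

# Branch-point function |t - 1|: the fitted parabola overshoots the kink.
v = lambda t: abs(t - 1.0)
t_kink = parabolic_step(v(0.0), -1.0, v(1.5), 1.5)     # = 1.125, not 1.0
```

Any interpolation scheme built on smoothness assumptions mislocates the minimum of a kinked function, so the line searches degrade along with the choice of search directions.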
These problems are not confined to the conjugate gradients method alone, but apply equally to any method which attempts to use gradients to build up an estimate of the Hessian of the functional, e.g. the Fletcher–Powell or Broyden–Fletcher–Goldfarb–Shanno algorithms [10]. The Car–Parrinello scheme [11] would also fail in this case, since the derivative of the penalty-functional would appear in the equations of motion for the molecular-dynamics Lagrangian. Simulated annealing methods [12] such as the Metropolis algorithm [13] will successfully minimize functions of this kind, since they do not use the gradients, but the number of iterations required in this case would increase with system-size as the number of dimensions to be searched increased, thus spoiling the linear scaling of the method.
We can attempt to find a set of conjugate directions for the non-quadratic functions encountered here. For a function with a quadratic minimum,

f(x) = ½ xᵀ·A·x − bᵀ·x,    (11)

with gradient

g(x) = A·x − b,    (12)

a set of conjugate directions {d_i} is defined by

d_iᵀ·A·d_j = 0 for i ≠ j.    (13)
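For a genuinely quadratic function the conjugacy condition (13) is satisfied automatically by the standard algorithm. The sketch below (a random symmetric positive-definite model Hessian with n = 5, purely illustrative) runs linear conjugate gradients and verifies both the mutual conjugacy of the directions and convergence in n steps:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive-definite Hessian
b = rng.standard_normal(n)

# Standard linear conjugate gradients on f(x) = x.A.x/2 - b.x,
# recording every search direction d_i.
x = np.zeros(n)
g = A @ x - b
d = -g
dirs = []
for _ in range(n):
    dirs.append(d.copy())
    step = (g @ g) / (d @ A @ d)     # exact line minimum along d
    x = x + step * d
    g_new = g + step * (A @ d)
    beta = (g_new @ g_new) / (g @ g)
    d = -g_new + beta * d
    g = g_new

# Largest normalized violation of d_i . A . d_j = 0 for i != j,
# and the distance from the exact minimum x* = A^{-1} b.
conj = max(
    abs(dirs[i] @ A @ dirs[j])
    / np.sqrt((dirs[i] @ A @ dirs[i]) * (dirs[j] @ A @ dirs[j]))
    for i in range(n) for j in range(i)
)
err = np.linalg.norm(x - np.linalg.solve(A, b))
```

For the square-root functionals discussed above no fixed matrix A exists at the minimum, so no such set of directions can be constructed, however the β coefficients are chosen.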
When using the conjugate gradients method to minimize functions which are not exactly quadratic in form, it is common practice to reset the conjugate directions after a certain number of iterations by taking a steepest descent step. For functions with the same square-root form as the penalty-functional, the method becomes rapidly less efficient as the number of successive conjugate gradients steps is increased, for the reasons discussed above. Indeed, for functions of many variables the method may completely fail to find the true minimum: instead, it appears to converge to a value which is not the minimum. This trend can be seen in Fig. 4b, in which the conjugate directions become orthogonal to the gradient. Thus the method fails not as a result of false minima, but because the minimization becomes so slow as to be indistinguishable from true convergence.

In order to see whether such behaviour appears in a genuine DFT calculation, we have implemented this scheme to perform total-energy calculations, and have applied it to crystalline silicon. The density-matrix was expanded in separable form in terms of a sparse Hermitian matrix and a set of localized functions, as in other schemes [4], and the localized functions themselves were centred on the ions and expanded in terms of a localized spherical-wave basis set [14] using an energy cutoff of 200 eV and angular momentum components up to a fixed maximum. No attempt was made to converge the calculation with respect to the density-matrix cutoff. The variational principle proved by Kohn states that for α above the critical value of (10), P[ρ] = 0 at the minimum. In Fig. 5, we show the contribution αP[ρ] of the penalty-functional to the total minimized functional. As α is increased, this contribution does not vanish above some critical value (estimated from (10) and by Kohn's limit to be of the order of 50 eV for silicon) but rather decreases slowly. Also plotted in Fig. 5 is the corresponding root-mean-square error in the occupation numbers, which also decreases as α increases. Thus the total energy will approach the true ground-state value, but no variational principle can be invoked, since this only holds when P[ρ] vanishes exactly. This failure to observe Kohn's result in practice is due to the inability of the minimization procedure to locate the true minimum of the functional, a consequence of its non-analytic form.
In conclusion, we have implemented a method based upon the variational principle derived by Kohn [5] and demonstrated a fundamental difficulty in using this method in a computational scheme, which is due to the non-analyticity of the required penalty-functional. Calculations on crystalline silicon confirm the trends observed from considering simple model functions. We have shown that the variational principle cannot be exploited in practice because the nature of the functional makes it unsuitable for use with current minimization techniques. Other schemes based upon the purifying transformation, or which use an analytic penalty-functional to impose idempotency approximately (and subsequently correct for the error introduced by the lack of idempotency), will therefore be more efficient.
We acknowledge helpful discussions with E. Sandré. PDH was supported by an EPSRC studentship.