The traditional formulation of density-functional theory [1] (DFT) in terms of a set of extended single-particle wave functions has led to the development of schemes for performing total-energy calculations whose computational effort scales as the cube of the system-size (i.e. the number of atoms, electrons or volume of the system). This scaling results from the cost of diagonalizing the Hamiltonian or orthogonalizing the wave functions. Methods based upon the single-particle density-matrix (DM), which is free from orthogonality constraints and short-ranged in real-space, offer the prospect of electronic structure calculations at a cost which scales only linearly with system-size. We investigate one scheme that has been proposed for achieving this goal, which uses a penalty-functional to impose the idempotency constraint on the DM. We show that the form of penalty-functional which must be chosen to obtain a variational principle for the total energy precludes the use of efficient minimization algorithms commonly used in electronic structure calculations. We apply the method to crystalline silicon and show that the desired minimum cannot be found, and therefore that the variational principle cannot be used in practice.
In terms of a set of orthonormal orbitals {φ_i(r)} and occupation numbers {f_i}, the DM is written as

ρ(r, r′) = Σ_i f_i φ_i(r) φ_i*(r′).    (1)

The electronic density is given (including a factor of two for spin degeneracy) by the diagonal elements,

n(r) = 2 ρ(r, r),    (2)

and the number of electrons by

N = ∫ n(r) d³r = 2 Tr ρ.    (3)
The energy functional is defined by

E[ρ] = 2 Tr(ρĤ),    (4)

where Ĥ is the Hamiltonian.
Exploiting the short-ranged behaviour of the DM, i.e. that ρ(r, r′) → 0 exponentially [2] as |r − r′| → ∞, by imposing some spatial cutoff r_c (such that ρ(r, r′) = 0 for |r − r′| > r_c) results in a linear-scaling method. The most significant hurdle to overcome is the imposition of the idempotency constraint (ρ² = ρ, i.e. all occupation numbers equal to 0 or 1). This can be achieved implicitly using a purifying transformation [3], which has been implemented in several tight-binding and DFT schemes [4].
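The effect of the purifying transformation can be sketched in a few lines. The illustration below (a small dense model DM with arbitrarily perturbed occupation numbers, not the sparse linear-scaling implementation of the schemes in [3,4]) applies McWeeny's iteration ρ → 3ρ² − 2ρ³, which drives each occupation number towards 0 or 1:

```python
import numpy as np

def mcweeny_purify(rho, steps=30):
    """Iterate McWeeny's transformation rho -> 3 rho^2 - 2 rho^3,
    which drives the occupation numbers (eigenvalues of rho)
    towards 0 or 1, restoring idempotency."""
    for _ in range(steps):
        rho2 = rho @ rho
        rho = 3.0 * rho2 - 2.0 * rho2 @ rho
    return rho

# Build a nearly idempotent trial DM: occupations perturbed away from 0 and 1.
rng = np.random.default_rng(0)
H = rng.standard_normal((6, 6)); H = 0.5 * (H + H.T)
_, V = np.linalg.eigh(H)                     # arbitrary orthonormal orbitals
f = np.array([0.95, 0.9, 1.05, 0.1, -0.02, 0.05])  # perturbed occupations
rho = V @ np.diag(f) @ V.T

rho_pure = mcweeny_purify(rho)
occ = np.linalg.eigvalsh(rho_pure)           # now 0 or 1 to machine precision
```

Each iteration costs only a pair of matrix multiplications, which preserves linear scaling when ρ is truncated in real-space and stored as a sparse matrix.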
An alternative approach to imposing the idempotency constraint has been proposed by Kohn [5], who suggested minimizing the functional defined by

E_Q[ρ; μ, α] = E_NI[ρ] − μ N[ρ] + α P[ρ],    (5)

where μ is the chemical potential, α is a positive parameter and the penalty-functional

P[ρ] = {Tr[ρ²(1 − ρ)²]}^(1/2)    (6)

vanishes if and only if ρ is idempotent, i.e. all of its occupation numbers equal 0 or 1.
Rather than minimizing the non-interacting energy, as proposed by Kohn, we can instead minimize the interacting energy self-consistently. Using the square of the DM to calculate the charge density in (5) has the advantage that it guarantees that the charge density is positive-definite. However, in order to simplify the analysis here we consider the functional Q defined by

Q[ρ; α] = Tr[(Ĥ − μ)ρ] + α P[ρ]    (7)

for a fixed Hamiltonian Ĥ. In the representation which diagonalizes Ĥ, with eigenvalues {ε_i}, this becomes

Q = Σ_i (ε_i − μ) f_i + α [Σ_i f_i²(1 − f_i)²]^(1/2),    (8)

which for a single occupation number f reduces to

Q(f) = (ε − μ) f + α |f(1 − f)|.    (9)
This is illustrated schematically in Fig. 1 for the case of a single occupation number f corresponding to a state of energy ε above the chemical potential μ. For

α > ε − μ,    (10)

the minimum value of Q(f) is obtained when f = 0. This illustrates the variational principle established by Kohn, but we note that the functional is minimal only in the sense that it takes its minimum value at the ground-state, not in the sense that its derivatives vanish at the ground-state, since they are undefined at that point.
The penalty-functional has a branch point at its minimum, due to the square-root form employed in (6). However, this square-root is crucial to establishing the variational principle. In Fig. 2, the effect of using an analytic penalty-functional (the square of P[ρ]) is plotted, and it is clear that the minimum now occurs at f ≠ 0 for all values of α. The total energy calculated in this case will no longer be a variational upper bound to the ground-state energy. We have recently introduced a method for obtaining accurate estimates of the true ground-state energy from such nearly-idempotent DMs, and details can be found elsewhere [7]. Nevertheless, the non-analytic form of the penalty-functional (6) must be employed if we wish to obtain a variational principle for the energy.
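The contrast between the two penalty forms can be checked numerically for a single occupation number. The sketch below uses illustrative values ε − μ = 1 and α = 2 (chosen only so that α > ε − μ; they are not taken from any calculation here) and locates the minimum of Q(f) with the square-root penalty and with the analytic squared penalty:

```python
import numpy as np

# Single occupation number f for a state above the chemical potential.
# Illustrative parameters: eps_mu = eps - mu, with alpha > eps_mu.
eps_mu, alpha = 1.0, 2.0

def Q_sqrt(f):
    """Kohn-style functional: the square-root penalty |f(1-f)| has a kink
    (branch point) at the idempotent values f = 0 and f = 1."""
    return eps_mu * f + alpha * np.abs(f * (1.0 - f))

def Q_square(f):
    """Analytic alternative: the squared penalty [f(1-f)]^2 is smooth."""
    return eps_mu * f + alpha * (f * (1.0 - f)) ** 2

f = np.linspace(-0.5, 1.5, 200001)
f_min_sqrt = f[np.argmin(Q_sqrt(f))]      # minimum sits exactly at f = 0
f_min_square = f[np.argmin(Q_square(f))]  # minimum pushed away from f = 0
```

With the square-root penalty the minimum lies exactly at the idempotent value f = 0, while the analytic penalty shifts it to f ≠ 0, so the resulting DM is never exactly idempotent, in line with Figs. 1 and 2.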

Several schemes exist for directly minimizing functions of many variables. The simplest of these is the method of steepest descents, in which the gradient of the function is used as a search direction in the multidimensional space. The minimum value of the function along this direction is found and the process iterated to convergence. In Fig. 3a, the results of applying this method to an exactly quadratic function are plotted, starting from the point (5,1). Successive search directions are always orthogonal, and the method is not guaranteed to find the minimum in a finite number of steps. The method is clearly inefficient, since all of the information from previous function and gradient evaluations is ignored when calculating new search directions. In contrast, the conjugate gradients method [8] uses this information to construct a set of mutually conjugate search directions. Each successive step effectively eliminates one dimension of the space to be searched, so that the minimum is found in a number of steps no greater than the dimensionality of the space. The results for this method are plotted in Fig. 3b for the same quadratic function plotted in Fig. 3a. This method has been successfully implemented in traditional electronic structure calculations [9].
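The behaviour shown in Fig. 3 is easily reproduced. The following sketch (a 2×2 model Hessian chosen purely for illustration, starting from the point (5,1) as in the figure) minimizes a quadratic by steepest descents and by Fletcher–Reeves conjugate gradients, with exact line minimization in both cases:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # illustrative SPD Hessian
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)           # exact minimum of f(x) = x.A.x/2 - b.x

def grad(x):
    return A @ x - b

def exact_line_min(x, d):
    """For a quadratic, the minimizing step length along d is closed-form."""
    return -(grad(x) @ d) / (d @ A @ d)

# Steepest descents: each new direction is simply the negative gradient,
# so successive directions are orthogonal and convergence is only gradual.
x = np.array([5.0, 1.0])
for _ in range(5):
    d = -grad(x)
    x = x + exact_line_min(x, d) * d
sd_err = np.linalg.norm(x - x_star)

# Conjugate gradients (Fletcher-Reeves): exact in at most 2 steps in 2-D.
x = np.array([5.0, 1.0])
g = grad(x)
d = -g
for _ in range(2):
    x = x + exact_line_min(x, d) * d
    g_new = grad(x)
    beta = (g_new @ g_new) / (g @ g)
    d = -g_new + beta * d
    g = g_new
cg_err = np.linalg.norm(x - x_star)
```

After two conjugate gradients steps the error is at machine precision, whereas after five steepest descents steps a finite error remains: the zigzag of orthogonal directions never terminates exactly.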

However, the conjugate gradients method relies on the accuracy of a quadratic approximation of the function around the minimum. As observed, the functional has a branch point at its minimum, which arises from the square-root that appears in the penalty-functional, and is therefore non-analytic at the minimum. No multidimensional Taylor expansion for the functional exists about the minimum, and so the local information (functional values and gradients at points) used to construct the conjugate directions gives a misleading picture of the global behaviour of the functional. In Fig. 4a the results of a steepest descents minimization of the corresponding square-root function are plotted, and exactly the same sequence of points is generated as in Fig. 3a. However, the results obtained using the conjugate gradients method, plotted in Fig. 4b, are now worse than for steepest descents.
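This failure can be made concrete with the same kind of model. Taking the square-root of a quadratic (here a 2×2 model, starting from (5,1) as in the figures; all numerical values are illustrative) introduces a branch point at the minimum while leaving the line minima along any direction unchanged. Steepest descents therefore generates its usual sequence of points, while the Fletcher–Reeves conjugate gradients update is misled:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # illustrative SPD matrix
x_star = np.array([0.2, 0.4])            # known minimum of the model

def h(x):
    """Square-root of a quadratic: non-analytic (branch point) at x_star."""
    e = x - x_star
    return np.sqrt(0.5 * e @ A @ e)

def grad_h(x):
    e = x - x_star
    return (A @ e) / (2.0 * h(x))

def line_min(x, d):
    """Exact line minimum of h along d; since sqrt is monotonic it is the
    same as for the underlying quadratic (closed form used for brevity)."""
    e = x - x_star
    return -(d @ A @ e) / (d @ A @ d)

def run(n_steps, conjugate):
    x = np.array([5.0, 1.0])
    g = grad_h(x)
    d = -g
    for _ in range(n_steps):
        x = x + line_min(x, d) * d
        g_new = grad_h(x)
        # Fletcher-Reeves update; beta = 0 recovers steepest descents.
        beta = (g_new @ g_new) / (g @ g) if conjugate else 0.0
        d = -g_new + beta * d
        g = g_new
    return np.linalg.norm(x - x_star)

sd_err = run(2, conjugate=False)   # same points as for the quadratic
cg_err = run(2, conjugate=True)    # two steps no longer reach the minimum
```

The two steps that solve the quadratic exactly now leave the conjugate gradients iterate further from the minimum than plain steepest descents, because the β coefficient computed from the branch-point gradients overweights the old direction and destroys conjugacy.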

In Fig. 4, the exact line minimum is found in each case. In practice, however, the position of the line minimum is usually estimated by making a parabolic interpolation of the functional along the search direction, using the initial value and first derivative of the functional and its value at a trial step. In this case, the estimate of the line minimum will also be wrong, further reducing the efficiency of both methods.
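The effect on the line search can be seen in one dimension. The fit below follows the procedure just described: a parabola is matched to the value and slope at the start of the line and the value at a trial step (a trial step of 1.5, chosen for illustration). For a smooth parabola the estimate is exact, while for a kinked function such as |t − 1|, a one-dimensional analogue of the penalty's branch point, it overshoots the true minimum:

```python
def parabolic_step(f0, df0, fs, s):
    """Estimate the line minimum from the value f0 and slope df0 at t = 0
    and the value fs at a trial step t = s, by fitting p(t) = a t^2 + b t + c."""
    a = (fs - f0 - df0 * s) / s**2
    return -df0 / (2.0 * a)

# Smooth quadratic: the parabolic estimate recovers the minimum exactly.
q = lambda t: (t - 1.0) ** 2
t_smooth = parabolic_step(q(0.0), -2.0, q(1.5), 1.5)   # = 1.0, the true minimum

# Branch-point function |t - 1|: the fitted parabola overshoots the kink.
v = lambda t: abs(t - 1.0)
t_kink = parabolic_step(v(0.0), -1.0, v(1.5), 1.5)     # = 1.125, not 1.0
```

Any interpolation scheme built on smoothness assumptions mislocates the minimum of a kinked function, so the line searches degrade along with the choice of search directions.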
These problems are not confined to the conjugate gradients method alone, but apply equally to any method which attempts to use gradients to build up an estimate of the Hessian of the functional, e.g. the Fletcher–Powell or Broyden–Fletcher–Goldfarb–Shanno algorithms [10]. The Car–Parrinello scheme [11] would also fail in this case, since the derivative of the penalty-functional would appear in the equations of motion for the molecular-dynamics Lagrangian. Simulated annealing methods [12] such as the Metropolis algorithm [13] will successfully minimize functions of this kind, since they do not use the gradients, but the number of iterations required in this case would increase with system-size as the number of dimensions to be searched increased, thus spoiling the linear scaling of the method.
We can attempt to find a set of conjugate directions for the non-quadratic functions encountered here. For a function with a quadratic minimum,

f(x) = ½ xᵀ·A·x − bᵀ·x,    (11)

with gradient

g(x) = A·x − b,    (12)

a set of conjugate directions {d_i} is defined by

d_iᵀ·A·d_j = 0 for i ≠ j.    (13)
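For a genuinely quadratic function the conjugacy condition (13) is satisfied automatically by the standard algorithm. The sketch below (a random symmetric positive-definite model Hessian with n = 5, purely illustrative) runs linear conjugate gradients and verifies both the mutual conjugacy of the directions and convergence in n steps:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)          # symmetric positive-definite Hessian
b = rng.standard_normal(n)

# Standard linear conjugate gradients on f(x) = x.A.x/2 - b.x,
# recording every search direction d_i.
x = np.zeros(n)
g = A @ x - b
d = -g
dirs = []
for _ in range(n):
    dirs.append(d.copy())
    step = (g @ g) / (d @ A @ d)     # exact line minimum along d
    x = x + step * d
    g_new = g + step * (A @ d)
    beta = (g_new @ g_new) / (g @ g)
    d = -g_new + beta * d
    g = g_new

# Largest normalized violation of d_i . A . d_j = 0 for i != j,
# and the distance from the exact minimum x* = A^{-1} b.
conj = max(
    abs(dirs[i] @ A @ dirs[j])
    / np.sqrt((dirs[i] @ A @ dirs[i]) * (dirs[j] @ A @ dirs[j]))
    for i in range(n) for j in range(i)
)
err = np.linalg.norm(x - np.linalg.solve(A, b))
```

For the square-root functionals discussed above no fixed matrix A exists at the minimum, so no such set of directions can be constructed, however the β coefficients are chosen.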
When using the conjugate gradients method to minimize functions which are not exactly quadratic in form, it is common practice to reset the conjugate directions after a certain number of iterations by taking a steepest descent step. For functions with the same square-root form as the penalty-functional, the method becomes rapidly less efficient as the number of successive conjugate gradients steps is increased, for the reasons discussed above. Indeed, for functions of many variables the method may completely fail to find the true minimum: instead, it appears to converge to a value which is not the minimum. This trend can be seen in Fig. 4b, in which the conjugate directions become orthogonal to the gradient. Thus the method fails not as a result of false minima, but because the minimization becomes so slow as to be indistinguishable from true convergence.

In order to see whether such behaviour appears in a genuine DFT calculation, we have implemented this scheme to perform total-energy calculations, and have applied it to crystalline silicon. The density-matrix was expanded in separable form in terms of a sparse Hermitian matrix and a set of localized functions, as in other schemes [4], and the localized functions themselves were centred on the ions and expanded in terms of a localized spherical-wave basis set [14] using an energy cutoff of 200 eV and angular momentum components up to a fixed maximum. No attempt was made to converge the calculation with respect to the density-matrix cutoff. The variational principle proved by Kohn states that for α above the critical value of (10), P[ρ] = 0 at the minimum. In Fig. 5, we show the contribution αP[ρ] of the penalty-functional to the total minimized functional. As α is increased, this contribution does not vanish above some critical value (estimated from (10) and by Kohn's limit to be of the order of 50 eV for silicon) but rather decreases slowly. Also plotted in Fig. 5 is the corresponding root-mean-square error in the occupation numbers, which also decreases as α increases. Thus the total energy will approach the true ground-state value, but no variational principle can be invoked, since this only holds when P[ρ] vanishes exactly. This failure to observe Kohn's result in practice is due to the inability of the minimization procedure to locate the true minimum of the functional, a consequence of its non-analytic form.
In conclusion, we have implemented a method based upon the variational principle derived by Kohn [5] and demonstrated a fundamental difficulty in using this method in a computational scheme, which is due to the non-analyticity of the required penalty-functional. Calculations on crystalline silicon confirm the trends observed from considering simple model functions. We have shown that the variational principle cannot be exploited in practice because the nature of the functional makes it unsuitable for use with current minimization techniques. Other schemes based upon the purifying transformation, or which use an analytic penalty-functional to impose idempotency approximately (and subsequently correct for the error introduced by the lack of idempotency), will therefore be more efficient.
We acknowledge helpful discussions with E. Sandré. PDH was supported by an EPSRC studentship.