

7.5 Occupation number preconditioning

The eigenvalues of the Hessian at a stationary point determine the nature and shape of that stationary point. Thus the shape of the ground-state minimum of an energy functional is determined by the eigenvalues of the Hessian of that functional at the minimum. For the Kohn-Sham scheme these eigenvalues are the $\{ \varepsilon_i \}$, and the narrower the eigenvalue spectrum, the more ``spherical'' the minimum and the easier the functional is to minimise. From equation 4.9 we note that when partial occupation numbers are introduced, the relevant eigenvalue spectrum becomes $\{ f_i \varepsilon_i \}$. When conduction bands are included in a calculation, their occupation numbers will be vanishingly small near the ground-state minimum, which will therefore be very aspherical, and convergence of these bands will become very slow. This problem has been addressed in the study of metallic systems [164,165] by the method of preconditioning, which changes the metric of the parameter space to compress the eigenvalue spectrum and make the minimum more spherical.

With reference to the results in appendix B, we introduce a metric, represented by the matrix ${\cal M}$, which defines a new set of variables (denoted by a tilde):

\begin{displaymath}
{\tilde{\bf x}} = {\cal M} \cdot {\bf x}
\end{displaymath} (7.53)

so that the new gradients are related to the old gradients by
\begin{displaymath}
{\tilde{\bf g}} = {\cal M}^{-1} \cdot {\bf g} .
\end{displaymath} (7.54)
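For completeness (this step is not spelled out above), equation 7.54 follows from the chain rule, writing $E$ for the functional being minimised and assuming that ${\cal M}$ is constant and symmetric:
\begin{displaymath}
{\tilde g}_i = \frac{\partial E}{\partial {\tilde x}_i}
= \sum_j \frac{\partial x_j}{\partial {\tilde x}_i} \frac{\partial E}{\partial x_j}
= \sum_j \left( {\cal M}^{-1} \right)_{ji} g_j
= \left( {\cal M}^{-1} \cdot {\bf g} \right)_i .
\end{displaymath}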

In the new metric, the conjugate directions are defined (see appendix B) by
\begin{displaymath}
{\tilde{\bf p}}_{r+1} = -{\tilde{\bf g}}_{r+1} + {\tilde \beta}_r {\tilde{\bf p}}_r
\end{displaymath} (7.55)
\begin{displaymath}
{\tilde \beta}_r = \frac{{\tilde{\bf g}}_{r+1} \cdot {\tilde{\bf g}}_{r+1}}{{\tilde{\bf g}}_r \cdot {\tilde{\bf g}}_r}
\end{displaymath} (7.56)

where we have adopted the Fletcher-Reeves method (B.22) for calculating ${\tilde \beta}_r$. The line minimum is given by
\begin{displaymath}
{\tilde{\bf x}}_{r+1} = {\tilde{\bf x}}_r + {\alpha}_r {\tilde{\bf p}}_r
\end{displaymath} (7.57)

which can be rewritten in terms of the original variables as
\begin{displaymath}
{\cal M} \cdot {\bf x}_{r+1} = {\cal M} \cdot {\bf x}_r + {\alpha}_r {\tilde{\bf p}}_r
\quad \Rightarrow \quad
{\bf x}_{r+1} = {\bf x}_r + {\alpha}_r {\cal M}^{-1} \cdot {\tilde{\bf p}}_r .
\end{displaymath} (7.58)

This identifies ${\cal M}^{-1} \cdot {\tilde{\bf p}}_r$ as the preconditioned conjugate search directions for the original variables in the original space. These directions $\{ {\bf P}_r \}$ are thus given by
\begin{displaymath}
{\bf P}_{r+1} = {\cal M}^{-1} \cdot {\tilde{\bf p}}_{r+1}
= -{\cal M}^{-1} \cdot {\tilde{\bf g}}_{r+1} + {\tilde \beta}_r {\cal M}^{-1} \cdot {\tilde{\bf p}}_r
= -{\cal M}^{-2} \cdot {\bf g}_{r+1} + {\tilde \beta}_r {\bf P}_r
\end{displaymath} (7.59)

so that the gradients to be used for the preconditioned search are
\begin{displaymath}
{\bf G}_r = {\cal M}^{-2} \cdot {\bf g}_r
\end{displaymath} (7.60)

with mixing factor
\begin{displaymath}
{\tilde \beta}_r = \frac{{\tilde{\bf g}}_{r+1} \cdot {\tilde{\bf g}}_{r+1}}{{\tilde{\bf g}}_r \cdot {\tilde{\bf g}}_r}
= \frac{{\bf G}_{r+1} \cdot {\bf g}_{r+1}}{{\bf G}_r \cdot {\bf g}_r} .
\end{displaymath} (7.61)

Defining ${\tilde \beta}_r$ in terms of the preconditioned gradients alone is observed not to interfere with the minimisation procedure, so that in practice we take
\begin{displaymath}
{\tilde \beta}_r = \frac{{\bf G}_{r+1} \cdot {\bf G}_{r+1}}
{{\bf G}_r \cdot {\bf G}_r} .
\end{displaymath} (7.62)
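To make the update scheme of equations 7.58-7.62 concrete, the following sketch (illustrative only, not taken from any particular code) applies it to a simple quadratic model functional $\frac{1}{2}\,{\bf x} \cdot A \cdot {\bf x} - {\bf b} \cdot {\bf x}$ with a diagonal metric, so that ${\bf G} = {\cal M}^{-2} \cdot {\bf g}$ reduces to an element-wise division; all names are hypothetical.
\begin{verbatim}
import numpy as np

def preconditioned_cg(A, b, m2, x0, tol=1e-10, max_iter=200):
    """Minimise E(x) = x.A.x/2 - b.x with the preconditioned conjugate
    gradients of equations 7.58-7.62; m2 holds the diagonal of M^2."""
    x = x0.copy()
    g = A @ x - b                         # gradient of the model functional
    G = g / m2                            # preconditioned gradient (7.60)
    P = -G                                # initial search direction
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = -(g @ P) / (P @ (A @ P))  # exact line minimum (quadratic model)
        x = x + alpha * P                 # update, equation 7.58
        g = A @ x - b
        G_new = g / m2                    # preconditioned gradient (7.60)
        beta = (G_new @ G_new) / (G @ G)  # mixing factor, equation 7.62
        P = -G_new + beta * P             # new direction, equation 7.59
        G = G_new
    return x
\end{verbatim}
Here the line minimum ${\alpha}_r$ is evaluated in closed form, which is possible only because the model functional is quadratic.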

In order to apply this scheme here, we choose to make the metric ${\cal M}$ diagonal in the representation of the Kohn-Sham orbitals. In the original variables $\{ x_i \}$ (where the subscript $i$ labels a component of a vector) the functional can be expanded about the minimum as $\sum_i f_i \varepsilon_i x_i^2$, so that the scaled variables $\{ {\tilde x}_i \}$ defined by ${\tilde x}_i = m_{(i)} x_i = \sqrt{f_i} x_i$ (where ${\cal M}_{ij} = m_{(i)} \delta_{ij}$) produce the desired compression: in terms of the new variables the minimum has the form $\sum_i \varepsilon_i {\tilde x}_i^2$. In the representation of the Kohn-Sham orbitals, the preconditioned gradient of the functional $Q[\rho;\alpha]$ (7.52) becomes

\begin{displaymath}
4 m_{(i)}^{-2} \left[ f_i \left\{ {\hat H} - \varepsilon_i \right\} \right] \psi_i({\bf r})
= 4 \left[ {\hat H} - \varepsilon_i \right] \psi_i({\bf r})
\end{displaymath} (7.63)

in which we see that the factor of $f_i$ in front of the ${\hat H}$ operator has been cancelled, so that the effect of the gradient is now the same on both occupied and unoccupied bands, and these bands should therefore converge at the same rate.
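As a small numerical illustration of this compression (arbitrary values, not from the calculations described here), consider the diagonal model minimum $\sum_i f_i \varepsilon_i x_i^2$:
\begin{verbatim}
import numpy as np

# Arbitrary occupation numbers and eigenvalues: a filled band, a partially
# occupied band and a nearly empty conduction band.
f   = np.array([1.0, 0.5, 1e-4])
eps = np.array([1.0, 2.0, 3.0])

# Curvatures of the model minimum sum_i f_i eps_i x_i^2 in the original
# variables: the spread (condition number) is huge.
curv = f * eps
print(curv, curv.max() / curv.min())      # [1. 1. 3.e-04]   ~3300

# After scaling x~_i = sqrt(f_i) x_i the same form is sum_i eps_i x~_i^2,
# so the curvatures are just eps_i and the spectrum is compressed.
print(eps, eps.max() / eps.min())         # [1. 2. 3.]       3.0
\end{verbatim}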

We now transform the preconditioned gradient back to the support function representation using

\begin{displaymath}
\frac{\delta Q[\rho;\alpha]}{\delta \phi_{\alpha}({\bf r})}
= V_{\alpha i} \frac{\delta Q[\rho;\alpha]}{\delta \psi_i({\bf r})}
\end{displaymath} (7.64)

where $V = S^{-{1 \over 2}} U$ from 4.78 and $U$ is a unitary matrix. Thus the preconditioned gradient we require is
\begin{displaymath}
4 V_{\alpha i} \left[ {\hat H} + \alpha f_i (1 - f_i)(1 - 2 f_i) \right] \psi_i({\bf r})
= 4 \left[ {\hat H} \left( S^{-1} \right)^{\alpha \beta}
+ \alpha \left( K (1-SK)(1-2SK) \right)^{\alpha \beta} \right] \phi_{\beta}({\bf r})
\end{displaymath} (7.65)

from the properties of the matrix $V$ (4.79, 4.80). We see that the gradient has thus been pre-multiplied by the matrix $(KS)^{-1}$, i.e. in the support function representation the metric ${\cal M}$ is $(KS)^{1 \over 2}$.
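This identification can be checked numerically. Assuming the relation $K = V f V^{\rm T}$ between the density kernel and the occupation numbers, the matrix $KS$ has eigenvalues $f_i$ with the columns of $V$ as eigenvectors, so applying $(KS)^{-1}$ in the support function representation is equivalent to dividing by $f_i = m_{(i)}^2$ in the orbital representation. The sketch below uses arbitrary test matrices and is purely illustrative.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Arbitrary symmetric positive-definite overlap matrix S and occupations f_i.
B = rng.standard_normal((n, n))
S = B @ B.T + n * np.eye(n)
f = np.array([1.0, 1.0, 0.6, 1e-3, 1e-5])

# S^{-1/2} from the eigendecomposition of S, then V = S^{-1/2} U with U
# orthogonal, so that V^T S V = 1, and the density kernel K = V f V^T.
w, Q = np.linalg.eigh(S)
S_inv_half = Q @ np.diag(w ** -0.5) @ Q.T
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V = S_inv_half @ U
K = V @ np.diag(f) @ V.T

# (KS) V = V diag(f): KS is diagonal in the orbital representation with
# eigenvalues f_i, so applying (KS)^{-1} to the support-function gradient
# divides by f_i = m_(i)^2 orbital by orbital, i.e. the metric is (KS)^{1/2}.
print(np.allclose(K @ S @ V, V @ np.diag(f)))   # True
\end{verbatim}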

Although the overlap matrix $S$ is sparse for localised support functions, its inverse $S^{-1}$ is not sparse in general, so that this scheme is not straightforward to implement. For a sufficiently diagonally dominant overlap matrix, however, it is possible to approximate the inverse in the following manner. We write $S = D + E$, where $D$ contains only the diagonal elements of $S$ and $E$ contains the off-diagonal elements; $D$ is thus trivial to invert. Writing $S = D(1 + D^{-1}E)$ we have $S^{-1} = (1 + D^{-1}E)^{-1} D^{-1}$, and if $S$ is diagonally dominant the elements of the matrix $D^{-1}E$ are small, so that we can expand the inverse of the term in brackets as a series. If the elements of a matrix $M$ are small (more precisely, if its spectral radius is less than unity) then

\begin{displaymath}
(1 + M)^{-1} = 1 - M + M^2 - M^3 + {\cal O}(M^4)
= \sum_{n=0}^{\infty} (-1)^n M^n .
\end{displaymath} (7.66)

Applying the first few terms of equation 7.66 to the inverse of the overlap matrix, we obtain
\begin{displaymath}
S^{-1} = \left( 1 - D^{-1}E + D^{-1}ED^{-1}E - \ldots \right) D^{-1}
= D^{-1} - D^{-1} E D^{-1} + D^{-1} E D^{-1} E D^{-1} - \ldots
\end{displaymath} (7.67)
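A minimal sketch of this truncated expansion (illustrative only, with an arbitrary diagonally dominant test matrix standing in for the overlap) is:
\begin{verbatim}
import numpy as np

def approximate_inverse(S, order=3):
    """Truncate the expansion of equation 7.67 after `order` powers of
    D^{-1}E, where D and E are the diagonal and off-diagonal parts of S."""
    D_inv = np.diag(1.0 / np.diag(S))       # D is trivial to invert
    E = S - np.diag(np.diag(S))
    T = -D_inv @ E                          # expansion parameter -(D^{-1}E)
    S_inv = term = D_inv
    for _ in range(order):
        term = T @ term                     # next term of the series
        S_inv = S_inv + term
    return S_inv

# Arbitrary diagonally dominant test matrix.
rng = np.random.default_rng(1)
n = 8
R = 0.05 * rng.standard_normal((n, n))
S = np.eye(n) + 0.5 * (R + R.T)

approx = approximate_inverse(S, order=3)
print(np.abs(approx - np.linalg.inv(S)).max())  # ~ |D^{-1}E|^4, i.e. tiny
\end{verbatim}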

This expansion could be used to obtain a good approximation to the inverse overlap matrix, which may be sufficient for preconditioning, but there is the danger that, particularly for large systems, the overlap matrix may become nearly singular, in which case the expansion converges poorly and the performance of the algorithm would deteriorate. In the following section, however, we show that the correct preconditioned gradient does not involve the inverse of the overlap matrix.