next up previous contents
Next: 7. Computational implementation Up: 6. Penalty Functionals Previous: 6.1 Kohn's method   Contents


6.2 Corrected penalty functional method

6.2.1 Derivation of the correction

In this section we present a new method to perform total energy calculations using a penalty functional to enforce idempotency approximately. We define a generalised energy functional $Q[{\rho};\alpha]$ for trial density-matrices ${\rho}$ by

Q[{\rho};\alpha] = E[\rho] + \alpha P[\rho] .
\end{displaymath} (6.15)

$P[\rho]$ represents the penalty functional, which in this new method is required to be analytic at all points so that efficient minimisation schemes such as conjugate gradients may be applied. The simplest example is to take the square of Kohn's penalty functional i.e.
P[{\rho}] = \int {\mathrm d}{\bf r}~\left[ {\rho}^2
\left( 1...]({\bf r},{\bf r}) =
\sum_i f_i^2 \left( 1 - f_i \right)^2 ,
\end{displaymath} (6.16)

but other choices are also possible (section 6.2.2). This penalty functional is sketched in figure 6.5.

Figure 6.5: One possible choice of analytic penalty functional.
\includegraphics [width=10cm]{pf1.eps}

Since the penalty functional becomes large as the density-matrix becomes less idempotent, minimisation of the total functional $Q[{\rho};\alpha]$ is stable against run-away solutions in which $f_i \rightarrow
\infty$ for occupied bands and $f_i \rightarrow
-\infty$ for unoccupied bands. However, as illustrated in the sketch in figure 6.6, the minimising density-matrix is only approximately idempotent. In this particular case of an unoccupied band, the total functional is minimised when this band is negatively occupied i.e. for $f_i < 0$. In general the minimising density-matrix ${\bar \rho}$ will have eigenvalues lying outside the interval $[0,1]$, so that the energy calculated from such a density-matrix will be below the true ground-state energy.

Figure 6.6: Schematic illustration of the analytic penalty functional: behaviour of the total energy (black) and total functional (red) with respect to a single occupation number for an unoccupied band.
\includegraphics [width=10cm]{mypen.eps}

We denote the set of occupation numbers which minimise the total functional $Q[{\rho};\alpha]$ by $\{ {\bar f}_i \}$ and the errors in these occupation numbers with respect to idempotency by ${\bar f}_i = f_i^{(0)} + \delta f_i$ where $f^{(0)}_i = 1$ for occupied bands and $f^{(0)}_i = 0$ for unoccupied bands. Since $Q[{\rho};\alpha]$ is minimised by ${\bar \rho}$, it is a minimum with respect to all changes in the occupation numbers which maintain the normalisation constraint

2 \int {\mathrm d}{\bf r}~{\rho}({\bf r},{\bf r}) = 2 \sum_i f_i
= N
\end{displaymath} (6.17)

which is imposed by introducing a Lagrange multiplier $\lambda$:
\frac{\partial}{\partial f_i} \left[
Q[{\rho};\alpha] - \lam...
...eft( 2 \sum_j f_j - N \right)
\right]_{f_i = {\bar f}_i} = 0 .
\end{displaymath} (6.18)

Using Janak's theorem for the derivative of the energy functional we obtain
{\bar \varepsilon}_i + \alpha {\bar f}_i \left( 1 - {\bar f}_i \right)
\left( 1 - 2 {\bar f}_i \right) - \lambda = 0
\end{displaymath} (6.19)

in which $\{ {\bar \varepsilon}_i \}$ are the eigenvalues of the Hamiltonian obtained from the electronic density ${\bar n}({\bf r}) = 2 {\bar \rho}
({\bf r},{\bf r})$ and are therefore different from the true ground-state eigenvalues. Assuming $\alpha $ to be sufficiently large so that the errors $\{ \delta f_i \}$ are small,
\delta f_i \approx -\frac{{\bar \varepsilon}_i - \lambda}{\alpha} .
\end{displaymath} (6.20)

Application of the normalisation constraint 6.17 requires $\displaystyle{\sum_i \delta f_i = 0}$ from which we obtain the value of the Lagrange multiplier $\lambda$:
\lambda = \frac{2}{N} \sum_i {\bar \varepsilon}_i
\end{displaymath} (6.21)

which is the mean energy eigenvalue. The variance of the errors in the occupation numbers is thus related to the variance of the energy eigenvalues, scaled by the parameter $\alpha $. Therefore, as is intuitively expected, the errors in the occupation numbers decrease as $\alpha $ is increased. More precisely, $\delta f_i \propto \alpha^{-1}$ and this behaviour is confirmed numerically in figure 6.7.

Figure 6.7: Variation of the occupation number errors with $\alpha $. Lines show the best fit to $\alpha^{-1}$ behaviour.
\includegraphics [width=12cm]{occerr.eps}

For small deviations from idempotency, the penalty functional $\displaystyle{P[{\bar \rho}] \approx \sum_i \delta f_i^2}$ so that the penalty contribution to the minimised total functional, $\alpha P[{\bar \rho}]$ also decreases in proportion to $\alpha^{-1}$. Hence the energy approaches the true ground-state energy as $\alpha \rightarrow \infty$, again with an error which decreases as $\alpha^{-1}$. This $\alpha^{-1}$ convergence is unsatisfactory for practical applications, since it requires large values of $\alpha $ to obtain accurate estimates of the ground-state energy, and for large values of $\alpha $ the penalty term dominates the total functional and hinders efficient minimisation of the energy term. We now proceed to derive a correction to the estimated energy which allows accurate values for the ground-state energy to be obtained from the approximately idempotent density-matrices which minimise the total functional.

At the minimum of the total functional,

\left. \frac{\partial Q[{\rho};\alpha]}{\partial f_i} \right...
... \left( 1 - {\bar f}_i \right) \left( 1 - 2 {\bar f}_i \right)
\end{displaymath} (6.22)

and this expression can be used to construct a first order Taylor expansion for the total energy with respect to the occupation numbers. We thus estimate the true ground-state energy $E_0$ to be
E_0 = E[\rho_0] \approx E[{\bar \rho}] + 2 \alpha \sum_i
...{\bar f}_i \right) \left( 1 - 2 {\bar f}_i \right)
\delta f_i.
\end{displaymath} (6.23)

For occupied bands, $\delta f_i = {\bar f}_i - 1$ whereas for unoccupied bands $\delta f_i = {\bar f}_i$ so that
E_0 \approx E[{\bar \rho}] - 2 \alpha \sum_i^{\rm all} {\bar...
...left( 1 - {\bar f}_i \right) \left( 1 - 2 {\bar f}_i \right) .
\end{displaymath} (6.24)

The first term of the correction has been written as a sum over all bands so that it can be expressed in terms of the trace $2 \alpha {\rm Tr} \left[
{\bar \rho} \left( 1 - {\bar \rho} \right)^2 \left( 1 - 2 {\bar \rho} \right)
\right]$ which can always be evaluated in ${\cal O}(N)$ operations. The second term only contributes when unoccupied bands are included in the calculation, which is not necessary for insulators. Since a single eigenvalue of the (sparse) density-matrix can always be obtained in ${\cal O}(N)$ operations, it is possible to evaluate the correction for a small number ($\ll N$) of unoccupied bands and retain the linear-scaling. However, we note that the correction need only be calculated once the minimum of the total functional has been found, so that a single ${\cal O}(N^2)$ step to obtain all of the occupation numbers will still be a tiny fraction of the total computational effort.

The error in a Taylor expansion is generally estimated by considering the lowest order term omitted, which in this case is

\frac{1}{2} \sum_{ij} \delta f_i \left. \frac{\partial^2 E[{...
...elta f_j \approx
\sum_{ij} \delta f_i {\cal H}_{ij} \delta f_j
\end{displaymath} (6.25)

where ${\cal H}_{ij}$ is the chemical hardness matrix. Unfortunately this matrix is not guaranteed to be either positive or negative definite, and so the estimate of the ground-state energy 6.24 is not a strict upper or lower bound to the true energy. As will be seen shortly, this error is generally much smaller than other sources of error (such as the finite support region size, with respect to which the energy does behave variationally) so that this is not an issue in practice.

In figure 6.8 the energy, total functional and corrected energy are plotted for different values of the parameter $\alpha $. These results confirm the $\alpha^{-1}$ behaviour of the energy and penalty functionals, and the error in the corrected energy even for $\alpha = 50$ eV is smaller than $10^{-4}$ eV per atom. Thus we have achieved our aim of being able to obtain accurate estimates of the ground-state energy from approximate density-matrices which minimise the total functional for values of $\alpha $ for which efficient minimisation is possible.

Figure 6.8: Total energy, total functional and corrected energy versus $\alpha $. Lines are best fits to $\alpha^{-1}$ behaviour.
\includegraphics [width=12cm]{corr.eps}

6.2.2 Further examples of penalty functionals

Before studying the efficiency of the minimisation procedure when applied to the total functional described above in section 6.2.1, we mention two further examples of penalty functionals which are suitable for this approach.

The first is applicable for positive semi-definite trial density-matrices. This requirement can be satisfied in practice by writing the density-kernel $K$ in terms of an auxiliary matrix $T$ as $K = T T^{\dag }$ (see section 4.4.2). Since the eigenvalues of such a density-matrix must be non-negative, variation of the energy functional alone is sufficient to drive the occupation numbers of unoccupied bands to zero, and the penalty functional need only impose the occupation numbers of the occupied bands to lie close to unity. An appropriate penalty functional is then

P[{\rho}] = \int {\mathrm d}{\bf r} \left[ {\rho} \left( 1 -...
({\bf r},{\bf r}) = \sum_i f_i \left( 1 - f_i \right)^2 ,
\end{displaymath} (6.26)

and the corresponding energy correction is
E_0 \approx E[{\bar \rho}] - \alpha \sum_i^{\rm all} (1 - 3 ...
... \alpha \sum_i^{\rm unocc} (1 - 3{\bar f}_i)(1 - {\bar f}_i) .
\end{displaymath} (6.27)

Numerical investigation has shown that the occupation numbers of the unoccupied bands do indeed become very small but positive when this scheme is used.

The second penalty functional is applicable only when no unoccupied bands are included in the calculation. In this case, all of the occupation numbers should equal unity and so an appropriate penalty functional is

P[{\rho}] = \int {\mathrm d}{\bf r} \left( 1 - {\rho} \right)^2
({\bf r},{\bf r}) = \sum_i \left( 1 - f_i \right)^2 .
\end{displaymath} (6.28)

The corresponding correction to the total energy in this case is
E_0 \approx E[{\bar \rho}] + 2 \alpha \sum_i^{\rm all} (1 - {\bar f}_i)^2 .
\end{displaymath} (6.29)

Both of these penalty functionals have been tested, and the results are very similar to those presented for the original functional in the previous section. These penalty functionals are plotted in figure 6.9.

Figure 6.9: Two further examples of analytic penalty functionals.
\includegraphics [width=13cm]{pf23.eps}

6.2.3 Minimisation efficiency

In this section we discuss the efficiency of the conjugate gradients algorithm to minimise the total functional. We restrict the discussion to the penalty functional introduced in section 6.2.1. The total functional derived from this penalty functional appears to possess multiple local minima since the penalty functional itself is minimal for all idempotent density-matrices. However, most of these local minima do not correspond to density-matrices obeying the correct normalisation constraint and are therefore eliminated by imposing this constraint during the minimisation, as will be shown in chapter 7. Of the remaining minima, only one corresponds to the situation in which the lowest bands are occupied, and when the support functions are also varied, all other minima become unstable with respect to this one (i.e. these are minima with respect to occupation number variations but not orbital variations). Numerical investigations into this matter have been carried out and no problems arising from multiple minima have been observed (the minimised total functional has the same value independent of the starting point).

The efficiency with which the conjugate gradients scheme is able to minimise a function is known to depend upon the condition number $\kappa$, the ratio of the largest curvature to the smallest curvature at the minimum. The condition number may be calculated exactly by determining the Hessian matrix at the minimum, but may also be estimated as follows [153].

Consider the minimising density-matrix ${\bar \rho}$ expanded in terms of a set of orthonormal orbitals $\{ {\bar \varphi}_i({\bf r}) \}$:

{\bar \rho}({\bf r},{\bf r'}) = \sum_i {\bar f}_i {\bar \varphi}_i({\bf r})
{\bar \varphi}_i({\bf r'}) .
\end{displaymath} (6.30)

Consider first perturbing the occupation numbers subject to the normalisation constraint i.e. increasing the occupation of some orbital labelled $J$ by $x$ at the expense of another orbital labelled $I$. The density-matrix becomes
{\rho}({\bf r},{\bf r'}) = {\bar \rho}({\bf r},{\bf r'})
- x...
... r'}) +
x {\bar \varphi}_J({\bf r}) {\bar \varphi}_J({\bf r'})
\end{displaymath} (6.31)

Defining ${\bar Q} = Q[{\bar \rho};\alpha]$ and using the orthonormality of the orbitals,
$\displaystyle Q[{\rho};\alpha]$ $\textstyle =$ $\displaystyle {\bar Q} + x({\bar \varepsilon}_J - {\bar \varepsilon}_I)
- \alph...
...}_I) (1 - {\bar f}_I) +
{\bar f}_J (1 - 2 {\bar f}_J) (1 - {\bar f}_J)\} \bigr.$  
  $\textstyle -$ $\displaystyle \bigl. 2 x^2 \{1 - 3{\bar f}_I
(1 - {\bar f}_I) - 3{\bar f}_J (1 - {\bar f}_J)\} - 4 x^3 (1 - {\bar f}_I -
{\bar f}_J) - 2 x^4 \bigr]$ (6.32)

and the curvature at the minimum is
\left. \frac{\partial^2 Q[{\rho};\alpha]}{\partial x^2}\righ...
...f}_I (1 - {\bar f}_I) - 3{\bar f}_J (1 - {\bar f}_J) \right] .
\end{displaymath} (6.33)

Assuming that ${\bar \rho}$ is approximately idempotent so that both ${\bar f}_I$ and ${\bar f}_J$ are either roughly zero or unity,
\left. \frac{\partial^2 Q[{\rho};\alpha]}{\partial x^2}\right\vert _{x=0}
\approx 4 \alpha
\end{displaymath} (6.34)

i.e. to first order, the curvature is independent of the choice of orbitals $I$ and $J$ so that the functional is spherical when this type of variation is considered, and the condition number is approximately unity.

The second type of variation is a unitary transformation of the orbitals i.e. $\varphi_I({\bf r}) = (1 - {\textstyle{1 \over 2}} x^2)
{\bar \varphi}_I({\bf r}) + x {\bar \varphi}_J({\bf r})$ and $\varphi_J({\bf r}) = (1 - {\textstyle{1 \over 2}} x^2)
{\bar \varphi}_J({\bf r}) - x {\bar \varphi}_I({\bf r})$, which maintains normalisation of the density-matrix to ${\cal O}(x^2)$. In this case

$\displaystyle {\rho}({\bf r},{\bf r'})$ $\textstyle =$ $\displaystyle {\bar \rho}({\bf r},{\bf r'})
+x({\bar f}_I - {\bar f}_J) {\bar \...
...x({\bar f}_I - {\bar f}_J) {\bar \varphi}_J({\bf r}) {\bar \varphi}_I({\bf r'})$  
  $\textstyle +$ $\displaystyle x^2({\bar f}_J-{\bar f}_I) {\bar \varphi}_I({\bf r}) {\bar \varph...
...\bar f}_J) {\bar \varphi}_J({\bf r}) {\bar \varphi}_J({\bf r'}) +
{\cal O}(x^3)$ (6.35)

and similarly
Q[{\rho};\alpha] = {\bar Q} + x^2 ({\bar f}_I - {\bar f}_J)({\bar
\varepsilon}_J - {\bar \varepsilon}_I) + {\cal O}(x^3)
\end{displaymath} (6.36)

so that
\left. \frac{\partial^2 Q[{\rho};\alpha]}{\partial x^2}\righ...
...I - {\bar f}_J)({\bar \varepsilon}_J - {\bar \varepsilon}_I) .
\end{displaymath} (6.37)

The maximum curvature is thus obtained when ${\bar f}_I \approx 1$ and ${\bar f}_J \approx 0$ and equals $2 ({\bar \varepsilon}_{\rm max} -
{\bar \varepsilon}_{\rm min})$ where ${\bar \varepsilon}_{\rm max}$ and ${\bar \varepsilon}_{\rm min}$ are the maximum and minimum energy eigenvalues of the orbitals $\{ {\bar \varphi}_i({\bf r}) \}$. The minimum curvature is obtained when ${\bar f}_I \approx {\bar f}_J$ and is therefore
2 (\delta f_I - \delta f_J) ({\bar \varepsilon}_J - {\bar \v...
= \frac{2 (\Delta {\bar \varepsilon})^2}{\alpha}
\end{displaymath} (6.38)

where $\Delta {\bar \varepsilon}$ is the minimum energy eigenvalue spacing. This curvature corresponds to unitary changes confined within the occupied or unoccupied subspaces with no mixing between, and the energy is indeed invariant under such changes. However, choosing to work with localised functions essentially defines a particular unitary transformation for the wave-functions, so that these variations are generally eliminated. If this is the case, then the minimum curvature will then be obtained in the same way as the maximum curvature, but seeking the minimum difference in energy eigenvalues between valence and conduction bands, which is the band gap ${\bar \varepsilon}_{\mathrm g}$. The minimum curvature is thus $2 {\bar \varepsilon}_{\mathrm g}$ and the condition number is given by
\kappa = \frac{{\bar \varepsilon}_{\rm max} - {\bar \varepsilon}_{\rm min}}
{{\bar \varepsilon}_{\mathrm g}} .
\end{displaymath} (6.39)

This is an encouraging result, since the condition number is independent of the system-size.

The length of the error vector after $k$ iterations, $\eta_k$ is related to $\kappa$ by [154]

\eta_k \propto \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k
\end{displaymath} (6.40)

and the number of iterations required to converge to a given precision is therefore proportional to $\sqrt{\kappa}$ in the limit of large $\kappa$, and so independent of system-size.

Reviewing the results for both types of variation, we note that the minimisation with respect to occupation numbers (the first type) is very efficient, since $\kappa \approx 1$, whereas the minimisation with respect to orbitals is less efficient, depending upon the ratio of the total width of the eigenvalue spectrum to the band gap. Preconditioning schemes to compress the eigenvalue spectrum have been developed for use with plane-waves [81] and also with $B$-splines [155], and a similar scheme for the spherical-wave basis functions would also improve the rate of convergence. Nevertheless, we do not expect a significant change in the number of conjugate gradient steps required to converge to the minimum as the system-size increases. In practical implementations, discussed in chapter 7, these two types of variation are not strictly separated, both the occupation numbers and the orbitals being varied simultaneously, so that these results are hard to confirm numerically, although no significant increase in the number of iterations required to converge to a given accuracy is observed as the system-size increases.

next up previous contents
Next: 7. Computational implementation Up: 6. Penalty Functionals Previous: 6.1 Kohn's method   Contents
Peter Haynes