Next: 7. Computational implementation
Up: 6. Penalty Functionals
Previous: 6.1 Kohn's method
Contents
Subsections
6.2 Corrected penalty functional method
6.2.1 Derivation of the correction
In this section we present a new method to perform total energy calculations
using a penalty functional to enforce idempotency approximately.
We define a generalised energy functional
for trial
densitymatrices by

(6.15) 
represents the penalty functional, which in this new
method is required to be analytic at all points
so that efficient minimisation schemes such as conjugate gradients may be
applied. The simplest example is to take the square of Kohn's penalty
functional i.e.

(6.16) 
but other choices are also possible (section 6.2.2).
This penalty functional is sketched in figure 6.5.
Figure 6.5:
One possible choice of analytic penalty functional.

Since the penalty functional becomes large as the densitymatrix becomes
less idempotent, minimisation of the total functional
is stable against runaway solutions in which
for occupied bands and
for unoccupied
bands. However, as illustrated in the sketch in figure 6.6,
the minimising densitymatrix is only approximately idempotent. In this
particular case of an unoccupied band, the total functional is minimised
when this band is negatively occupied i.e. for .
In general the minimising densitymatrix will have eigenvalues
lying
outside the interval , so that the energy calculated from such a
densitymatrix will be below the true groundstate energy.
Figure 6.6:
Schematic illustration of the analytic penalty functional: behaviour of the total energy (black)
and total functional (red) with respect to a single occupation number for an
unoccupied band.

We denote the set of occupation numbers which minimise the total functional
by
and the errors in these
occupation numbers with respect to idempotency by
where for occupied bands and for unoccupied bands.
Since
is minimised by , it is a
minimum with respect to all changes in the occupation numbers which maintain
the normalisation constraint

(6.17) 
which is imposed by introducing a Lagrange multiplier :

(6.18) 
Using Janak's theorem for the derivative of the energy functional we obtain

(6.19) 
in which
are the eigenvalues of the Hamiltonian
obtained from the electronic density
and are therefore different from the true groundstate
eigenvalues.
Assuming to be sufficiently large so that the errors
are small,

(6.20) 
Application of the normalisation constraint 6.17 requires
from which we obtain the value of
the Lagrange multiplier :

(6.21) 
which is the mean energy eigenvalue. The variance of the
errors in the occupation numbers is thus related to the variance of the
energy eigenvalues, scaled by the parameter . Therefore, as is
intuitively expected, the errors in the occupation numbers decrease as
is increased. More precisely,
and
this behaviour is confirmed numerically in figure 6.7.
Figure 6.7:
Variation of the occupation number errors with . Lines
show the best fit to behaviour.

For small deviations from idempotency, the penalty functional
so that the
penalty contribution to the minimised total functional,
also decreases in proportion to . Hence the energy approaches the
true groundstate energy as
, again with an
error which decreases as . This convergence
is unsatisfactory for practical applications, since it requires large
values of to obtain accurate estimates of the groundstate energy,
and for large values of the penalty term dominates the total
functional and hinders efficient minimisation of the energy term.
We now proceed to derive a correction to the estimated energy which allows
accurate values for the groundstate energy to be obtained from the
approximately idempotent densitymatrices which minimise the total
functional.
At the minimum of the total functional,

(6.22) 
and this expression can be used to construct a first order Taylor
expansion for the total energy with respect to the occupation numbers.
We thus estimate the true groundstate energy to be

(6.23) 
For occupied bands,
whereas for unoccupied
bands
so that

(6.24) 
The first term of the correction has been written as a sum over all bands
so that it can be expressed in terms of the trace
which can always be evaluated in operations. The
second term only contributes when unoccupied bands are included in the
calculation, which is not necessary for insulators. Since a single
eigenvalue of the (sparse) densitymatrix can always be obtained in
operations, it is possible to evaluate the correction for
a small number () of unoccupied bands and retain the linearscaling.
However, we note that the correction need only be calculated once the
minimum of the total functional has been found, so that a single
step to obtain all of the occupation numbers will still
be a tiny fraction of the total computational effort.
The error in a Taylor expansion is generally estimated by considering
the lowest order term omitted, which in this case is

(6.25) 
where is the chemical hardness matrix. Unfortunately this
matrix is not guaranteed to be either positive or negative definite,
and so the estimate of the groundstate energy 6.24 is not a strict
upper or lower bound to the true energy. As will be seen shortly, this error
is generally much smaller than other sources of error (such as the finite
support region size, with respect to which the energy does behave
variationally) so that this is not an issue in practice.
In figure 6.8 the energy, total functional and corrected energy
are plotted for different values of the parameter . These results
confirm the behaviour of the energy and penalty functionals,
and the error in the corrected energy even for eV is smaller
than eV per atom. Thus we have achieved our aim of being able
to obtain accurate estimates of the groundstate energy from approximate
densitymatrices which minimise the total functional for values of
for which efficient minimisation is possible.
Figure 6.8:
Total energy, total functional and corrected energy versus . Lines are best fits to behaviour.

6.2.2 Further examples of penalty functionals
Before studying the efficiency of the minimisation procedure when applied
to the total functional described above in section 6.2.1,
we mention two further examples
of penalty functionals which are suitable for this approach.
The first is applicable for positive semidefinite trial densitymatrices. This
requirement can be satisfied in practice by writing the densitykernel
in terms of an auxiliary matrix as
(see section 4.4.2). Since
the eigenvalues of such a densitymatrix must be nonnegative, variation
of the energy functional alone is sufficient to drive the occupation
numbers of unoccupied bands to zero, and the penalty functional need only
impose the occupation numbers of the occupied bands to lie close to unity.
An appropriate penalty functional is then

(6.26) 
and the corresponding energy correction is

(6.27) 
Numerical investigation has shown that the occupation numbers of the
unoccupied bands do indeed become very small but positive when this
scheme is used.
The second penalty functional is applicable only
when no unoccupied bands are included in the calculation. In this case,
all of the occupation numbers should equal unity and so an appropriate
penalty functional is

(6.28) 
The corresponding correction to the total energy in this case is

(6.29) 
Both of these penalty functionals have been tested, and the results are
very similar to those presented for the original functional in the previous
section. These penalty functionals are plotted in figure 6.9.
Figure 6.9:
Two further examples of analytic penalty functionals.

6.2.3 Minimisation efficiency
In this section we discuss the efficiency of the conjugate gradients
algorithm to minimise the total functional. We restrict the discussion
to the penalty functional introduced in section 6.2.1.
The total functional derived from this penalty functional appears to
possess multiple
local minima since the penalty functional itself is minimal for all
idempotent densitymatrices. However, most of these local minima do not
correspond to densitymatrices obeying the correct normalisation
constraint and are therefore eliminated by imposing this constraint
during the minimisation, as will be shown in chapter 7.
Of the remaining minima, only one corresponds to the situation in which
the lowest bands are occupied, and when the support functions are also
varied, all other minima become unstable with respect to this one
(i.e. these are minima with respect to occupation number variations but
not orbital variations).
Numerical investigations into this
matter have been carried out and no problems arising from multiple minima
have been observed (the minimised total functional has the same value
independent of the starting point).
The efficiency with which the conjugate gradients scheme is able to
minimise a function is known to depend upon the condition number
, the ratio of the largest curvature to the smallest curvature
at the minimum. The condition number may be calculated exactly by
determining the Hessian matrix at the minimum, but may also be estimated
as follows [153].
Consider the minimising densitymatrix expanded in terms of a
set of orthonormal orbitals
:

(6.30) 
Consider first perturbing the occupation numbers subject to the normalisation
constraint i.e. increasing the occupation of some orbital labelled by at the
expense of another orbital labelled . The densitymatrix becomes

(6.31) 
Defining
and using the orthonormality of the
orbitals,
and the curvature at the minimum is

(6.33) 
Assuming that is approximately idempotent so that
both and are either roughly zero or unity,

(6.34) 
i.e. to first order, the curvature is independent of the choice of orbitals
and so that the functional is spherical when this type of variation
is considered, and the condition number is approximately unity.
The second type of variation is a unitary transformation of the orbitals
i.e.
and
, which maintains normalisation of the
densitymatrix to .
In this case
and similarly

(6.36) 
so that

(6.37) 
The maximum curvature is thus obtained when
and
and equals
where
and
are the maximum and minimum energy
eigenvalues of the orbitals
.
The minimum curvature is obtained when
and
is therefore

(6.38) 
where
is the minimum energy eigenvalue spacing.
This curvature corresponds to unitary changes confined within the
occupied or unoccupied subspaces with no mixing between, and the energy is
indeed invariant under such changes. However, choosing to work with
localised functions essentially defines a particular unitary transformation
for the wavefunctions, so that these variations are generally eliminated.
If this is the case, then the minimum curvature will then
be obtained in the same way as the maximum curvature, but seeking the
minimum difference in energy eigenvalues between valence and conduction
bands, which is the band gap
. The
minimum curvature is thus
and the
condition number is given by

(6.39) 
This is an encouraging result, since the condition number is independent
of the systemsize.
The length of the error vector after iterations,
is related to by [154]

(6.40) 
and the number of iterations required to converge to a given precision is
therefore proportional to in the limit of large ,
and so independent of systemsize.
Reviewing the results for both types of variation, we note that the
minimisation with respect to occupation numbers (the first type) is very
efficient, since
, whereas the minimisation with respect to
orbitals is less efficient, depending upon the ratio of the total width of
the eigenvalue spectrum to the band gap.
Preconditioning schemes to compress the eigenvalue spectrum have been
developed for use with planewaves [81] and also with
splines [155], and a similar scheme for the sphericalwave
basis functions would also improve the rate of convergence.
Nevertheless, we do not expect a significant change in the
number of conjugate gradient steps required to converge to the minimum as the
systemsize increases.
In practical implementations, discussed in chapter 7, these
two types of variation are not strictly separated, both the occupation numbers and the
orbitals being varied simultaneously, so that these results are hard to
confirm numerically, although no significant increase in the number of
iterations required to converge to a given accuracy is observed as the
systemsize increases.
Next: 7. Computational implementation
Up: 6. Penalty Functionals
Previous: 6.1 Kohn's method
Contents
Peter Haynes