

3. The iterative method

We shall now consider the case of a real, localized, non-orthogonal basis set, and assume that a real symmetric generalized eigenvalue problem is to be solved. To obtain the $M$ lowest eigenstates, we minimize the objective function $\Omega $, defined as the sum of the $M$ eigenvalue estimates formed from the Rayleigh quotients,
\begin{displaymath}
\Omega = \sum_{n=1}^{M} \varepsilon_n = \sum_{n=1}^{M}
\frac{ {x_n}^{\alpha} H_{\alpha\beta} {x_n}^{\beta} }
     { {x_n}^{\alpha} S_{\alpha\beta} {x_n}^{\beta} },
\end{displaymath} (10)

subject to the orthogonality constraints of Eq. (8). $\Omega $ takes its minimum value when $\{ x_i;
i = 1,\ldots,M \}$ spans the same subspace as the $M$ lowest eigenvectors of Eq. (7). Although the procedure given below updates a single eigenvector at a time, it generalizes easily to an $M$-eigenvector (block) update (see Appendix A).
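
As a concrete illustration of Eq. (10) (not part of the original presentation), the following minimal NumPy sketch evaluates $\Omega $ for a trial set of coefficients, assuming that $H$ and $S$ are stored as dense real symmetric arrays and that the columns of X hold the vectors ${x_n}^{\alpha}$:

\begin{verbatim}
import numpy as np

def objective(H, S, X):
    """Sum of Rayleigh quotients, Eq. (10).

    H, S : (N, N) real symmetric Hamiltonian and overlap matrices.
    X    : (N, M) array whose columns are the coefficient vectors x_n.
    """
    num = np.einsum('an,ab,bn->n', X, H, X)   # x_n^a H_ab x_n^b
    den = np.einsum('an,ab,bn->n', X, S, X)   # x_n^a S_ab x_n^b
    return np.sum(num / den)
\end{verbatim}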

The derivative of $\Omega $ with respect to ${x_m}^{\gamma}$ is

\begin{displaymath}
\frac{\partial \Omega}{\partial {x_m}^{\gamma}} =
\frac{2}{ {x_m}^{\alpha} S_{\alpha\beta} {x_m}^{\beta} }
\left[ H_{\gamma\mu} {x_m}^{\mu} - \varepsilon_m S_{\gamma\nu} {x_m}^{\nu}
\right].
\end{displaymath} (11)

Eq. (11) defines a covariant gradient

\begin{displaymath}
{g_n}_{\alpha} = H_{\alpha\mu} {x_n}^{\mu} - \varepsilon_n
S_{\alpha\nu} {x_n}^{\nu}.
\end{displaymath} (12)

As pointed out by White et al. [19], it is important to consider the tensor property of the search direction. We define the dual basis functions $\chi^{\alpha}({\bf r})$ by the conditions
\begin{displaymath}
\int \d {\bf r}\ \chi^{\alpha}({\bf r}) \chi_{\beta}({\bf r}) =
{\delta^{\alpha}}_{\beta}.
\end{displaymath} (13)

The metric tensor $S^{\alpha\beta}$ can be defined in terms of the dual basis functions where
\begin{displaymath}
S^{\alpha\beta} = \int \d {\bf r}\ \chi^{\alpha}({\bf r})
\chi^{\beta}({\bf r}).
\end{displaymath} (14)

It can be shown that $S^{\alpha\beta}$ transforms covariant vectors into contravariant vectors and that $S^{\alpha\beta}S_{\beta\gamma} =
{\delta^{\alpha}}_{\gamma}$. Hence we transform the covariant gradient ${g_n}_{\alpha}$ into a contravariant gradient $ {g_n}^{\alpha}$ by using the metric tensor $S^{\alpha\beta}$ where
\begin{displaymath}
{g_n}^{\alpha} = S^{\alpha\beta} {g_n}_{\beta} =
S^{\alpha\beta} H_{\beta\gamma} {x_n}^{\gamma}
- \varepsilon_n {x_n}^{\alpha}.
\end{displaymath} (15)

The contravariant gradient can then be used to update the eigenvector coefficients in Eq. (4).
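
As a sketch of how Eqs. (12) and (15) might be evaluated in practice (an illustration under the same dense-matrix assumptions as above, not the paper's implementation), the metric $S^{\alpha\beta}$ can be applied implicitly by solving a linear system with $S$ instead of forming its inverse:

\begin{verbatim}
import numpy as np

def gradients(H, S, X):
    """Covariant gradient (Eq. 12) and contravariant gradient (Eq. 15).

    Returns (g_cov, g_con), each of shape (N, M); column n belongs to
    state n.  The index is raised by solving S g_con = g_cov rather
    than by inverting S explicitly.
    """
    eps = (np.einsum('an,ab,bn->n', X, H, X) /
           np.einsum('an,ab,bn->n', X, S, X))   # Rayleigh quotients
    g_cov = H @ X - (S @ X) * eps               # H x_n - eps_n S x_n
    g_con = np.linalg.solve(S, g_cov)           # apply the metric S^{ab}
    return g_cov, g_con
\end{verbatim}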

The constraints of Eq. (8) can be maintained (to first order) by ensuring that the search direction ${g_n^{\bot}}^{\alpha}$ obtained from $ {g_n}^{\alpha}$ is orthogonal to the space spanned by all the current approximate eigenvectors. By writing

\begin{displaymath}
{g_n^{\bot}}^{\alpha} = S^{\alpha\beta} H_{\beta\gamma} {x_n}^{\gamma}
- \varepsilon_n {x_n}^{\alpha} + \sum_{m} {x_m}^{\alpha} c_{mn},
\end{displaymath} (16)

and imposing the requirement that ${g_n^{\bot}}^{\alpha} S_{\alpha\beta}
{x_m}^{\beta} = 0$ for all $m$ and $n$, we find
\begin{displaymath}
c_{nm} = \varepsilon_n \delta_{nm} - {x_n}^{\alpha}
H_{\alpha\beta} {x_m}^{\beta}.
\end{displaymath} (17)

We then have
\begin{displaymath}
{g_n^{\bot}}^{\alpha} = S^{\alpha\beta} H_{\beta\gamma} {x_n}^{\gamma}
- \sum_{m} {x_m}^{\alpha} \left( {x_m}^{\mu} H_{\mu\nu} {x_n}^{\nu}
\right),
\end{displaymath} (18)

and
\begin{displaymath}
{g_n^{\bot}}_{\alpha} = H_{\alpha\beta} {x_n}^{\beta} -
\sum_{m} S_{\alpha\beta} {x_m}^{\beta} \left( {x_m}^{\mu}
H_{\mu\nu} {x_n}^{\nu} \right) .
\end{displaymath} (19)
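
A corresponding sketch of the projected gradients of Eqs. (18) and (19), again assuming dense arrays with the columns of X holding the current approximate eigenvectors (illustrative only):

\begin{verbatim}
import numpy as np

def orthogonal_gradient(H, S, X):
    """Gradients orthogonal to all current eigenvectors, Eqs. (18)-(19).

    Returns the contravariant form of Eq. (18) and its covariant
    partner of Eq. (19), each of shape (N, M).
    """
    HX = H @ X                       # H_bg x_n^g for every state n
    XtHX = X.T @ HX                  # x_m^mu H_mu_nu x_n^nu  (M, M)
    g_perp_con = np.linalg.solve(S, HX) - X @ XtHX   # Eq. (18)
    g_perp_cov = HX - S @ (X @ XtHX)                 # Eq. (19)
    return g_perp_con, g_perp_cov
\end{verbatim}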

We can use Eq. (18) as the steepest descent direction for constructing the conjugate gradients. However, the rate of convergence depends strongly on the ratio of the largest to the smallest eigenvalue of $H$ [14,20]. Since the largest eigenvalues are dominated by the basis functions with large kinetic energy, we precondition the search direction using the kinetic energy matrix $T$ where

\begin{displaymath}
T_{\alpha\beta} = -\frac{\hbar^2}{2m_{\mathrm e}} \int \d {\bf r}\
\chi_{\alpha}({\bf r}) \nabla^2 \chi_{\beta}({\bf r}).
\end{displaymath} (20)

We propose to obtain the preconditioned steepest descent direction $
{G_n}^{\alpha} $ in the same manner as in Ref. [14] from the equation
\begin{displaymath}
{G_n}^{\alpha} = (S + \frac{1}{\tau}T)^{\alpha\beta}
{g_n^{\bot}}_{\beta},
\end{displaymath} (21)

which amounts to finding $
{G_n}^{\alpha} $ by solving
\begin{displaymath}
(S+\frac{1}{\tau}T)_{\alpha\beta} {G_n}^{\beta} =
{g_n^{\bot}}_{\alpha}.
\end{displaymath} (22)

$\tau $ sets the kinetic energy scale for the preconditioning: components of the gradient corresponding to basis functions with kinetic energy much lower than $\tau $ are left essentially unchanged, whereas the contribution of components with kinetic energy much higher than $\tau $ is suppressed. The limit $\tau
\rightarrow \infty$ thus corresponds to no preconditioning, while the preconditioning becomes stronger as $\tau \rightarrow 0$. Preconditioning that is too aggressive degrades performance and can even yield the wrong answer, because it can reorder the lowest eigenvectors. We discuss the choice of $\tau $ in Sec. 4. This preconditioning scheme does not rely on the ``diagonal approximation'' used in Ref. [14], which is appropriate there because the overlap between different basis functions is not extensive. Eq. (22) can be solved with the standard preconditioned conjugate gradient method for linear systems [4,21].
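
To make the role of $\tau $ concrete, a minimal sketch of the preconditioning step of Eqs. (21) and (22) is given below. It uses a dense direct solve for clarity; for the large sparse $S$ and $T$ of practical calculations one would instead solve Eq. (22) iteratively with a linear conjugate gradient method, as described above. The function name and dense-matrix assumptions are illustrative:

\begin{verbatim}
import numpy as np

def precondition(S, T, tau, g_perp_cov):
    """Kinetic-energy preconditioning, Eqs. (21)-(22).

    Solves (S + T/tau) G_n = g_perp_n for every column of g_perp_cov.
    Large tau leaves the gradient nearly unchanged (G ~ S^{-1} g);
    small tau suppresses high-kinetic-energy components.
    """
    return np.linalg.solve(S + T / tau, g_perp_cov)
\end{verbatim}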

The search direction obtained from ${G_n}^{\alpha}$ must also be orthogonal to all approximate eigenvectors. Carrying out the same procedure as in Eq. (16), we find that the preconditioned gradient orthogonal to all approximate eigenvectors is given by

\begin{displaymath}
{G_n^{\bot}}^{\alpha} = {G_n}^{\alpha} - \sum_{m} {x_m}^{\alpha} (
{x_m}^{\beta} S_{\beta\gamma} {G_n}^{\gamma} ).
\end{displaymath} (23)

In the conjugate gradient minimization method, $ {G_n^{\bot}}^{\alpha}
$ will be used to construct a conjugate search direction $
{D_n}^{\alpha} $ where
\begin{displaymath}
{D_n}^{\alpha} = -{G_n^{\bot}}^{\alpha} + \gamma {{\tilde{D}}_n}^{\alpha}
\end{displaymath} (24)

Here ${\tilde{D}}_n$ is $D_n$ from the previous iteration. The coefficient $ \gamma $ is given by the Polak-Ribière formula
\begin{displaymath}
\gamma = \frac{ {G_n^{\bot}}^{\alpha} S_{\alpha\beta}
\left( {g_n^{\bot}}^{\beta} - {\tilde{g}}_n^{\bot\beta} \right) }
{ {\tilde{G}}_n^{\bot\alpha} {\tilde{g}}_{n\alpha}^{\bot} }.
\end{displaymath} (25)

The tildes again denote quantities from the previous iteration. Line minimization (see Appendix B) of $\Omega $ is then performed along the direction ${D_n^{\bot}}^{\alpha}$ where
\begin{displaymath}
{D_n^{\bot}}^{\alpha} = {D_n}^{\alpha} - \sum_m {x_m}^{\alpha} (
{x_m}^{\beta} S_{\beta\gamma} {D_n}^{\gamma}),
\end{displaymath} (26)

which is orthogonal to all approximate eigenvectors. Each eigenvector is updated in turn in this way until the minimum value of $\Omega $ is found. The single-eigenvector update described above generalizes to a block update in which all approximate eigenvectors are updated simultaneously; pseudo-code for the block update procedure is given in Appendix A.
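
For orientation, the single-eigenvector update can be assembled schematically as follows. This is an illustrative composition of Eqs. (19)-(26) under the same dense-matrix assumptions as above; the analytic line minimization of Appendix B is replaced by a generic numerical one-dimensional search, and re-imposition of the orthonormality constraints of Eq. (8) after the update is assumed to be handled separately:

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize_scalar

def update_state(H, S, T, X, n, tau, prev=None):
    """One preconditioned conjugate-gradient update of eigenvector n.

    `prev` holds (G, g, D) from the previous iteration for this state.
    Returns the updated X and the new (G, g, D) tuple.
    """
    HX = H @ X
    XtHX = X.T @ HX
    g = HX[:, n] - S @ (X @ XtHX[:, n])        # covariant gradient, Eq. (19)
    G = np.linalg.solve(S + T / tau, g)        # precondition, Eqs. (21)-(22)
    G -= X @ (X.T @ (S @ G))                   # orthogonalize, Eq. (23)

    if prev is None:
        D = -G                                 # first step: steepest descent
    else:
        G_old, g_old, D_old = prev
        gamma = G @ (g - g_old) / (G_old @ g_old)   # Polak-Ribiere, Eq. (25)
        D = -G + gamma * D_old                      # Eq. (24)
    D -= X @ (X.T @ (S @ D))                   # orthogonalize, Eq. (26)

    def rayleigh(step):                        # contribution of state n to Omega
        x = X[:, n] + step * D
        return (x @ H @ x) / (x @ S @ x)

    step = minimize_scalar(rayleigh).x         # stand-in for Appendix B
    X[:, n] += step * D
    return X, (G, g, D)
\end{verbatim}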