

B. Conjugate gradients

In order to find the minimum of some function $f({\bf x})$ it is of course necessary to solve the equation $\nabla f({\bf x}) = 0$, but in general this cannot be done directly. Instead, $\nabla f({\bf x})$ is used as a search direction in the multi-dimensional parameter-space of vectors ${\bf x}$ and the function is minimised iteratively. One way to do this is to move along these directions of steepest descent, finding the minimum along each one and then calculating a new direction from that minimum, until the ground-state is reached. The minimum along a given direction (the line minimum) is found when that direction becomes perpendicular to the gradient. Thus, if we use this method of steepest descents, consecutive search directions are always perpendicular. This is clearly inefficient: only the current steepest descent direction is used, and all previous information which could be used to build up a more accurate picture of the function being minimised is thrown away. In fact, the steepest descents method is only efficient when the minimum is ``spherical'' i.e. when the eigenvalues of the Hessian are all of the same order. If this is not the case, then the method is very slow, and is not guaranteed to converge to the minimum in a finite number of steps. A particularly bad case is that of the ``narrow valley'' illustrated in figure B.1.

Figure B.1: Steepest descents method - a large number of steps is required to find the minimum.
\includegraphics [height=15cm,angle=-90]{sd.eps}
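
To make this behaviour concrete, the following minimal Python sketch applies the steepest descents method to a two-variable quadratic of the form of equation B.1 below, using the exact line-minimum step length anticipated in equation B.6. The matrix ${\cal G}$, the vector ${\bf b}$ and the starting point are arbitrary illustrative choices intended to mimic the narrow valley, not quantities taken from the text.

\begin{verbatim}
# Steepest descents on an ill-conditioned two-variable quadratic
# f(x) = (1/2) x.G.x - b.x  (the form of equation (B.1) below).
# G, b and the starting point are arbitrary illustrative choices.
import numpy as np

G = np.array([[1.0,  0.0],
              [0.0, 50.0]])       # positive definite, widely spread eigenvalues
b = np.zeros(2)
x = np.array([10.0, 1.0])

for step in range(1, 10001):
    g = G @ x - b                 # gradient (equation (B.2))
    if np.linalg.norm(g) < 1e-8:
        break
    p = -g                        # steepest descent direction
    alpha = -(p @ g) / (p @ G @ p)    # exact line minimum (equation (B.6))
    x = x + alpha * p

print(step, x)                    # many steps are needed for this ill-conditioned G
\end{verbatim}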

By contrast, the method of conjugate gradients [187] takes information from previous steps into account, but requires only the previous search direction (rather than all previous search directions, as might be expected) to be stored. For a full description see [188]. We consider a general quadratic function of a number of variables, which we collect into a vector ${\bf x}$:

\begin{displaymath}
f({\bf x}) = {\textstyle{1 \over 2}} {\bf x} \cdot {\cal G} \cdot {\bf x} -
{\bf b} \cdot {\bf x}
\end{displaymath} (B.1)

in which ${\cal G}$ is a positive definite symmetric matrix (so that the function has a single global minimum) and ${\bf b}$ is some constant vector. We denote the line minima (i.e. the points at which the search direction changes) by the set of points $\{ {\bf x}_r \}$ and the gradients at those points are $\{ {\bf g}_r \}$
\begin{displaymath}
{\bf g}_r = \nabla f({\bf x}_r) = {\cal G} \cdot {\bf x}_r - {\bf b} .
\end{displaymath} (B.2)

We label the search directions $\{ {\bf p}_r \}$. In the steepest descents method, ${\bf p}_r = - {\bf g}_r$ and we move along this direction by an amount determined by the parameter ${\alpha}_r$, chosen so that at the end point ${\bf x}_{r+1}$ the search direction is perpendicular to the gradient ${\bf g}_{r+1}$:
$\displaystyle {\bf x}_{r+1} = {\bf x}_r + {\alpha}_r {\bf p}_r = {\bf x}_r -
{\alpha}_r {\bf g}_r$     (B.3)
$\displaystyle {\bf g}_{r+1} \cdot {\bf p}_r = - {\bf g}_{r+1} \cdot {\bf g}_r = 0$     (B.4)

which proves that consecutive search directions are always mutually perpendicular in this method. Now, for $f({\bf x})$ defined by equation B.1 we have
\begin{displaymath}
{\bf g}_r - {\bf g}_s = {\cal G} \cdot ({\bf x}_r - {\bf x}_s) .
\end{displaymath} (B.5)

The minimum of $f({\bf x})$ along the direction ${\bf p}_r$ is at ${\bf x}_{r+1} = {\bf x}_r + {\alpha}_r {\bf p}_r$ and is still given by condition (B.4), so that ${\alpha}_r$ is given by
\begin{displaymath}
{\alpha}_r = - \frac{{\bf p}_r \cdot {\bf g}_r}{{\bf p}_r \cdot {\cal G} \cdot {\bf p}_r}
\end{displaymath} (B.6)

and using (B.5) with $s=r+1$ we obtain
\begin{displaymath}
{\bf g}_{r+1} = {\bf g}_r + {\alpha}_r {\cal G} \cdot {\bf p}_r .
\end{displaymath} (B.7)
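
As a numerical illustration of equations B.3, B.4, B.6 and B.7, the short Python sketch below takes a single exact line-minimisation step; the matrix, vector, starting point and search direction are arbitrary illustrative choices.

\begin{verbatim}
# One exact line-minimisation step for the quadratic of equation (B.1).
# G, b, x and p below are arbitrary illustrative choices.
import numpy as np

G = np.array([[3.0, 1.0],
              [1.0, 2.0]])            # positive definite and symmetric
b = np.array([1.0, -1.0])
x = np.array([0.5, 2.0])              # current point x_r
p = np.array([1.0, -0.5])             # some search direction p_r

g = G @ x - b                         # gradient g_r (equation (B.2))
alpha = -(p @ g) / (p @ G @ p)        # step to the line minimum (equation (B.6))
x_new = x + alpha * p                 # x_{r+1} (equation (B.3))
g_new = g + alpha * (G @ p)           # updated gradient (equation (B.7))

print(np.allclose(g_new, G @ x_new - b))   # True: (B.7) agrees with (B.2)
print(np.isclose(g_new @ p, 0.0))          # True: line-minimum condition (B.4)
\end{verbatim}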

The search directions $\{ {\bf p}_r \}$ are said to be conjugate directions (with respect to ${\cal G}$) if they satisfy the condition

\begin{displaymath}
{\bf p}_r \cdot {\cal G} \cdot {\bf p}_s = 0, \qquad r \not= s .
\end{displaymath} (B.8)

These directions are linearly independent, which can be proved by reductio ad absurdum. Assume that the directions are linearly dependent i.e. that there is a set of numbers $\{ \lambda_i \}$, not all vanishing, for which $\sum_i \lambda_i {\bf p}_i = 0$. Operating on both sides with ${\cal G}$ and taking the scalar product with ${\bf p}_j$ then gives $\lambda_j {\bf p}_j \cdot {\cal G} \cdot {\bf p}_j = 0$ for all $j$ by (B.8). Since ${\cal G}$ is positive definite, ${\bf p}_j \cdot {\cal G} \cdot {\bf p}_j \not= 0$, and so we obtain the contradiction that $\lambda_j = 0$ for all $j$.

We can construct a set of these directions from a set of linearly independent directions $\{ {\bf u}_r \}$ using a procedure analogous to Gram-Schmidt orthogonalisation i.e.

$\displaystyle {\bf p}_1$ $\textstyle =$ $\displaystyle {\bf u}_1$  
$\displaystyle {\bf p}_{r+1}$ $\textstyle =$ $\displaystyle {\bf u}_{r+1} + \sum_{i=1}^r \beta_i^{(r)} {\bf p}_i$ (B.9)

where
\begin{displaymath}
\beta_i^{(r)} = - \frac{{\bf u}_{r+1} \cdot {\cal G} \cdot {\bf p}_i}
{{\bf p}_i \cdot {\cal G} \cdot {\bf p}_i}
\end{displaymath} (B.10)

which we prove by induction. Assume that we have $r$ conjugate directions obeying (B.8) and construct ${\bf p}_{r+1}$ according to (B.9); then
$\displaystyle {\bf p}_s \cdot {\cal G} \cdot {\bf p}_{r+1}$ $\textstyle =$ $\displaystyle {\bf p}_s \cdot {\cal G} \cdot {\bf u}_{r+1} -
\sum_{i=1}^r \frac{{\bf u}_{r+1} \cdot {\cal G} \cdot {\bf p}_i}
{{\bf p}_i \cdot {\cal G} \cdot {\bf p}_i} \, {\bf p}_s \cdot {\cal G} \cdot {\bf p}_i$  
  $\textstyle =$ $\displaystyle {\bf p}_s \cdot {\cal G} \cdot {\bf u}_{r+1} - \frac{{\bf u}_{r+1} \cdot {\cal G} \cdot {\bf p}_s}
{{\bf p}_s \cdot {\cal G} \cdot {\bf p}_s} \, {\bf p}_s \cdot {\cal G} \cdot {\bf p}_s = 0$ (B.11)

for $s < r+1$, since ${\cal G}$ is symmetric. Now ${\bf p}_1$ and ${\bf p}_2$ are trivially verified to be conjugate directions and so the proof is complete.
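
The construction of equations B.9 and B.10 translates directly into the following Python sketch; the matrix ${\cal G}$ and the starting vectors $\{ {\bf u}_r \}$ are arbitrary illustrative choices, and the final check confirms the conjugacy condition B.8.

\begin{verbatim}
# Construct conjugate directions from linearly independent vectors u_r
# via the Gram-Schmidt-like procedure of equations (B.9) and (B.10).
# G and the vectors u are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
G = A @ A.T + 4.0 * np.eye(4)        # a positive definite symmetric matrix
u = np.eye(4)                        # linearly independent starting vectors

p = [u[0]]                           # p_1 = u_1
for r in range(1, 4):
    new = u[r].copy()
    for p_i in p:
        beta = -(u[r] @ G @ p_i) / (p_i @ G @ p_i)   # equation (B.10)
        new += beta * p_i                            # equation (B.9)
    p.append(new)

P = np.array(p)
print(np.round(P @ G @ P.T, 10))     # off-diagonal elements vanish (B.8)
\end{verbatim}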

It follows that any vector may be written as a linear combination of these conjugate directions; in particular, the vector from the initial point ${\bf x}_1$ to the minimum ${\bf x}^{\ast}$ in an $n$-dimensional space is

$\displaystyle {\bf x}^{\ast} - {\bf x}_1$ $\textstyle =$ $\displaystyle \sum_{r=1}^n {\alpha}_r {\bf p}_r$  
$\displaystyle {\alpha}_r$ $\textstyle =$ $\displaystyle \frac{{\bf p}_r \cdot {\cal G} \cdot ({\bf x}^{\ast} - {\bf x}_1)}
{{\bf p}_r \cdot {\cal G} \cdot {\bf p}_r}$  
  $\textstyle =$ $\displaystyle - \frac{{\bf p}_r \cdot {\bf g}_1}{{\bf p}_r \cdot {\cal G} \cdot {\bf p}_r}$ (B.12)

from equation B.2 and the fact that the gradient vanishes at the minimum i.e. ${\bf g}^{\ast} = {\cal G} \cdot {\bf x}^{\ast} - {\bf b} = 0$. We note therefore that ${\bf x}^{\ast}$ can be reached in $k \leq n$ steps from ${\bf x}_1$ where the $r$-th step is given by
\begin{displaymath}
{\bf x}_{r+1} = {\bf x}_r + {\alpha}_r {\bf p}_r
\end{displaymath} (B.13)

with ${\alpha}_r$ given by (B.12). Applying (B.5) to
\begin{displaymath}
{\bf x}_r = {\bf x}_1 + \sum_{i=1}^{r-1} {\alpha}_i {\bf p}_i ,
\end{displaymath} (B.14)

we obtain
\begin{displaymath}
{\bf g}_r = {\bf g}_1 + \sum_{i=1}^{r-1} {\alpha}_i {\cal G} \cdot {\bf p}_i .
\end{displaymath} (B.15)

When the scalar product with ${\bf p}_r$ is taken this gives ${\bf p}_r \cdot {\bf g}_1 = {\bf p}_r \cdot {\bf g}_r$. The expressions (B.6) and (B.12) are identical and so the steps from ${\bf x}_1$ to ${\bf x}^{\ast}$ proceed via points which are minima along each search direction. Taking the scalar product of (B.15) with ${\bf p}_s$ ($s < r$) instead we obtain
\begin{displaymath}
{\bf p}_s \cdot {\bf g}_r = {\bf p}_s \cdot {\bf g}_1 + {\alpha}_s {\bf p}_s \cdot
{\cal G} \cdot {\bf p}_s = 0
\end{displaymath} (B.16)

from equation B.12. Thus ${\bf g}_r$ is perpendicular to all previous search directions, so that each point ${\bf x}_{r+1}$ is actually a minimum with respect to the whole subspace spanned by $\{ {\bf p}_1, {\bf p}_2, \ldots, {\bf p}_r \}$. Each step can therefore be considered as removing one dimension of the space from the problem, so that the minimum of a quadratic function must always be found in a number of steps less than or equal to the dimensionality of the space.

The method of conjugate gradients uses such a set of conjugate directions $\{ {\bf p}_r \}$ based upon the steepest descent directions $\{ - {\bf g}_r \}$ at successive points. For this to be valid, we must show that the gradients are linearly independent. They are in fact mutually orthogonal, which can be proved by induction. Starting with ${\bf p}_1 = -{\bf g}_1$, the minimum condition (B.4) gives ${\bf g}_2 \cdot {\bf p}_1 = -{\bf g}_2 \cdot {\bf g}_1 = 0$, so that the first two gradients are orthogonal. Now assume that we have a set of $r$ mutually orthogonal gradients, with conjugate directions obtained from them by (B.9). Using (B.16) we have ${\bf g}_{r+1} \cdot {\bf p}_s=0$ for $s \leq r$. But ${\bf p}_s$ is given by (B.9) as

\begin{displaymath}
{\bf p}_s = -{\bf g}_s + \sum_{i=1}^{s-1} \beta_i^{(s-1)} {\bf p}_i
\end{displaymath} (B.17)

so that
\begin{displaymath}
{\bf g}_{r+1} \cdot {\bf g}_s = - {\bf g}_{r+1} \cdot {\bf p}_s +
\sum_{i=1}^{s-1} \beta_i^{(s-1)} {\bf g}_{r+1} \cdot {\bf p}_i = 0
\end{displaymath} (B.18)

since every term on the right-hand side vanishes by (B.16). Hence ${\bf g}_{r+1}$ is orthogonal to all the previous gradients and the proof is complete. The gradients are all mutually orthogonal and thus linearly independent, so that they can be used to construct conjugate directions: the conjugate gradients.

In this case, equation B.10 becomes

\begin{displaymath}
\beta_i^{(r)} = \frac{{\bf g}_{r+1} \cdot {\cal G} \cdot {\bf p}_i}
{{\bf p}_i \cdot {\cal G} \cdot {\bf p}_i}
\end{displaymath} (B.19)

which using (B.7) can be rewritten
\begin{displaymath}
({\bf p}_i \cdot {\cal G} \cdot {\bf p}_i) \beta_i^{(r)} = \frac{1}{\alpha_i}
{\bf g}_{r+1} \cdot ({\bf g}_{i+1} - {\bf g}_i)
\end{displaymath} (B.20)

so that by the orthogonality of the gradients, $\beta_i^{(r)} = 0$ for $i < r$ and the only non-vanishing coefficient is $\beta_r^{(r)}$:
$\displaystyle \beta_r^{(r)}$ $\textstyle =$ $\displaystyle - \frac{{\bf g}_{r+1} \cdot {\bf g}_{r+1}}{{\bf p}_r \cdot {\bf g}_r}$ (B.21)
  $\textstyle =$ $\displaystyle \frac{{\bf g}_{r+1} \cdot {\bf g}_{r+1}}{{\bf g}_r \cdot {\bf g}_r}$ (B.22)

by (B.16) and (B.9).

Thus the method involves starting by searching along the direction of steepest descent, finding the minimum point along each direction and then calculating a new search direction from the new gradient and the previous search direction as ${\bf p}_{r+1} = -{\bf g}_{r+1} + {\beta}_r {\bf p}_r$, where $\beta_r = \beta_r^{(r)}$ is calculated from either of the expressions in (B.21, B.22) (which will differ for a general function) or from

\begin{displaymath}
\beta_r = \frac{{\bf g}_{r+1} \cdot ( {\bf g}_{r+1} - {\bf g}_r )}
{{\bf p}_r \cdot ( {\bf g}_{r+1} - {\bf g}_r )}
\end{displaymath} (B.23)

as suggested by Polak [189].
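
For the quadratic case with exact line minimisation, the whole procedure may be sketched in Python as below. The function name conjugate_gradient and the tolerance tol are illustrative choices rather than anything prescribed above; the update coefficient is computed from the form (B.23), which for a quadratic coincides with (B.21) and (B.22).

\begin{verbatim}
# Conjugate gradients for the quadratic f(x) = (1/2) x.G.x - b.x with
# exact line minimisation. The function name and tolerance are
# illustrative choices.
import numpy as np

def conjugate_gradient(G, b, x0, tol=1e-10, max_steps=None):
    x = np.array(x0, dtype=float)
    g = G @ x - b                          # gradient (equation (B.2))
    p = -g                                 # first direction: steepest descent
    max_steps = max_steps or len(b)        # at most n steps for a quadratic
    for step in range(1, max_steps + 1):
        alpha = -(p @ g) / (p @ G @ p)     # line minimum (equation (B.6))
        x = x + alpha * p                  # equation (B.13)
        g_new = g + alpha * (G @ p)        # equation (B.7)
        if np.linalg.norm(g_new) < tol:
            return x, step
        # update coefficient in the form (B.23); for a quadratic this
        # equals (B.21) and (B.22)
        beta = (g_new @ (g_new - g)) / (p @ (g_new - g))
        p = -g_new + beta * p              # new conjugate direction
        g = g_new
    return x, step
\end{verbatim}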

The minimum of the two-dimensional narrow valley, so problematic for the steepest descents method, is now found in just two steps by the conjugate gradients method (figure B.2).

Figure B.2: Conjugate gradients method - only two steps are required to find the minimum.
\includegraphics [height=15cm,angle=-90]{cg.eps}
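
Applying the conjugate_gradient sketch above to the same illustrative two-variable quadratic used earlier confirms this behaviour: the minimum is located in two iterations.

\begin{verbatim}
# Two-variable "narrow valley" test of the conjugate_gradient sketch above,
# with the same illustrative G, b and starting point as before.
import numpy as np

G = np.array([[1.0,  0.0],
              [0.0, 50.0]])
b = np.zeros(2)
x_min, steps = conjugate_gradient(G, b, np.array([10.0, 1.0]))
print(steps, x_min)        # 2 steps; x_min is the origin to machine precision
\end{verbatim}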

