B. Conjugate gradients

By contrast, the method of conjugate gradients [187] takes
information from previous steps into account, but only requires that the
previous search direction (rather than all previous search directions as might
be expected) be stored. For a full description see [188].
We consider a general quadratic scalar function $f$ of a number $n$ of variables, written as a vector $\mathbf{x}$:

$$ f(\mathbf{x}) = \tfrac{1}{2}\,\mathbf{x} \cdot A \cdot \mathbf{x} - \mathbf{b} \cdot \mathbf{x} \tag{B.1} $$

where $A$ is a symmetric positive definite matrix, so that the gradient is

$$ \mathbf{g}(\mathbf{x}) = \nabla f = A \cdot \mathbf{x} - \mathbf{b} . \tag{B.2} $$

We label the search directions $\mathbf{d}_k$. In the steepest descents method $\mathbf{d}_k = -\mathbf{g}_k$, and we move along this direction an amount described by the parameter $\lambda$,

$$ \mathbf{x}_{k+1} = \mathbf{x}_k + \lambda_k \mathbf{d}_k , \tag{B.3} $$

until at the end (at $\lambda = \lambda_k$) the search direction is perpendicular to the gradient $\mathbf{g}_{k+1}$:

$$ \mathbf{d}_k \cdot \mathbf{g}_{k+1} = 0 , \tag{B.4} $$

which proves that consecutive search directions are always mutually perpendicular in this method. Now, for $f$ defined by equation B.1 we have

$$ \mathbf{g}_{k+1} = A \cdot (\mathbf{x}_k + \lambda\, \mathbf{d}_k) - \mathbf{b} = \mathbf{g}_k + \lambda\, A \cdot \mathbf{d}_k . \tag{B.5} $$

The minimum of $f$ along the direction $\mathbf{d}_k$ is at $\lambda = \lambda_k$ and is still given by condition (B.4), so that $\lambda_k$ is given by

$$ \lambda_k = - \frac{\mathbf{d}_k \cdot \mathbf{g}_k}{\mathbf{d}_k \cdot A \cdot \mathbf{d}_k} \tag{B.6} $$

and using (B.5) with $\lambda = \lambda_k$ we obtain

$$ \mathbf{g}_{k+1} = \mathbf{g}_k - \frac{\mathbf{d}_k \cdot \mathbf{g}_k}{\mathbf{d}_k \cdot A \cdot \mathbf{d}_k}\, A \cdot \mathbf{d}_k . \tag{B.7} $$
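The steepest descents update (B.3) with the exact step length (B.6) can be sketched numerically. In this sketch the matrix, right-hand side and starting point are illustrative choices, not values from the text; the assertion checks the perpendicularity condition (B.4) at every step.

```python
import numpy as np

# Steepest descents with exact line minimisation on the quadratic
# f(x) = 1/2 x.A.x - b.x, whose gradient is g(x) = A.x - b (B.2).
# A, b and the starting point are illustrative choices.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, 1.0])

x = np.array([4.0, -3.0])                # arbitrary starting point
for k in range(25):
    g = A @ x - b                        # gradient (B.2)
    d = -g                               # steepest-descent direction
    lam = -(d @ g) / (d @ A @ d)         # exact step length (B.6)
    x = x + lam * d
    g_next = A @ x - b
    # (B.4): at the line minimum the new gradient is perpendicular
    # to the search direction just used
    assert abs(d @ g_next) < 1e-10

print(x, np.linalg.solve(A, b))          # x approaches the minimum of f
```

Because consecutive directions are perpendicular, the iterates zig-zag towards the minimum rather than heading straight for it, which is what the conjugate construction below repairs.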
A set of $n$ search directions $\{\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_n\}$ are said to be conjugate directions (with respect to $A$) if they satisfy the condition

$$ \mathbf{d}_i \cdot A \cdot \mathbf{d}_j = 0 \quad \text{for } i \neq j . \tag{B.8} $$

These directions are linearly independent, which can be proved by *reductio ad absurdum*: assume the directions are linearly dependent, i.e. there is a set of numbers $\{c_i\}$, not all vanishing, for which $\sum_i c_i \mathbf{d}_i = 0$. But operating on both sides by $A$ and taking the scalar product with $\mathbf{d}_j$ implies $c_j\, \mathbf{d}_j \cdot A \cdot \mathbf{d}_j = 0$ for all $j$ by (B.8), and $\mathbf{d}_j \cdot A \cdot \mathbf{d}_j > 0$ since $A$ is positive definite, and so we obtain the contradiction that $c_j = 0$ for all $j$.

We can construct a set of these directions from a set of linearly independent directions $\{\mathbf{u}_k\}$ using a procedure analogous to Gram-Schmidt orthogonalisation, i.e. $\mathbf{d}_1 = \mathbf{u}_1$ and

$$ \mathbf{d}_k = \mathbf{u}_k + \sum_{i=1}^{k-1} \beta_{ki}\, \mathbf{d}_i \tag{B.9} $$

where

$$ \beta_{ki} = - \frac{\mathbf{u}_k \cdot A \cdot \mathbf{d}_i}{\mathbf{d}_i \cdot A \cdot \mathbf{d}_i} \tag{B.10} $$

so that, assuming the directions $\{\mathbf{d}_1, \ldots, \mathbf{d}_{k-1}\}$ are already mutually conjugate,

$$ \mathbf{d}_k \cdot A \cdot \mathbf{d}_j = \mathbf{u}_k \cdot A \cdot \mathbf{d}_j + \beta_{kj}\, \mathbf{d}_j \cdot A \cdot \mathbf{d}_j = 0 \tag{B.11} $$

for $j < k$, since $A$ is symmetric. Now $\mathbf{d}_1$ and $\mathbf{d}_2$ are trivially verified to be conjugate directions, and so the proof is complete by induction.
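The construction of (B.9) and (B.10) can be checked numerically. A minimal sketch, in which the matrix $A$ and the starting vectors $\mathbf{u}_k$ are arbitrary illustrative choices:

```python
import numpy as np

# Conjugating a set of linearly independent vectors u_k with respect to A,
# following the Gram-Schmidt-like recipe (B.9)-(B.10).
# A and the u_k are illustrative choices.
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # symmetric positive definite
U = rng.standard_normal((n, n))          # columns: independent vectors u_k

D = []                                   # conjugate directions d_k
for k in range(n):
    u = U[:, k]
    d = u.copy()
    for di in D:
        beta = -(u @ A @ di) / (di @ A @ di)   # coefficient (B.10)
        d += beta * di                         # sum in (B.9)
    D.append(d)

# Verify the conjugacy condition (B.8): d_i . A . d_j = 0 for i != j
for i in range(n):
    for j in range(i):
        assert abs(D[i] @ A @ D[j]) < 1e-10
```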

It follows that any other vector may be written as a combination of these conjugate directions; in particular the vector from the initial point $\mathbf{x}_1$ to the minimum $\mathbf{x}_{\min}$ in an $n$-dimensional space is

$$ \mathbf{x}_{\min} - \mathbf{x}_1 = \sum_{k=1}^{n} \alpha_k\, \mathbf{d}_k , \qquad \alpha_k = - \frac{\mathbf{d}_k \cdot \mathbf{g}_1}{\mathbf{d}_k \cdot A \cdot \mathbf{d}_k} , \tag{B.12} $$

from equation B.2 and the fact that the gradient vanishes at the minimum, i.e. $A \cdot \mathbf{x}_{\min} - \mathbf{b} = 0$: operating on the expansion with $A$ gives $-\mathbf{g}_1 = \sum_k \alpha_k\, A \cdot \mathbf{d}_k$, and the scalar product with $\mathbf{d}_k$ then isolates $\alpha_k$ by (B.8). We note therefore that $\mathbf{x}_{\min}$ can be reached in $n$ steps from $\mathbf{x}_1$ where the $k$-th step is given by

$$ \mathbf{x}_{k+1} = \mathbf{x}_k + \alpha_k\, \mathbf{d}_k \tag{B.13} $$

so that

$$ \mathbf{x}_k = \mathbf{x}_1 + \sum_{i=1}^{k-1} \alpha_i\, \mathbf{d}_i \tag{B.14} $$

and hence, by (B.2),

$$ \mathbf{g}_k = \mathbf{g}_1 + \sum_{i=1}^{k-1} \alpha_i\, A \cdot \mathbf{d}_i . \tag{B.15} $$

When the scalar product with $\mathbf{d}_k$ is taken this gives $\mathbf{d}_k \cdot \mathbf{g}_k = \mathbf{d}_k \cdot \mathbf{g}_1$ by (B.8). The expressions (B.6) and (B.12) are therefore identical, and so the steps from $\mathbf{x}_1$ to $\mathbf{x}_{\min}$ proceed via points $\mathbf{x}_k$ which are minima along each search direction. Taking the scalar product of (B.15) with $\mathbf{d}_j$ ($j < k$) instead we obtain

$$ \mathbf{d}_j \cdot \mathbf{g}_k = \mathbf{d}_j \cdot \mathbf{g}_1 + \alpha_j\, \mathbf{d}_j \cdot A \cdot \mathbf{d}_j = 0 \tag{B.16} $$

from equation B.12. Thus $\mathbf{g}_k$ is perpendicular to all previous search directions, so that each point $\mathbf{x}_k$ is actually a minimum with respect to the whole subspace spanned by $\{\mathbf{d}_1, \ldots, \mathbf{d}_{k-1}\}$, i.e. we can consider each step as removing one dimension of the space from the problem, so that the minimum of a quadratic function must always be found in a number of steps less than or equal to the dimensionality of the space.
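This exact $n$-step property can be sketched numerically: building conjugate directions as above and stepping by (B.12) and (B.13), the minimum of an $n$-dimensional quadratic is reached in exactly $n$ steps. All matrices, vectors and names below are illustrative choices.

```python
import numpy as np

# Reaching the minimum of an n-dimensional quadratic in exactly n conjugate
# steps, as in (B.12)-(B.13).  A and b are illustrative.
rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # symmetric positive definite
b = rng.standard_normal(n)

# Build conjugate directions from the coordinate axes (cf. B.9, B.10)
D = []
for k in range(n):
    u = np.eye(n)[:, k]                  # u_k: k-th coordinate axis
    d = u.copy()
    for di in D:
        d -= ((u @ A @ di) / (di @ A @ di)) * di
    D.append(d)

x = np.zeros(n)                          # starting point x_1
g1 = A @ x - b
for k, d in enumerate(D):
    alpha = -(d @ g1) / (d @ A @ d)      # coefficient from (B.12)
    x = x + alpha * d                    # k-th step (B.13)
    g = A @ x - b
    # the new gradient is perpendicular to every direction used so far (B.16)
    for dj in D[:k + 1]:
        assert abs(dj @ g) < 1e-8

assert np.allclose(x, np.linalg.solve(A, b))   # minimum reached in n steps
```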

The method of conjugate gradients uses such a set of conjugate directions based upon the steepest descent directions $-\mathbf{g}_k$ at successive points, i.e. $\mathbf{u}_k = -\mathbf{g}_k$ in (B.9). For this to be valid, we must show that the gradients are linearly independent. They are in fact orthogonal, which can be proved by induction. Starting with $\mathbf{d}_1 = -\mathbf{g}_1$, then from the minimum condition (B.4) we have $\mathbf{g}_1 \cdot \mathbf{g}_2 = 0$, so that the first two gradients are orthogonal. Then assume that we have a set of orthogonal gradients $\{\mathbf{g}_1, \ldots, \mathbf{g}_k\}$, and conjugate directions $\{\mathbf{d}_1, \ldots, \mathbf{d}_k\}$ obtained from them by (B.9). Using (B.16) we have $\mathbf{d}_j \cdot \mathbf{g}_{k+1} = 0$ for $j \leq k$. But $\mathbf{d}_j$ is given by (B.9) as

$$ \mathbf{d}_j = -\mathbf{g}_j + \sum_{i=1}^{j-1} \beta_{ji}\, \mathbf{d}_i \tag{B.17} $$

so that

$$ 0 = \mathbf{d}_j \cdot \mathbf{g}_{k+1} = -\mathbf{g}_j \cdot \mathbf{g}_{k+1} + \sum_{i=1}^{j-1} \beta_{ji}\, \mathbf{d}_i \cdot \mathbf{g}_{k+1} = -\mathbf{g}_j \cdot \mathbf{g}_{k+1} , \tag{B.18} $$

and hence $\mathbf{g}_j \cdot \mathbf{g}_{k+1} = 0$ for all $j \leq k$: the gradients are mutually orthogonal. In this case, equation B.9 becomes

$$ \mathbf{d}_{k+1} = -\mathbf{g}_{k+1} + \gamma_k\, \mathbf{d}_k \tag{B.19} $$

where

$$ \gamma_k = \frac{\mathbf{g}_{k+1} \cdot A \cdot \mathbf{d}_k}{\mathbf{d}_k \cdot A \cdot \mathbf{d}_k} , \tag{B.20} $$

since all the other coefficients vanish by (B.16) and (B.9): from (B.5), $A \cdot \mathbf{d}_i \propto \mathbf{g}_{i+1} - \mathbf{g}_i$, so that $\mathbf{g}_{k+1} \cdot A \cdot \mathbf{d}_i = 0$ for $i < k$ by the orthogonality of the gradients. Using (B.5) to write $A \cdot \mathbf{d}_k = (\mathbf{g}_{k+1} - \mathbf{g}_k)/\lambda_k$, and noting that $\lambda_k\, \mathbf{d}_k \cdot A \cdot \mathbf{d}_k = -\mathbf{d}_k \cdot \mathbf{g}_k = \mathbf{g}_k \cdot \mathbf{g}_k$ by (B.6), (B.16) and (B.19), equation B.20 may be rewritten without the matrix $A$ as

$$ \gamma_k = \frac{\mathbf{g}_{k+1} \cdot (\mathbf{g}_{k+1} - \mathbf{g}_k)}{\mathbf{g}_k \cdot \mathbf{g}_k} \tag{B.21} $$

or, using the orthogonality of the gradients,

$$ \gamma_k = \frac{\mathbf{g}_{k+1} \cdot \mathbf{g}_{k+1}}{\mathbf{g}_k \cdot \mathbf{g}_k} . \tag{B.22} $$
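For a quadratic with exact line minimisation, the matrix form of $\gamma$ in (B.20) agrees with the Hessian-free forms derived from it (the Polak-Ribiere and Fletcher-Reeves expressions). A minimal numerical check, with an illustrative matrix, right-hand side and starting point:

```python
import numpy as np

# After one exact line minimisation on a quadratic, the three expressions
# for gamma coincide.  A, b and x are illustrative choices.
A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])

x = np.array([2.0, 2.0])
g = A @ x - b
d = -g                                   # first step: steepest descent
lam = -(d @ g) / (d @ A @ d)             # exact line minimum (B.6)
x = x + lam * d
g_new = A @ x - b

gamma_hessian = (g_new @ A @ d) / (d @ A @ d)      # matrix form (B.20)
gamma_pr = (g_new @ (g_new - g)) / (g @ g)         # Polak-Ribiere form
gamma_fr = (g_new @ g_new) / (g @ g)               # Fletcher-Reeves form

assert np.isclose(gamma_hessian, gamma_pr)
assert np.isclose(gamma_pr, gamma_fr)
```

For a general (non-quadratic) function the three numbers differ, which is why the choice of formula matters in practice.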

Thus the method involves starting by searching along the direction of steepest descent, finding the minimum point along each search direction, and then calculating a new search direction from the new gradient and the previous search direction according to (B.19), where $\gamma$ is calculated from either of the expressions in (B.21, B.22) (which *will* differ for a general function) or from

$$ \gamma_k = \frac{\mathbf{g}_{k+1} \cdot (\mathbf{g}_{k+1} - \mathbf{g}_k)}{\mathbf{d}_k \cdot (\mathbf{g}_{k+1} - \mathbf{g}_k)} , \tag{B.23} $$

which follows directly from (B.20) on substituting $A \cdot \mathbf{d}_k \propto \mathbf{g}_{k+1} - \mathbf{g}_k$ from (B.5), without assuming an exact line minimisation.

The minimum of the two-dimensional narrow valley, so problematic for the steepest descents method, is now found in just two steps by the conjugate gradients method (figure B.2).
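A sketch of the full conjugate gradients iteration on an ill-conditioned two-dimensional quadratic, an illustrative stand-in for the valley of figure B.2, using the Fletcher-Reeves form of $\gamma$; the minimum is found in two steps:

```python
import numpy as np

# Conjugate gradients on a narrow two-dimensional quadratic valley
# f(x) = 1/2 x.A.x, minimised at the origin.  The elongated matrix A
# and the starting point are illustrative choices.
A = np.array([[100.0, 0.0], [0.0, 1.0]])   # narrow valley
b = np.zeros(2)

x = np.array([1.0, 10.0])
g = A @ x - b
d = -g                                      # first direction: steepest descent
for k in range(2):                          # n = 2 steps suffice (B.12)
    lam = -(d @ g) / (d @ A @ d)            # exact line minimum (B.6)
    x = x + lam * d
    g_new = A @ x - b
    gamma = (g_new @ g_new) / (g @ g)       # Fletcher-Reeves coefficient
    d = -g_new + gamma * d                  # new conjugate direction (B.19)
    g = g_new

assert np.linalg.norm(A @ x - b) < 1e-6     # gradient vanishes: minimum found
```

Steepest descents on the same valley zig-zags for many iterations, since each exact line minimum forces the next step to be perpendicular to the last.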