Several types of linear-scaling scheme exist and a point of
commonality between many of them is the use of localized
functions. Some of these approaches
use a relatively small basis set of numerical atomic
orbitals [9] or gaussian atomic
orbitals [10,11] that have been pre-optimized
for other environments and transferred to the system under
consideration;
other
approaches [12,13,14,15,16]
use much larger localized basis sets of simple
functions such as polynomials [17,18],
spherical waves [19], or bandwidth limited
delta functions [20].
Each of these philosophies has its advantages and drawbacks: the
former can suffer from transferability problems but is capable of
providing good accuracy with modest effort; the latter is
computationally more intensive but is capable of giving an
accuracy that is *systematically* tunable with a parameter that
controls the completeness of the basis set that is being used, akin to
the kinetic energy cut-off in plane wave methods. It is this latter
category of method that we discuss here.

The usefulness of any linear-scaling scheme is ultimately
determined by its *crossover* point, namely the system size at
which the method begins to be faster than conventional cubic-scaling
approaches.
This crossover depends largely on two factors: First the
computational cost per iteration per atom, and second the number of
iterations required to reach a given convergence threshold per
atom. Even if a method is constructed in which the computational cost
per iteration per atom is small
and independent of system size, the number of iterations required may
be so large that the minimization is prohibitively
inefficient. Indeed, it has been observed that
methods which use large basis sets suffer
from this very problem,
known as *ill-conditioning*. We present a discussion of the
origin of ill-conditioning and describe a general scheme to overcome it.

We briefly outline the formalism of linear-scaling methods in Section 2. In Section 3 we discuss the cause of the above-mentioned ill-conditioning, and in Section 4, following the work of Bowler and Gillan [21], we present a general preconditioning scheme for alleviating the problem. In particular, we show that the ``diagonal approximation'' that was invoked in Ref. [21] is unnecessary and we account for the tensorial nature of the nonorthogonal bases correctly. In Section 5 we extend our analysis to the case of an orthogonal basis, and in Section 6 we use our linear-scaling method [16] as a specific example of the preconditioning scheme. Finally, in Section 7 we present results that demonstrate the importance of using preconditioning.