Several types of linear-scaling scheme exist and a point of commonality between many of them is the use of localized functions. Some of these approaches use a relatively small basis set of numerical atomic orbitals  or gaussian atomic orbitals [10,11] that have been pre-optimized for other environments and transferred to the system under consideration; other approaches [12,13,14,15,16] use much larger localized basis sets of simple functions such as polynomials [17,18], spherical waves , or bandwidth limited delta functions . Each of these philosophies has its advantages and drawbacks: the former can suffer from transferability problems but is capable of providing good accuracy with modest effort; the latter is computationally more intensive but is capable of giving an accuracy that is systematically tunable with a parameter that controls the completeness of the basis set that is being used, akin to the kinetic energy cut-off in plane wave methods. It is this latter category of method that we discuss here.
The usefulness of any linear-scaling scheme is ultimately determined by its crossover point, namely the system size at which the method begins to be faster than conventional cubic-scaling approaches. This crossover depends largely on two factors: First the computational cost per iteration per atom, and second the number of iterations required to reach a given convergence threshold per atom. Even if a method is constructed in which the computational cost per iteration per atom is small and independent of system size, the number of iterations required may be so large that the minimization is prohibitively inefficient. Indeed, it has been observed that methods which use large basis sets suffer from this very problem, known as ill-conditioning. We present a discussion of the origin of ill-conditioning and describe a general scheme to overcome it.
We briefly outline the formalism of linear-scaling methods in Section 2. In Section 3 we discuss the cause of the above-mentioned ill-conditioning, and in Section 4, following the work of Bowler and Gillan , we present a general preconditioning scheme for alleviating the problem. In particular, we show that the ``diagonal approximation'' that was invoked in Ref.  is unnecessary and we account for the tensorial nature of the nonorthogonal bases correctly. In Section 5 we extend our analysis to the case of an orthogonal basis, and in Section 6 we use our linear-scaling method  as a specific example of the preconditioning scheme. Finally, in Section 7 we present results that demonstrate the importance of using preconditioning.