next up previous contents
Next: 4.5 Requirements for linear-scaling Up: 4. Density-Matrix Formulation Previous: 4.3 Density-matrix DFT   Contents


4.4 Constraints on the density-matrix

4.4.1 Trace

From the definitions so far, the trace of the density-matrix is defined to be
2 {\rm Tr}(\rho) = 2 \int {\mathrm d}{\bf r}~\rho({\bf r},{\bf r}) =
\int {\mathrm d}{\bf r}~ n({\bf r}) = N .
\end{displaymath} (4.26)

This constraint may be applied explicitly, which is a simple matter given that this is a linear constraint, or we may prefer to make the Legendre transform to the zero-temperature grand canonical ensemble and work at fixed chemical potential and variable electron number, minimising the grand potential $\Omega = E - \mu N$ rather than the total energy $E$. For insulators, it is sufficient for the chemical potential $\mu$ to be between the energies of the highest occupied and lowest unoccupied states.

4.4.2 Idempotency

The self-consistent ground-state density-matrix must display the property of idempotency i.e. $\rho^2 = \rho$. Unless the eigenvalues of the density-matrix (occupation numbers) remain in the interval $[0,1]$ the density-matrix will follow unphysical ``run-away'' solutions. Unfortunately it is not possible to work directly with the eigenvalues of the density-matrix4.1 to constrain them to lie in this interval and together with the non-linearity of the idempotency condition, this constraint turns out to be the major problem to be tackled. We briefly outline three ways in which this constraint can be dealt with. The first two of these are related and all three are described in [134] in the context of Hartree-Fock calculations.

If we are considering $m$ orbitals in our scheme, then the density-matrix in the representation of those orbitals, or of a linear combination of those orbitals, is an Hermitian $m \times m$ matrix of rank $n = \textstyle{1 \over 2}N$. The factorisation property of idempotent density-matrices is that an idempotent matrix $P$ may always be written

P = T T^{\dag }
\end{displaymath} (4.27)

where T is an $m \times n$ matrix whose columns are orthonormal i.e.
T^{\dag } T = 1_n
\end{displaymath} (4.28)

in which $1_n$ denotes the $n \times n$ identity matrix.

Any Hermitian matrix can be diagonalised by some unitary matrix $U$ such that the diagonal matrix ${\tilde P}$ is

{\tilde P} = U^{\dag } P U .
\end{displaymath} (4.29)

As already observed, the property $P^2 = P$ implies ${\tilde P}^2 = {\tilde P}$ so that each diagonal element (eigenvalue) is zero or one. The rank of $P$ is unchanged by the unitary transformation so that ${\tilde P}$ has $n$ 1's and $(m-n)$ 0's on its diagonal. In this case,
P = U {\tilde P} U^{\dag } = U {\tilde P}^2 U^{\dag } =
U {\tilde P} U^{\dag } U {\tilde P} U^{\dag } = T T^{\dag }
\end{displaymath} (4.30)

in which $T$ is an $m \times n$ rectangular matrix whose $n$ columns are selected from those of $U$. These columns possess the required orthonormality from the unitary property of $U$ and the proof of the factorisation property is complete.

We also note that expressing the density-matrix in this way guarantees that it is positive semi-definite

{\tilde P}_{ij} = \sum_k (U^{\dag } T)_{ik} (U^{\dag } T)^{\dag }_{kj} =
\sum_k (U^{\dag } T)_{ik} (U^{\dag } T)_{jk}^{\ast}
\end{displaymath} (4.31)

so that the eigenvalues are (no summation convention)
{\tilde P}_{ii} = \sum_k \left\vert (U^{\dag } T)_{ik} \right\vert^2 \geq 0 .
\end{displaymath} (4.32)

4.4.3 Penalty functional

Consider a matrix $R_0$ which is not idempotent i.e. $R_0^2 \not= R_0$. To make it so, we need to reduce the matrix $(R_0^2 - R_0)$ to zero, which can be achieved by minimising the (positive semi-definite) scalar quantity ${\rm Tr}\left[(R_0^2 - R_0)^2\right]$, whose minimum value is zero, with respect to the individual elements. Since

\frac{\partial {\rm Tr}\left[(R_0^2 - R_0)^2\right]}{\partial (R_0)_{ij}} =
\left[ 2R_0(R_0-1)(2R_0-1)\right]_{ji} ,
\end{displaymath} (4.33)

this can be achieved by using the right-hand side of this equation as a search direction in a steepest descents or conjugate gradients scheme. This results in a rapidly convergent (second order) sequence $R_0$, $R_1$, $R_2$ etc. which in the steepest descents method is defined by
R_{n+1} = R_n^2 (3-2R_n) .
\end{displaymath} (4.34)

The limit $R_{\infty}$ is a strictly idempotent matrix close to $R_0$ in the sense that the separation
{\rm Tr}\left[(R_{\infty}-R_0)^2\right] \ll {\rm Tr}(R_{\infty}) = n .
\end{displaymath} (4.35)

Kohn [135] has suggested the use of the square-root of this function as a penalty functional for the density-matrix:

{\cal P}[\rho] = \left[ \int {\mathrm d}{\bf r} \left( \rho^2 (1-\rho)^2
\right)({\bf r},{\bf r}) \right]^{1 \over 2}
\end{displaymath} (4.36)

and has proved that the minimum of the functional

\begin{displaymath}2 {\rm Tr}(\rho H_{\mathrm{KS}}) - \mu N + \alpha {\cal P}[\rho] \end{displaymath}

equals the ground-state grand potential (i.e. ${\cal P}[\rho] = 0$) for some $\alpha > \alpha_{\mathrm c}(\{\varepsilon_i\};\mu)$. In particular,
\alpha_{\mathrm c} > 2 \left[ \sum_{i,\varepsilon_i \leq \mu}
(\varepsilon_i - \mu)^2 \right]^{1 \over 2}
\end{displaymath} (4.37)

although he does not prove that this is a lower bound. A practical scheme would increase the value of $\alpha $ until the minimum of the functional occurred for ${\cal P}[\rho] = 0$, and this is discussed more fully in section 6.1.

4.4.4 Purifying transformation

We consider the result of one steepest descent step i.e. one iteration of equation 4.34 which allows us to write the density-matrix $\rho$ in terms of an auxiliary matrix $\sigma$ as

\rho = 3 \sigma^2 - 2 \sigma^3 .
\end{displaymath} (4.38)

The second order convergence is exhibited by
\rho^2 - \rho = 4 (\sigma^2 - \sigma)^3 - 3 (\sigma^2 - \sigma)^2
\end{displaymath} (4.39)

so that if $\sigma$ is a nearly idempotent matrix (in a manner to be defined below), then $\rho$ constructed from $\sigma$ by (4.38) is a more nearly idempotent matrix, with leading error second-order in the error of $\sigma$.

In the common diagonal representation of $\rho$ and $\sigma$ this relationship can be expressed in terms of the individual eigenvalues $\lambda_{\rho}$ and $\lambda_{\sigma}$:

\lambda_{\rho} = 3 \lambda_{\sigma}^2 - 2 \lambda_{\sigma}^3 .
\end{displaymath} (4.40)

Figure 4.1: Behaviour of eigenvalues under the purifying transformation.

\put(28,0){\includegraphics [width=8cm]{mcweeny.eps}}

Thus as long as all of the eigenvalues of $\sigma$ lie in the interval $-{1 \over 2} \leq \lambda_{\sigma} \leq {3 \over 2}$ the eigenvalues of $\rho$ will lie in the interval $0 \leq \lambda_{\rho} \leq 1$ as required. If any of the eigenvalues of $\sigma$ lie outside the interval $[ {1 - \surd{5} \over 2},{1 + \surd{5} \over 2}]$, then $\rho$ as constructed by (4.38) will be less idempotent than $\sigma$, and this defines the meaning of ``nearly idempotent'' for $\sigma$. Run-away solutions are still possible when the purifying transformation is used to construct $\rho$, but at least there is now a metastable minimum at the ground-state, and variation of $\sigma$ will implicitly drive $\rho$ to idempotency.

4.4.5 Idempotency-preserving variations

Finally we consider the most general change which an idempotent $m \times m$ matrix of rank $n$ can suffer, while maintaining that idempotency. Using the factorisation property we write $R = T T^{\dag }$ and consider changes in $T$ i.e. $T \rightarrow T + \delta T$, where, without loss of generality,

\delta T = \Delta T
\end{displaymath} (4.41)

in which $\Delta$ is an arbitrary non-singular $m \times m$ matrix. Now the density-matrix $R$ is a projection operator associated with an $n$-dimensional vector subspace ($\Gamma_R$) spanned by the columns of $T$. Any vector ${\bf x}$ may of course be uniquely decomposed into its components lying in the subspace $\Gamma_R$ and in the complementary subspace $\Gamma_{1-R}$:
{\bf x} = {\bf x}_R + {\bf x}_{1-R}
\end{displaymath} (4.42)

$\displaystyle {\bf x}_R$ $\textstyle =$ $\displaystyle R {\bf x} ,$ (4.43)
$\displaystyle {\bf x}_{1-R}$ $\textstyle =$ $\displaystyle (1-R) {\bf x} .$ (4.44)

To define a new matrix $R$ it is sufficient to define a new $n$-dimensional subspace. Since any vector can be decomposed according to equation 4.42, including the columns of $T$, any new vector (of arbitrary length) can be formed by adding a vector lying completely outside $\Gamma_R$. $n$ arbitrary linearly-independent vectors of this kind are given by the columns of

\delta T = (1-R) \Delta T
\end{displaymath} (4.45)

in which the action of $(1-R)$ is to project out the part of $\Delta T$ lying in $\Gamma_R$. So $T + \delta T$ with $\delta T$ defined by (4.45) defines a new subspace in a completely general way. However the columns of $T' = T + \delta T$ are no longer orthonormal and so it is necessary to orthonormalise the columns of $T'$ to obtain a new set ${\tilde T}$ which defines the new projection operator
R + \delta R = {\tilde T} {\tilde T}^{\dag } .
\end{displaymath} (4.46)

The metric associated with the vectors of $T'$ is the $m \times m$ matrix $M = {T'}^{\dag } T'$ and so a convenient orthonormalisation is

{\tilde T} = T' M^{-{1 \over 2}} .
\end{displaymath} (4.47)

Defining $v = (1-R) \Delta R$ we note the following relations, following from $T^{\dag } T = 1$ and $T T^{\dag } = R$.
$\displaystyle T^{\dag } R$ $\textstyle =$ $\displaystyle T^{\dag } (T T^{\dag }) = (T^{\dag } T) T^{\dag } = T^{\dag }$ (4.48)
$\displaystyle R T$ $\textstyle =$ $\displaystyle T$ (4.49)
$\displaystyle T^{\dag } v$ $\textstyle =$ $\displaystyle T^{\dag } (1-R) \Delta R = 0$ (4.50)
$\displaystyle v^{\dag } T$ $\textstyle =$ $\displaystyle 0$ (4.51)

Thus $R + \delta R$
  $\textstyle =$ $\displaystyle {\tilde T} {\tilde T}^{\dag } = T' M^{-{1 \over 2}}
( T' M^{-{1 \...
...2}} )^{\dag } = T' M^{-1} {T'}^{\dag } =
T' ({T'}^{\dag } T')^{-1} {T'}^{\dag }$  
  $\textstyle =$ $\displaystyle \left[ T + (1-R) \Delta T \right]\left[ (T + (1-R) \Delta T)^{\dag }
( T + (1-R) \Delta T) \right]^{-1}
\left[ T + (1-R) \Delta T \right]^{\dag }$  
  $\textstyle =$ $\displaystyle (R + v) T \left[ T^{\dag } (1 + v^{\dag } )(1 + v) T \right]^{-1} T^{\dag }
(R + v^{\dag })$  
  $\textstyle =$ $\displaystyle (R + v) \left[ 1 + v^{\dag } v \right]^{-1} (R + v^{\dag }) .$ (4.52)

When $\Delta$ represents a small change, then a convergent expansion for the inverse matrix in equation 4.52 can be used
\left[ 1 + v^{\dag } v \right]^{-1} = \sum_{n=0}^{\infty} (-1)^n (v^{\dag } v)^n
\end{displaymath} (4.53)

to write down $\delta R$ to any order
\delta R = (v + v^{\dag }) + (v v^{\dag } - v^{\dag } v) + \ldots
\end{displaymath} (4.54)

By taking the expansion to first order only, we make the change $\delta R$ linear in $\Delta$ (which has certain advantages e.g. in implementing conjugate gradients) and see that this does indeed maintain idempotency to first order:

$\displaystyle \delta R$ $\textstyle =$ $\displaystyle v + v^{\dag }
= (1-R) \Delta R + R \Delta^{\dag } (1-R) ,$ (4.55)
$\displaystyle (R+\delta R)^2 - (R+\delta R)$ $\textstyle =$ $\displaystyle (R^2 - R) + R \delta R + (\delta R) R +
(\delta R)^2 - \delta R$  
  $\textstyle =$ $\displaystyle R(1-R) \Delta R + R^2 \Delta^{\dag }(1-R) + (1-R) \Delta R^2 + R \Delta(1-R)R$  
    $\displaystyle \quad - (1-R) \Delta R - R \Delta^{\dag } (1-R) + (\delta R)^2$  
  $\textstyle =$ $\displaystyle (R^2 - R) \Delta^{\dag }(1-R) +(1-R) \Delta (R^2-R) + (\delta R)^2$  
  $\textstyle =$ $\displaystyle (\delta R)^2$ (4.57)

which vanishes to first order in $\delta R$ as required. Thus if we have the ground-state density-matrix and consider making variations consistent with idempotency to first order as described here, then the energy must increase, and again some stability against the run-away solutions is obtained.
next up previous contents
Next: 4.5 Requirements for linear-scaling Up: 4. Density-Matrix Formulation Previous: 4.3 Density-matrix DFT   Contents
Peter Haynes