The argument for performing DMC calculations on a parallel computer is even more compelling than for VMC calculations, because DMC calculations require approximately an order of magnitude more CPU time than the equivalent VMC calculation.

As with VMC calculations, the DMC algorithm is intrinsically parallel.
In the algorithm outlined in section , an ensemble of
walkers is used to evaluate the local energy of the guiding function,
, at each time step of the simulation. In our parallel
version of the DMC algorithm, this ensemble of walkers is distributed
across all *NNODES* nodes of the parallel machine. Each node is responsible
for performing stages (2)-(8) of the algorithm (the diffusion, drift
and creation/annihilation of walkers) on its own subset of the total
ensemble of walkers.

After all the walkers have been advanced for a block of time steps, the mean energy across all the walkers on all the nodes is used to update the trial energy as in stage (11) of the algorithm.

where is the accumulated local energy of the subset of walkers on the i node.

The renormalisation of the number of walkers at the end of a block is performed across all the nodes in the following way.

- The number of `live' walkers summed across all the nodes, , at the end of the block is compared with the original number, .
- If there are too many walkers then walkers are selected at random and deleted. If there are too few walkers then walkers are chosen at random and copied.
- The walkers are the redistributed across the nodes so that there are walkers on each node at the start of the next block.

It is important to try and keep the number of walkers on each node equal, i.e. to `load balance' the algorithm efficiently. The efficiency of the algorithm at any one time step is determined by

where is the total number of walkers across all the nodes and is the number of walkers on the node which at that particular time step has the largest number of `live' walkers. The efficiency of the algorithm can therefore be improved in two ways,

- Keep the number of walkers as well `balanced' across the nodes as possible.
- Increase the average number of walkers per node, .

In other words any one node can never have more than one more walker than any other node. For the DMC calculations reported on in this thesis, there are typically 10 walkers per node, yielding an average efficiency of approximately 95%.

The parallel DMC algorithm requires a set of equilibrated configurations as an input, in the same way as the serial DMC algorithm. These configurations are produced by instructing each node in the parallel VMC algorithm to write out an equilibrated ensemble of configurations.

Tue Nov 19 17:11:34 GMT 1996