In Fig. 4 we plot the results for two different FFT grid sizes: and . Both methods were implemented using the same communication library call. The plotted results were obtained by averaging the communication times for 50 convolutions (i.e., both a forward and a backward FFT) on the grid and 100 convolutions on the grid. The error bars show the resulting standard deviations. The dashed and dotted lines show the best fits to Eqs. 4 and 5, respectively.
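The averaging procedure described above can be sketched as follows. This is a minimal illustration rather than the code used to produce Fig. 4. The function name `summarize_timings` and the sample timings are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def summarize_timings(times):
    """Return (mean, sample standard deviation) of per-convolution
    communication times. The n - 1 estimator is an assumption."""
    if len(times) < 2:
        raise ValueError("need at least two timings to estimate a spread")
    return statistics.mean(times), statistics.stdev(times)


if __name__ == "__main__":
    # Hypothetical communication times in seconds, one per convolution
    # (each convolution = one forward and one backward FFT).
    timings = [0.012, 0.011, 0.013, 0.012, 0.012]
    mean_t, std_t = summarize_timings(timings)
    print(f"mean = {mean_t:.4f} s, std = {std_t:.4f} s")
```

The mean would give the plotted point and the standard deviation the half-height of its error bar.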

The quality of the fits of Eqs. 4 and 5 to the data supports our analysis. In particular, the crossover occurs at a smaller number of processors for the smaller FFT grid, as expected. On the machine used here, the new method is outperformed by the old method for the reasons given above, and would only be advantageous for very small FFT grids or very large numbers of nodes (over 1000). However, given how well the analysis of Sec. 5 fits the data, we would expect to be able to apply Eqs. 4 and 5 with confidence to estimate how the methods would scale on other machines. For example, on the cluster of PCs mentioned in Sec. 5, on which we measured a latency of about and a bandwidth of , the expected crossover points are 40 and 140 nodes for the and grids respectively.
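Once the two cost models have been fitted, a crossover point of this kind could be located numerically along the following lines. This is a sketch under stated assumptions: `crossover_point` is a hypothetical helper, and the two linear functions in the usage example are toy placeholders chosen only to exercise the search. They are not Eqs. 4 and 5, which would be substituted with the machine's measured latency and bandwidth.

```python
def crossover_point(t_old, t_new, max_nodes=4096):
    """Return the smallest node count P >= 2 at which the new method is
    predicted to be faster, i.e. t_new(P) < t_old(P), or None if no
    such P exists up to max_nodes."""
    for p in range(2, max_nodes + 1):
        if t_new(p) < t_old(p):
            return p
    return None


if __name__ == "__main__":
    # Toy placeholder models (NOT Eqs. 4 and 5): arbitrary linear costs
    # in arbitrary units, used only to demonstrate the search.
    def toy_old(p):
        return 1.0 * p

    def toy_new(p):
        return 50.0 + 0.5 * p

    print(crossover_point(toy_old, toy_new))
```

A linear scan is sufficient here because the node counts of interest are small; it also avoids assuming that the two curves cross only once.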