Many supercomputers rely on being able to spot independence between different data structures, or between different parts of the same structure. This may be because they wish to split work between multiple processors (e.g. summing a very long vector), or because they wish to move data from memory to the CPU long before it is used, and need a guarantee that that data will not have been overwritten unexpectedly by intervening instructions. There follows an example of why this is impossible to do well in C.

(This example is deliberately kept free of explicit pointer arithmetic, for the amusement of the "C without pointers is just like FORTRAN" school. It is not. This example is intended to show why automatic (pseudo-)vectorisation of C is in general unachievable without breaking the C standard, and hence that non-standard, non-portable directives will be required on any vector (or long-pipeline, or multi-pipeline) machine to extract reasonable performance from C in numeric codes.)

Consider a function to add two vectors storing the result in a third.

void vadd(double *a, double *b, double *c, int n)
{
  int i;
  for (i = 0; i < n; i++) c[i] = a[i] + b[i];
}

This is something which should surely pseudo-vectorise easily. Each iteration of the loop is independent of the preceding iterations, and one can simply preload large amounts of the vectors.

Not quite. Suppose the function is called as `vadd(x,x,x,100)`. This
should double the first 100 elements of the array `x`. A silly way of
doubling something, but the code the compiler produces for this function must
cope with this unusual usage. In this case, the vectorised model described
above would still cope.

Now consider doing something really odd with the `vadd` function - let
us use it to calculate the Fibonacci series:

/* Generate Fibonacci series by using "vadd" function */
#include <stdio.h>
#define NMAX 50

int main()
{
  double fib[NMAX];
  int i;
  fib[0] = fib[1] = 1.0;
  vadd(&fib[0], &fib[1], &fib[2], NMAX-2);
  for (i = 0; i < NMAX; i++) printf("%2d %12.0f\n", i+1, fib[i]);
  return 0;
}

Now any attempt to vectorise the `vadd` function would lead to erroneous
results at this point. The way that the arguments to `vadd` have been
passed masks the horror that the loop is now:

for (i = 0; i < 48; i++) x[i+2] = x[i+1] + x[i];

In other words, each iteration of the loop requires data calculated in the two immediately preceding iterations. Any attempt to preload data is likely to lead to the original (uninitialised) data being used, rather than the data written out by the preceding iterations.

C compilers must generate code which will produce the correct
result as long as the source obeys the relevant standards. Thus, on seeing the
function `vadd` defined as above, the compiler must produce code which
will run correctly if it is called to double a vector in place, or to produce a
Fibonacci sequence.

Those unfamiliar with FORTRAN might believe that the statements above apply equally to FORTRAN, as the code appears trivial to translate into FORTRAN.

      subroutine vadd(a,b,c,n)
      integer i,n
      double precision a(*),b(*),c(*)
      do i=1,n
        c(i)=a(i)+b(i)
      enddo
      end

and

      program fbncc
      integer nmax,i
      parameter (nmax=50)
      double precision fib(nmax)
      fib(1)=1.0d0
      fib(2)=1.0d0
      call vadd(fib,fib(2),fib(3),nmax-2)
      do i=1,nmax
        write(*,100) i,fib(i)
      enddo
 100  format(X,I2,X,F12.0)
      end

This code will run happily on many workstations, but produces rubbish with full
optimisation on some machines. A bug in the compilers? Not necessarily --
rather bad FORTRAN. The FORTRAN standard does not permit
any of a set of overlapping arguments to be modified, so calling `vadd`
in this way breaks the FORTRAN standard. (DEC Alphas with full optimisation
produce rubbish for this code.)

Thus the stricter standard of FORTRAN allows compilers to optimise much more aggressively than a C compiler can, because many more assumptions about the code and its data dependencies can be made.

MJ Rutter, August 2000.