.TH mflops 1
.SH NAME
mflops \- reports average MFLOPS achieved by a program
.SH SYNOPSIS
.B mflops
[\-v] [\-sse] [\-avx] program [arguments]
.SH DESCRIPTION
.B mflops
uses the perf_cntr interface found in Linux kernel versions 2.6.31 and
later to count floating point operations whilst a program runs. It
cannot cope with threaded applications, and currently supports only
Intel's Core2, Atom, Nehalem and Sandy Bridge processors.
.SH OPTIONS
.LP
.TP
.B \-v
also report individual counter values.
.LP
.TP
.B \-sse
ignore non-SSE FP instructions.
.LP
.TP
.B \-avx
count a vector length of two as half a vector instruction (old
behaviour), rather than a whole vector instruction.
.SH NOTES
.LP
The total number of floating point operations is calculated from
various hardware counters, as is the run time in nanoseconds.
.LP
The vectorisation percentage reported is the number of floating point
operations which were executed in packed SSE instructions divided by
the total number of floating point operations. For Sandy Bridge, a
128-bit vector instruction is counted as being half vectorised if the
.B \-avx
option is used.
.LP
The time reported, and used in the calculation of MFLOPS, is the time
for which the process was scheduled, not the wallclock time.
.SH BUGS
.LP
Some integer instructions, notably integer divide and multiply, may
count as floating point instructions unless
.B \-sse
is given. This is because they execute in the FP hardware.
.LP
Single precision floating point operations are not counted correctly
at all.
.LP
For threaded applications, only the master thread is counted. It may
be best to set the environment variable OMP_NUM_THREADS=1.
.SH AUTHOR
MJR
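.SH EXAMPLES
.LP
The arithmetic described in NOTES can be sketched as below. This is an
illustration in Python, not the code that mflops itself runs: the
function names and the counter values are hypothetical, and the sketch
assumes that MFLOPS is simply the operation count divided by the
scheduled time.
.nf
```python
def mflops(total_flops, scheduled_ns):
    # Operations per nanosecond is GFLOPS, so scale by 1000 for MFLOPS.
    return total_flops / scheduled_ns * 1000.0


def vectorisation_percent(packed_flops, total_flops):
    # Share of operations that were executed in packed SSE instructions.
    if total_flops == 0:
        return 0.0
    return 100.0 * packed_flops / total_flops


# Made-up counter values, for illustration only.
total = 3_000_000_000      # floating point operations counted
packed = 2_400_000_000     # of which executed in packed instructions
sched_ns = 2_000_000_000   # time the process was scheduled, in ns

print("MFLOPS:", mflops(total, sched_ns))
print("vectorised %:", vectorisation_percent(packed, total))
```
.fi
.LP
Because the divisor is the scheduled time rather than the wallclock
time, a process that spends much of its time descheduled will show a
higher figure than wallclock timing would suggest.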