Introduction


There have been no updates to Compaq's CPML library for Alpha Linux since 2002. I downloaded the latest CPML distribution from the Alpha Tools website and examined the timestamps on the contents.

$ rpm --verbose --package --query --list cpml_ev6-5.2.0-1.alpha.rpm
drwxr-xr-x    2 root    root                0 Feb 15  2002 /usr/doc/cpml-5.2.0
-rwxr-xr-x    1 root    root             9142 Feb 15  2002 /usr/doc/cpml-5.2.0/README
-rwxr-xr-x    1 root    root              519 Feb 15  2002 /usr/doc/cpml-5.2.0/Release_Notes-5.2.0
lrwxrwxrwx    1 root    root               31 Feb 15  2002 /usr/include/cpml.h -> ../lib/compaq/cpml-5.2.0/cpml.h
drwxr-xr-x    2 root    root                0 Feb 15  2002 /usr/lib/compaq/cpml-5.2.0
-rwxr-xr-x    1 root    root             5000 Apr 17  2000 /usr/lib/compaq/cpml-5.2.0/cpml.h
-rw-r--r--    1 root    root          1250518 Feb 11  2002 /usr/lib/compaq/cpml-5.2.0/libcpml_ev6.a
-rw-r--r--    1 root    root                0 Feb 15  2002 /usr/lib/compaq/cpml-5.2.0/libcpml_ev6.so
lrwxrwxrwx    1 root    root               33 Feb 15  2002 /usr/lib/libcpml.a -> ./compaq/cpml-5.2.0/libcpml_ev6.a
lrwxrwxrwx    1 root    root               34 Feb 15  2002 /usr/lib/libcpml.so -> ./compaq/cpml-5.2.0/libcpml_ev6.so

As you can see, the last modifications are from 2002.

The cpml distribution includes benchmark data comparing CPML to libm for some double-precision maths functions. It is reasonable to assume that the glibc developers have made some progress with maths performance in the years since the last cpml update, so I wanted to find out whether libm has caught up with CPML.

Benchmark


These are the benchmark statistics quoted by Compaq:

subroutine  libcpml  libm
acos             95  1185
asin             99  1193
atan             83   242
atan2           108   408
cos              73   251
exp              67   376
log              62   299
log10            62   367
pow             118  1047
sin              74   367
sqrt             68  1017
tan             101   505

Note: these figures are in cycles, lower is better.

Note: I'm not including the F_ functions, as they use relaxed checking to improve performance, similar to gcc's -ffast-math.

If these figures are accurate, the cpml implementations of these functions offer a tenfold or greater performance increase over the libm equivalents in some cases. I own a CPU of the same revision that Compaq states was used to obtain these statistics, so I decided to replicate the benchmarks with an up-to-date libm.

To do this, I wrote a quick C program that exercises the different implementations with the same input values Compaq used, which are described in the cpml documentation. You can download the program I used here.

Use gcc -O2 -o cpml cpml.c -ldl to compile.

NOTE: This program should work on x86 as well, where you can compare the maths functions that come with icc.

To get results that were as accurate as possible, I compiled the program and then switched to single-user mode to minimise interference from other processes. I ran the program three times and averaged the results to compile this table.

I didn't include the F_ "fast" functions, although my program does test them.

subroutine  libcpml  libm  comment
acos             99   111  Tenfold improvement in libm since testing by Compaq, now only 12 cycles slower than cpml.
asin            106   119  Another great improvement in libm.
atan             89  52.3  libm implementation now faster than cpml.
atan2           167   146  libm atan2 now outperforms the cpml equivalent.
cos              79   199  libm improved by about 20% since Compaq's testing (251 to 199 cycles), however cpml is still the better performer.
exp              72    49  libm overtakes cpml, a dramatic improvement of nearly eightfold.
log            80.3   162  libm needs almost 50% fewer cycles than in Compaq's tests, dropping from 299 to 162.
log10            66   245  Modest improvement from libm (367 to 245 cycles), cpml still the leader.
pow              65   212  Considerable improvement from libm, cpml still ahead. cpml pow() performed better for me than it did for Compaq!
sin              92   206  libm improved from 367 to 206 cycles since Compaq's tests, cpml still well ahead.
sqrt           75.6    26  libm overtakes cpml for sqrt() performance!
tan             101   198  libm tan() about 2.5 times faster since Compaq's test.

A massive performance improvement in glibc makes the difference between libm and cpml negligible for some functions, and libm even overtakes cpml for others: atan(), atan2(), exp() and sqrt().
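The comparisons behind these remarks are simple ratios, and a short sketch makes them easy to recheck. The cycle counts below are copied from the two tables above; the struct and function names (row, speedup, prefer_cpml, count_cpml_wins) are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* One benchmark row, all values in cycles. */
struct row {
    const char *name;
    double cpml_now;  /* libcpml, my measurement        */
    double libm_old;  /* libm, as quoted by Compaq      */
    double libm_now;  /* libm, my measurement           */
};

static const struct row rows[] = {
    { "acos",   99.0, 1185.0, 111.0 },
    { "asin",  106.0, 1193.0, 119.0 },
    { "atan",   89.0,  242.0,  52.3 },
    { "atan2", 167.0,  408.0, 146.0 },
    { "cos",    79.0,  251.0, 199.0 },
    { "exp",    72.0,  376.0,  49.0 },
    { "log",    80.3,  299.0, 162.0 },
    { "log10",  66.0,  367.0, 245.0 },
    { "pow",    65.0, 1047.0, 212.0 },
    { "sin",    92.0,  367.0, 206.0 },
    { "sqrt",   75.6, 1017.0,  26.0 },
    { "tan",   101.0,  505.0, 198.0 },
};

#define NROWS (sizeof rows / sizeof rows[0])

/* How many times fewer cycles `after` needs compared with `before`. */
static double speedup(double before, double after)
{
    return before / after;
}

/* Nonzero if cpml is still the faster choice for this function. */
static int prefer_cpml(const struct row *r)
{
    return r->cpml_now < r->libm_now;
}

/* Number of benchmarked functions for which cpml still wins. */
static size_t count_cpml_wins(void)
{
    size_t i, wins = 0;

    for (i = 0; i < NROWS; i++)
        if (prefer_cpml(&rows[i]))
            wins++;
    return wins;
}
```

For example, speedup(rows[0].libm_old, rows[0].libm_now) gives the improvement in libm's acos() since Compaq's test, and prefer_cpml() picks out the functions worth routing to cpml.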

Conclusion


Where high-performance maths is required and every nanosecond matters, the cpml functions should be used selectively, so that the libm implementations that now outperform cpml are used as well. The dramatic tenfold performance advantage initially offered by cpml no longer exists; the libm developers have improved their performance significantly.

To use the cpml functions that still offer better performance, whilst allowing the libm functions to be used otherwise, a preload library can be used.

  #define _GNU_SOURCE

  #include <stdio.h>
  #include <dlfcn.h>

  /* # gcc -O2 -ldl -shared -o /usr/local/lib/libcpmlloader.so
   * # echo /usr/local/lib/libcpmlloader.so >> /etc/ld.so.preload
   */

  /* pointers to the implementations chosen at load time */
  static double (* acosptr)(double x),
                (* asinptr)(double x),
                (* cosptr)(double x),
                (* hypotptr)(double x, double y),
                (* logptr)(double x),
                (* log10ptr)(double x),
                (* powptr)(double x, double y),
                (* sinptr)(double x),
                (* tanptr)(double x);

  /* runs when the library is loaded, before main() */
  static void __attribute__ ((constructor)) init (void)
  {
        void * library;

        if (!(library = dlopen ("libcpml.so", RTLD_LAZY))) {
                fprintf (stderr, "failed to open libcpml.so: %s\n", dlerror());
                if (!(library = dlopen ("libm.so", RTLD_LAZY))) {
                        fprintf (stderr, "attempted to fall back to libm.so, but that failed as well: %s\n",
                                        dlerror());
                        /* continue anyway, but this probably won't be good... */
                }
        }

        acosptr =       (double (*)()) dlsym (library, "acos");
        asinptr =       (double (*)()) dlsym (library, "asin");
        cosptr =        (double (*)()) dlsym (library, "cos");
        hypotptr =      (double (*)()) dlsym (library, "hypot");
        logptr =        (double (*)()) dlsym (library, "log");
        log10ptr =      (double (*)()) dlsym (library, "log10");
        powptr =        (double (*)()) dlsym (library, "pow");
        sinptr =        (double (*)()) dlsym (library, "sin");
        tanptr =        (double (*)()) dlsym (library, "tan");

        return;
  }

  /* wrappers that take precedence over libm's symbols when preloaded */
  double acos (double x) { return acosptr (x); }
  double asin (double x) { return asinptr (x); }
  double cos (double x) { return cosptr (x); }
  double hypot (double x, double y) { return hypotptr (x, y); }
  double log (double x) { return logptr (x); }
  double log10 (double x) { return log10ptr (x); }
  double pow (double x, double y) { return powptr (x, y); }
  double sin (double x) { return sinptr (x); }
  double tan (double x) { return tanptr (x); }

References


  1. http://h30097.www3.hp.com/linux/compaq_cxx/readme.htm