OpenBLAS far slower than Accelerate on IvyBridge #533
@dpo, when you mention "Accelerate", are you referring to the Anaconda Accelerate package by Continuum Analytics? If so, you are actually comparing MKL BLAS to OpenBLAS. Since MKL BLAS is used on so many systems, there is more interest in fixing OpenBLAS shortcomings relative to MKL BLAS than relative to the Anaconda Accelerate package (from a single company). I have some suggestions to help increase the chances of your feedback resulting in improvements to OpenBLAS.
"Accelerate" means the OSX Accelerate framework, i.e., Apple's implementation of the BLAS. |
@dpo, we haven't optimized saxpy/daxpy with AVX instructions yet. So far, these functions still use the old SSE kernels.
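For illustration only, here is a minimal sketch of why an AVX daxpy kernel can outrun an SSE one: a 256-bit AVX register holds four doubles, so each iteration performs four y := alpha*x + y updates instead of two. This is not the actual OpenBLAS kernel; the function name and loop structure are mine, and it assumes unit stride, n divisible by 4, and an AVX-capable CPU (Sandy/Ivy Bridge have AVX but no FMA, hence the separate multiply and add):

```c
/* Illustrative sketch only -- not the OpenBLAS kernel itself.
 * Remainder handling, alignment, and unrolling are omitted. */
#include <immintrin.h>

void daxpy_avx_sketch(long n, double alpha, const double *x, double *y)
{
    __m256d va = _mm256_set1_pd(alpha);       /* broadcast alpha to all 4 lanes */
    for (long i = 0; i < n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);  /* load 4 doubles of x */
        __m256d vy = _mm256_loadu_pd(y + i);  /* load 4 doubles of y */
        /* AVX1 has no FMA: multiply, then add */
        vy = _mm256_add_pd(_mm256_mul_pd(va, vx), vy);
        _mm256_storeu_pd(y + i, vy);          /* store result back into y */
    }
}
```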
Refs #533. Added optimized saxpy and daxpy kernels for Haswell and Sandy Bridge.
Performance on OSX must have remained poor even after the addition of the optimized kernels, due to the .align issue discussed much later in #1470. Closing here.
I have an IvyBridge i7-3720QM. I installed OpenBLAS via Homebrew on OSX 10.9 (built from the `develop` branch; OpenBLAS used SandyBridge as target) and I'm comparing it to Accelerate using the Tokyo Cython interface. I'm wondering why my OpenBLAS lags quite far behind Accelerate. Here are some sample results in single precision on vectors/matrices of size 30, 100, and 1000. The vertical axis represents thousands of calls per second to BLAS 1/2/3 functions. The horizontal axis represents the various tests, and data size increases as you move to the right.
On the plot, the `openblas1` curve corresponds to `OPENBLAS_NUM_THREADS=1`, `openblas2` to two threads, and `openblas4` to four threads. Double precision results are similar. I especially care about `saxpy` and `daxpy`, but those are among the worst performers in single precision, and also in double precision.
Increasing the number of threads typically degrades performance, presumably because for such small operations the cost of dispatching work to threads dwarfs the arithmetic itself. For larger vectors/matrices in double precision, the BLAS libraries are pretty much on par, but not so for small vectors/matrices. In single precision, the picture is bleaker.
Full results: https://gist.github.com/fb08bd53b13728cb7e7c (ignore the numpy stuff; I'm not taking it into account in the present results).
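For anyone wanting to reproduce this without the Tokyo wrapper, a minimal sketch of the same kind of measurement, written directly against the CBLAS interface, could look like the following. The vector length, call count, and build lines are my assumptions, not from the original benchmark; also, `clock()` measures CPU time, so run with `OPENBLAS_NUM_THREADS=1` for numbers comparable to the single-threaded curve:

```c
/* Minimal daxpy throughput sketch: thousands of calls per second for a
 * fixed vector length, in the spirit of the plots above. Not the
 * original Tokyo-based benchmark; absolute numbers will differ. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <cblas.h>

int main(void)
{
    const int n = 100;         /* vector length, e.g. 30, 100, or 1000 */
    const int calls = 1000000; /* repetitions to average over */
    double *x = malloc(n * sizeof *x);
    double *y = malloc(n * sizeof *y);
    for (int i = 0; i < n; i++) { x[i] = 1.0; y[i] = 2.0; }

    clock_t t0 = clock();
    for (int k = 0; k < calls; k++)
        cblas_daxpy(n, 0.5, x, 1, y, 1);   /* y := 0.5*x + y */
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("n=%d: %.1f kcalls/s\n", n, calls / secs / 1000.0);
    free(x); free(y);
    return 0;
}
```

Building the same file once against OpenBLAS (`gcc bench.c -O2 -lopenblas`) and once against Accelerate on OSX (`clang bench.c -O2 -framework Accelerate`, with the include switched to `<Accelerate/Accelerate.h>`) should give directly comparable numbers for the two libraries.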