
OpenBLAS far slower than Accelerate on IvyBridge #533

Closed
dpo opened this issue Apr 7, 2015 · 4 comments

@dpo

dpo commented Apr 7, 2015

I have an IvyBridge i7-3720QM. I installed OpenBLAS via Homebrew on OSX 10.9 (built from the develop branch; OpenBLAS used SandyBridge as target) and I'm comparing it to Accelerate using the Tokyo Cython interface. I'm wondering why my OpenBLAS lags quite far behind Accelerate.

Here are some sample results in single precision on vectors/matrices of size 30, 100 and 1000. The vertical axis represents thousands of calls per second to BLAS 1/2/3 functions. The horizontal axis represents the various tests, and data size increases as you move to the right.

[plot: single-precision benchmark results]

On the plot, the openblas1 curve corresponds to OPENBLAS_NUM_THREADS=1, openblas2 to two threads, and openblas4 to four threads. Double precision results are similar:

[plot: double-precision benchmark results]

I especially care about saxpy and daxpy, but those are among the worst performers in single precision:

Accel.      OpenBLAS1   OpenBLAS2   OpenBLAS4
59429.01    14770.07    14858.42    14668.15  # size 30
44589.11    13490.18    13548.12    13471.40  # size 100
 6298.42     6964.60     4263.06     2804.24  # size 1000

and in double precision:

Accel.      OpenBLAS1   OpenBLAS2   OpenBLAS4
51149.59    14451.60    14321.85    14648.56  # size 30
28559.46    25995.41     4833.26     3092.59  # size 100
 5415.06     1655.32     1627.72     3679.58  # size 1000

Increasing the number of threads typically degrades performance. For larger vectors/matrices in double precision, the BLAS libraries are pretty much on par, but not so for small vectors/matrices. In single precision, the picture is bleaker.

Full results: https://gist.github.com/fb08bd53b13728cb7e7c (ignore the numpy stuff; I'm not taking it into account in the present results).
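For reference, a calls-per-second microbenchmark in the spirit of the above can be sketched in plain NumPy. This is a hypothetical stand-in, not the actual harness: the issue's numbers were measured through the Tokyo Cython interface, `axpy_calls_per_sec` is a made-up helper name, and `y += a * x` substitutes for a direct saxpy/daxpy call.

```python
# Hypothetical sketch of a calls-per-second axpy benchmark; the issue's
# numbers came from the Tokyo Cython interface, not this code.
import time
import numpy as np

def axpy_calls_per_sec(n, dtype=np.float32, repeats=10_000):
    """Thousands of y <- a*x + y calls per second on size-n vectors."""
    x = np.ones(n, dtype=dtype)
    y = np.zeros(n, dtype=dtype)
    a = dtype(2.0)
    t0 = time.perf_counter()
    for _ in range(repeats):
        y += a * x  # stand-in for an saxpy/daxpy call
    elapsed = time.perf_counter() - t0
    return repeats / elapsed / 1000.0

for n in (30, 100, 1000):
    print(f"size {n:5d}: {axpy_calls_per_sec(n):8.1f} kcalls/s")
```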

@hiccup7

hiccup7 commented Apr 7, 2015

@dpo , When you mention "Accelerate", are you referring to the Anaconda Accelerate package by Continuum Analytics? If so, you are actually comparing MKL BLAS to OpenBLAS. Since MKL BLAS is used on so many systems, there is more interest to fix OpenBLAS shortcomings relative to MKL BLAS than to just the Anaconda Accelerate package (from one company). I have some suggestions to help increase the chances of your feedback resulting in improvements to OpenBLAS:

  1. Create a separate issue for each BLAS function (so each issue can be prioritized and tracked)
  2. Include short test code in your issue with timing results (to allow reproduction)
  3. Document your test environment (which you did)
  4. Use SciPy's BLAS wrapper functions for the BLAS compiled with your Python distribution (to clarify which BLAS function is really tested).
  5. For BLAS function argument documentation, see the Reference Manual here:
    https://software.intel.com/en-us/intel-mkl-support/documentation
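Suggestion 4 might look like the following sketch, assuming SciPy is installed. `scipy.linalg.blas.get_blas_funcs` dispatches on the input dtypes, so float32 arrays select saxpy and float64 arrays select daxpy, and the call goes straight to whatever BLAS SciPy was compiled against.

```python
# Sketch of suggestion 4: call the BLAS that SciPy was built against
# directly, so there is no ambiguity about which library is being timed.
import numpy as np
from scipy.linalg.blas import get_blas_funcs

x = np.arange(4, dtype=np.float32)
y = np.ones(4, dtype=np.float32)

# For float32 inputs, get_blas_funcs selects saxpy.
axpy, = get_blas_funcs(('axpy',), (x, y))
result = axpy(x, y, a=2.0)  # computes y <- 2*x + y
print(result)               # [1. 3. 5. 7.]
```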

@dpo
Author

dpo commented Apr 7, 2015

"Accelerate" means the OSX Accelerate framework, i.e., Apple's implementation of the BLAS.

@xianyi
Collaborator

xianyi commented Apr 7, 2015

@dpo , we haven't optimized saxpy/daxpy with AVX instructions yet. So far, these functions still use the old SSE kernels.

@xianyi xianyi self-assigned this Apr 7, 2015
xianyi added a commit that referenced this issue Apr 7, 2015
Refs #533. added optimized saxpy- and daxpy-kernel for haswell and sandybridge
@martin-frbg
Collaborator

Performance on OSX must have remained bad even after addition of the optimized kernels, due to the .align issue discussed much later in #1470. Closing here.
