Julia + OpenBLAS vs. MATLAB + MKL - Matrix Operations Benchmark #1090
Comments
Thanks for the pointer. Unfortunately it is not quite clear what you are testing here if you are pitting two "teams" against each other - how much of the difference in efficiency comes from each component?
@martin-frbg , no pitting at all. I just thought to share the data in case it helps the developers - seeing the numbers might show where to invest effort. You raise an interesting point about multi-threading. From what I can tell, the eigen, Cholesky, and SVD decompositions seem to be a weak point of OpenBLAS compared to MKL. Are you aware of that? Thank you.
I am certainly aware of #1077 (SVD - I suspect we will need a reduced testcase for that one to investigate further), and I think we also have issues involving *syrk (used in Cholesky) on at least some platforms. There is certainly room for improvement...
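As an illustration of the kind of reduced testcase discussed here, a minimal sketch (in Python/NumPy rather than Julia, purely for illustration; the matrix size and repeat count are arbitrary assumptions) that isolates SVD and Cholesky timings from allocation and RNG costs might look like:

```python
# Hypothetical reduced testcase: time SVD and Cholesky on one fixed,
# pre-allocated matrix, so only the LAPACK/BLAS work is measured.
import time
import numpy as np

def bench(fn, arg, repeats=5):
    """Return the best-of-N wall time for fn(arg)."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - t0)
    return best

n = 300
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
# a @ a.T is positive semi-definite; adding n*I makes it positive
# definite, as required by the Cholesky factorization.
spd = a @ a.T + n * np.eye(n)

svd_t = bench(np.linalg.svd, a)
chol_t = bench(np.linalg.cholesky, spd)
print(f"svd: {svd_t:.4f}s  cholesky: {chol_t:.4f}s")
```

Running the same script against NumPy builds linked to OpenBLAS and to MKL would then compare only the decomposition kernels.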
Can you produce graphs of julia-mkl vs. julia-openblas? They look so proportionate that the differences could boil down to micro-timing peculiarities in each.
@brada4 , I wish I could.
Windows Octave with OpenBLAS vs. MATLAB?
Here you have a comparison of Julia + OpenBLAS vs. Julia + MKL on the same tests. Those were made by: JuliaLang/julia#18374 (comment) Thank you.
There is #843, post-0.2.19, adding optimizing Fortran flags to LAPACK, which should align the graphs better.
Another place to look at (Julia + MKL vs. Julia + OpenBLAS): https://discourse.julialang.org/t/benchmark-matlab-julia-for-matrix-operations/2000/92 https://github.com/barche/julia-blas-benchmarks/blob/master/BenchmarkResults.ipynb Thank you.
'matrix generation' does not involve any BLAS; it just measures your libc RNG speed and malloc behaviour at various times. It was probably the fastest measurement because it ran first on the same system.
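A minimal sketch of the point above, in Python/NumPy for illustration (the array size is an arbitrary choice): timing "matrix generation" exercises only the RNG and the allocator, while the BLAS library is hit only once an actual operation such as a matrix multiply runs:

```python
# Separate the two costs: generation (RNG + malloc, no BLAS) vs.
# multiplication (a dgemm call dispatched to the linked BLAS).
import time
import numpy as np

n = 1000
t0 = time.perf_counter()
m = np.random.rand(n, n)   # RNG + allocation only, no BLAS involved
gen = time.perf_counter() - t0

t0 = time.perf_counter()
p = m @ m                  # this is the BLAS (dgemm) call
mul = time.perf_counter() - t0
print(f"generation: {gen:.4f}s  multiply: {mul:.4f}s")
```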
I found another test: https://www.numbercrunch.de/blog/2016/03/boosting-numpy-with-mkl/
It lacks an anchor to the OpenBLAS version. Obviously it cannot have had #843 fixed at that time.
Early March 2016 would mean 0.2.15 or at best 0.2.16rc1, but I guess the point is the availability of the benchmark code and the MKL result. (Not that much changed performance-wise for that one function on Haswell, I think. It might be interesting to see how much restoring the compiler optimization level for the LAPACK functions, as per #843, actually buys us here, but I doubt it is enough to close the gap. I do not have a Haswell ultrabook as used for the test, however.) WRT the "matrix generation" test mentioned above, there is no harm in having those numbers as well - at the very least it shows that OpenBLAS is not doing something fundamentally wrong in the way it stores and handles matrices.
From the NumPy eigh documentation:
Well, that one is rather obvious. But no matter how silly that LAPACK handbrake thing was in retrospect, I am not so optimistic as to assume that just bringing LAPACK back to its normal speed would let "our" optimized BLAS calls show enough gain to actually match the MKL data.
I updated this repository, adding benchmarks for Julia+OpenBLAS and Julia+Intel MKL. Julia+Intel MKL is faster than OpenBLAS64 most of the time.
Not sure if the 100 Hz clock of MATLAB counts as bad performance; could you time more iterations to get past that?
Not much difference in most of the functions when running 50 times instead of 4. For example, for matrix inversion, timeit internally runs 11*100 iterations and returns the median, and my 4 outer iterations wrap that, so I average the medians of every 1100 iterations. If I raised the outer iteration count to 50, it would become something like 55,000 iterations, while I explicitly set Julia's sample number to 700 in the code. In a real-world situation, nobody runs a function 200 times, so repeatability and stability of performance also matter.
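The median-of-many-samples scheme described above can be sketched as follows (a Python/NumPy analogue, not the author's MATLAB/Julia code; the sample count and matrix size are arbitrary assumptions). The median is preferred over the mean because outliers from GC pauses or clock granularity then do not dominate:

```python
# Median-of-N timing: collect per-call wall times and report the
# median, which is robust to occasional slow outlier runs.
import time
import statistics
import numpy as np

def median_time(fn, arg, samples=11):
    times = []
    for _ in range(samples):
        t0 = time.perf_counter()
        fn(arg)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)

a = np.random.rand(300, 300)
t = median_time(np.linalg.inv, a)
print(f"inv median: {t:.5f}s")
```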
I think, since MATLAB's function handles aren't as efficient as they should be, you shouldn't use timeit. This is the reason I didn't use it, as it adds overhead. Update: looking at the code of timeit, it invokes the measured function through a function handle. For more information:
For more accurate timing in MATLAB - High Accuracy Timer. |
Yes,
I'd still prefer direct use of it. I wouldn't use
I don't understand the reason for this inside
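The function-handle overhead being debated in this exchange can be illustrated with a rough Python analogue (illustrative only; absolute numbers are machine-dependent): calling through an extra indirection layer adds a small fixed cost per call, which matters only when the timed body is itself very cheap:

```python
# Compare timing an expression directly vs. through a callable
# indirection (the analogue of a MATLAB function handle).
import timeit

direct = timeit.timeit("x * x", setup="x = 3.0", number=1_000_000)
handle = timeit.timeit("f(x)",
                       setup="f = lambda v: v * v; x = 3.0",
                       number=1_000_000)
print(f"direct: {direct:.3f}s  via handle: {handle:.3f}s")
```

For a heavyweight body such as a large matrix inversion, this per-call overhead is negligible relative to the work being timed.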
Hi,
I did some tests with MATLAB and Julia:
Matlab & Julia Matrix Operations Benchmark
I think they (at least to some extent) reflect OpenBLAS vs. Intel MKL.
Hence I think this might be information worth having for the developers.
See also here:
Benchmark MATLAB & Julia for Matrix Operations
Thank You.