Julia + OpenBLAS vs. MATLAB + MKL - Matrix Operations Benchmark #1090
Comments
Thanks for the pointer. Unfortunately it is not quite clear what you are testing here if you are pitting two "teams" against each other - how much of the difference in efficiency comes from each component?
@martin-frbg , no pitting at all. I just thought to share the data in case it helps the developers - seeing the numbers might show where to invest effort. You raise an interesting point about multi-threading. From what I can tell, the eigen, Cholesky, and SVD decompositions seem to be a weak point of OpenBLAS compared to MKL. Are you aware of that? Thank you.
I am certainly aware of #1077 (SVD - I suspect we will need a reduced testcase for that one to investigate further), and I think we also have issues involving *syrk (used in Cholesky) on at least some platforms. There is certainly room for improvement...
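As an illustration of the kind of reduced testcase discussed here, a minimal sketch (in Python/NumPy rather than Julia, purely for illustration; the matrix size and repeat count are arbitrary assumptions) that isolates SVD and Cholesky timings from allocation and RNG costs might look like:

```python
# Hypothetical reduced testcase: time SVD and Cholesky on one fixed,
# pre-allocated matrix, so only the LAPACK/BLAS work is measured.
import time
import numpy as np

def bench(fn, arg, repeats=5):
    """Return the best-of-N wall time for fn(arg)."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - t0)
    return best

n = 300
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
# a @ a.T is positive semi-definite; adding n*I makes it positive
# definite, as required by the Cholesky factorization.
spd = a @ a.T + n * np.eye(n)

svd_t = bench(np.linalg.svd, a)
chol_t = bench(np.linalg.cholesky, spd)
print(f"svd: {svd_t:.4f}s  cholesky: {chol_t:.4f}s")
```

Running the same script against NumPy builds linked to OpenBLAS and to MKL would then compare only the decomposition kernels.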
Can you produce graphs of julia-mkl vs. julia-openblas? They look so proportionate that the differences could boil down to micro-timing peculiarities in each.
@brada4 , I wish I could.
Windows Octave with OpenBLAS vs. MATLAB?
Here you have a comparison of Julia + OpenBLAS vs. Julia + MKL on the same tests. Those were made by: JuliaLang/julia#18374 (comment) Thank you.
There is #843, post-0.2.19, adding optimizing Fortran flags to LAPACK, which should align the graphs better.
Another place to look at (Julia + MKL vs. Julia + OpenBLAS): https://discourse.julialang.org/t/benchmark-matlab-julia-for-matrix-operations/2000/92 https://github.com/barche/julia-blas-benchmarks/blob/master/BenchmarkResults.ipynb Thank you.
'matrix generation' does not involve any BLAS; it just measures your libc RNG speed and malloc behaviour at various times. It was probably the fastest measurement because it ran first on the same system.
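A minimal sketch of the point above, in Python/NumPy for illustration (the array size is an arbitrary choice): timing "matrix generation" exercises only the RNG and the allocator, while the BLAS library is hit only once an actual operation such as a matrix multiply runs:

```python
# Separate the two costs: generation (RNG + malloc, no BLAS) vs.
# multiplication (a dgemm call dispatched to the linked BLAS).
import time
import numpy as np

n = 1000
t0 = time.perf_counter()
m = np.random.rand(n, n)   # RNG + allocation only, no BLAS involved
gen = time.perf_counter() - t0

t0 = time.perf_counter()
p = m @ m                  # this is the BLAS (dgemm) call
mul = time.perf_counter() - t0
print(f"generation: {gen:.4f}s  multiply: {mul:.4f}s")
```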
I found another test: https://www.numbercrunch.de/blog/2016/03/boosting-numpy-with-mkl/
It lacks an anchor to the OpenBLAS version. Obviously it cannot have had #843 fixed at that time.
Early March 2016 would mean 0.2.15 or at best 0.2.16rc1, but I guess the point is the availability of the benchmark code and the MKL result. (Not that much changed performance-wise for that one function on Haswell, I think. It might be interesting to see how much restoring the compiler optimization level for the LAPACK functions, as per #843, actually buys us here, but I doubt it is enough to close the gap. I do not have a Haswell ultrabook as used for the test, however.) WRT the "matrix generation" test mentioned above, there is no harm in having those numbers as well - at the very least it shows that OpenBLAS is not doing something fundamentally wrong in the way it stores and handles matrices.
From the NumPy eigh documentation:
Well, that one is rather obvious. But no matter how silly that LAPACK handbrake thing was in retrospect, I am not so optimistic as to assume that just bringing LAPACK back to its normal speed would let "our" optimized BLAS calls show enough gain to actually match the MKL data.
I updated this repository, adding benchmarks for Julia+OpenBLAS and Julia+Intel MKL. Julia+Intel MKL is faster than OpenBLAS64 most of the time.
Not sure if the 100 Hz clock of MATLAB counts as bad performance; could you time more iterations to get past that?
Not much difference in most of the functions when running 50 times instead of 4. For example, for matrix inversion, timeit internally runs 11*100 iterations and returns the median, and my 4 outer iterations wrap that, so I average the medians of every 1100 iterations. If I raised the outer iteration count to 50, it would become something like 55,000 iterations, while I explicitly set Julia's sample number to 700 in the code. In a real-world situation, nobody runs a function 200 times, so repeatability and stability of performance also matter.
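The median-of-many-samples scheme described above can be sketched as follows (a Python/NumPy analogue, not the author's MATLAB/Julia code; the sample count and matrix size are arbitrary assumptions). The median is preferred over the mean because outliers from GC pauses or clock granularity then do not dominate:

```python
# Median-of-N timing: collect per-call wall times and report the
# median, which is robust to occasional slow outlier runs.
import time
import statistics
import numpy as np

def median_time(fn, arg, samples=11):
    times = []
    for _ in range(samples):
        t0 = time.perf_counter()
        fn(arg)
        times.append(time.perf_counter() - t0)
    return statistics.median(times)

a = np.random.rand(300, 300)
t = median_time(np.linalg.inv, a)
print(f"inv median: {t:.5f}s")
```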
I think, since MATLAB's function handles aren't as efficient as they should be, you shouldn't use timeit. This is the reason I didn't use it, as it adds overhead. Update: looking at the code of timeit, it invokes the measured function through a function handle. For more information:
For more accurate timing in MATLAB - High Accuracy Timer. |
Yes,
I'd still prefer direct use of it. I wouldn't use
I don't understand the reason for this inside
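The function-handle overhead being debated in this exchange can be illustrated with a rough Python analogue (illustrative only; absolute numbers are machine-dependent): calling through an extra indirection layer adds a small fixed cost per call, which matters only when the timed body is itself very cheap:

```python
# Compare timing an expression directly vs. through a callable
# indirection (the analogue of a MATLAB function handle).
import timeit

direct = timeit.timeit("x * x", setup="x = 3.0", number=1_000_000)
handle = timeit.timeit("f(x)",
                       setup="f = lambda v: v * v; x = 3.0",
                       number=1_000_000)
print(f"direct: {direct:.3f}s  via handle: {handle:.3f}s")
```

For a heavyweight body such as a large matrix inversion, this per-call overhead is negligible relative to the work being timed.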
Hi,
I did some tests with MATLAB and Julia:
Matlab & Julia Matrix Operations Benchmark
I think they (at least to some extent) reflect OpenBLAS vs. Intel MKL.
Hence I think this might be information worth having for the developers.
See also here:
Benchmark MATLAB & Julia for Matrix Operations
Thank You.