Hi, thanks for bringing up the discussion. I'll try to outline some of the practices that we have followed so far to accommodate different backends into `ggml`.

Background

The project started out with hand-written CPU kernels in `ggml.c`. Note that these are naive dot-product implementations, without any advanced GEMM optimizations (a minimal sketch of what this means follows below). Later, support for OpenBLAS and other BLAS CPU libraries was added directly in `ggml.c`. At some point, the following idea for adding GPU support to `ggml` was proposed: offload the heavy matrix multiplications to the GPU while keeping the rest of the graph computation on the CPU, which eventually grew into the dedicated GPU backends (CUDA, Metal, etc.).
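To make the "naive dot product" remark concrete, here is a minimal sketch of such a CPU matrix multiplication. It is illustrative only; `matmul_naive` is not a real `ggml` function:

```c
#include <stddef.h>

// Naive matrix multiplication: C = A * B, with A (m x k), B (k x n) and
// C (m x n), all row-major. Every output element is a plain dot product:
// no tiling, no cache blocking, no advanced GEMM tricks. This mirrors the
// idea of the early CPU path, not the actual ggml source.
static void matmul_naive(const float *a, const float *b, float *c,
                         size_t m, size_t n, size_t k) {
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t p = 0; p < k; p++) {
                sum += a[i*k + p] * b[p*n + j];
            }
            c[i*n + j] = sum;
        }
    }
}
```

A BLAS GEMM replaces exactly this inner loop with a heavily optimized, cache-blocked version.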
The existing backend implementations, even though mostly decoupled from the core library, still require a small amount of glue code inside `ggml` itself. This is a short background and overview of how we support various backends in `ggml`.
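As a rough picture of what "decoupled backend" means here, think of a backend as a small table of callbacks that the core dispatches through. All names below (`backend_iface`, `backend_register`, etc.) are hypothetical and do not correspond to `ggml`'s actual interface:

```c
#include <stdbool.h>
#include <stddef.h>

// Opaque stand-in for the core tensor type.
struct tensor;

// A backend is a small table of callbacks that the core dispatches through.
// Everything device-specific (CUDA streams, Metal queues, ...) stays behind
// these pointers, inside the backend's own source file(s).
struct backend_iface {
    const char *name;
    bool  (*supports_op)(const struct tensor *op); // can this backend run op?
    void  (*compute)    (struct tensor *op);       // execute a single op
    void *(*alloc)      (size_t size);             // device memory management
    void  (*free_buf)   (void *ptr);
};

// The core only ever sees the interface, never the implementation.
void backend_register(const struct backend_iface *iface);
```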
Back to the specific questions:

In case XeTLA is something similar to BLAS, then it could be integrated straight into `ggml.c` the same way the BLAS CPU libraries are. Note that if we decide to integrate it as a custom backend, I would like to have all the implementation contained in one or two files, similar to the existing backends. It can of course include 3rd-party libs (as we do with CUDA, Metal, etc.), but the integration surface with the rest of the code should remain small.
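For comparison, the "integrate straight into the core like BLAS" route boils down to a compile-time switch around the mat-mul. `USE_OPENBLAS` and `matmul` are placeholder names here; `cblas_sgemm` is the standard CBLAS call that OpenBLAS and similar libraries provide:

```c
#include <stddef.h>

#ifdef USE_OPENBLAS
#include <cblas.h>
#endif

// Naive fallback from the earlier sketch.
void matmul_naive(const float *a, const float *b, float *c,
                  size_t m, size_t n, size_t k);

// Core mat-mul entry point: route to a GEMM library when one was enabled
// at build time, otherwise fall back to the naive loop.
void matmul(const float *a, const float *b, float *c, int m, int n, int k) {
#ifdef USE_OPENBLAS
    // C = 1.0 * A * B + 0.0 * C, all row-major
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0f, a, k, b, n, 0.0f, c, n);
#else
    matmul_naive(a, b, c, (size_t) m, (size_t) n, (size_t) k);
#endif
}
```

An XeTLA integration of this kind would presumably hook in at the same spot, with its own build flag and GEMM call.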
I understand the basic principle of JIT-ing, but I don't have experience with implementing this technique. If it is something that we can write in pure C to help optimize the existing SIMD routines in `ggml.c`, it would definitely be worth exploring.
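For anyone unfamiliar with the technique, here is a toy demonstration of the JIT principle in pure C (x86-64 Linux/BSD): machine code is written into an executable page at runtime and invoked through a function pointer. A real kernel JIT would emit shape-specialized GEMM inner loops rather than this trivial add function:

```c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    // Machine code for: int add(int a, int b) { return a + b; }
    // x86-64 System V ABI: a in edi, b in esi, result in eax.
    //   8d 04 37    lea eax, [rdi + rsi]
    //   c3          ret
    unsigned char code[] = { 0x8d, 0x04, 0x37, 0xc3 };

    // Allocate one writable + executable page. Note: some hardened systems
    // forbid W+X mappings, so a real JIT writes first, then mprotect()s the
    // page to read/execute.
    void *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return 1;

    memcpy(mem, code, sizeof(code));

    int (*add)(int, int) = (int (*)(int, int)) mem;
    printf("jit add(2, 3) = %d\n", add(2, 3)); // prints 5

    munmap(mem, 4096);
    return 0;
}
```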
Adding some comments might help. The whole case would then read like an end-to-end example.
Hi, this is Mingfei from the Intel PyTorch team. We want to help optimize the performance of llama.cpp on Intel hardware, and I need some guidelines on how to contribute to this project (in particular, how an XeTLA-based backend and JIT-generated kernels could fit in).
Any opinion is welcome :) Feel free to comment so that we can find the most suitable way to contribute.