Releases · ggml-org/llama.cpp
b4877
b4876
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315)
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to a selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we need to avoid launching them with parameters for warp64.
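A minimal host-side sketch of the gating this fix implies: only take the fattn-vec path when the device warp size is 32, otherwise fall back. The names (`dispatch_fattn`, `launch_fattn_vec`, `launch_fattn_fallback`) are illustrative stand-ins, not the actual llama.cpp symbols.

```cpp
// Illustrative sketch, not the real llama.cpp CUDA/HIP dispatch code.
#include <cstdio>

constexpr int FATTN_VEC_WARP_SIZE = 32; // fattn-vec kernels assume warp32

void launch_fattn_vec()      { std::puts("fattn-vec (warp32) path"); }
void launch_fattn_fallback() { std::puts("generic fallback path"); }

void dispatch_fattn(int device_warp_size) {
    if (device_warp_size == FATTN_VEC_WARP_SIZE) {
        launch_fattn_vec();      // launch parameters match a 32-wide warp
    } else {
        launch_fattn_fallback(); // e.g. wavefront64 on some AMD GPUs (HIP)
    }
}

int main() {
    dispatch_fattn(32); // NVIDIA-style warp
    dispatch_fattn(64); // warp64 devices must not take the vec path
}
```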
b4875
llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (#12344)
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
b4874
vulkan: fix bug in coopmat1 mul_mat_id (#12316)
* tests: run mul_mat_id with a larger N
* vulkan: fix bug in coopmat1 mul_mat_id
b4873
CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows …
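A sketch of the shape of such a unification: compute the warp count and rows per block in one shared helper instead of duplicating the formula per kernel variant. The struct name, function name, and the exact formula here are illustrative assumptions, not the llama.cpp code.

```cpp
// Illustrative sketch of centralizing a launch-config calculation.
#include <cstdio>
#include <initializer_list>

struct mmqv_config {
    int nwarps;         // warps per thread block
    int rows_per_block; // output rows computed per block
};

// One shared rule used by every call site (hypothetical formula):
// narrow batches get more warps per row, wider batches get more rows.
mmqv_config calc_mmqv_config(int warp_size, int ncols) {
    mmqv_config cfg;
    cfg.nwarps         = ncols <= 4 ? 4 : 2;
    cfg.rows_per_block = (cfg.nwarps * warp_size) / 32;
    return cfg;
}

int main() {
    for (int ncols : {1, 4, 8}) {
        mmqv_config cfg = calc_mmqv_config(32, ncols);
        std::printf("ncols=%d -> nwarps=%d rows_per_block=%d\n",
                    ncols, cfg.nwarps, cfg.rows_per_block);
    }
}
```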
b4872
ggml-backend : fix backend search path (#12330)
* Fix backend search path
* replace .native() with '/'
* reverted .native()
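For context on the `.native()` vs `'/'` point: `std::filesystem::path::operator/` joins components with the platform's preferred separator, which is the portable way to build a search path. The directory and file names below are illustrative only.

```cpp
// Portable path composition with std::filesystem (illustrative names).
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

int main() {
    fs::path search_dir = fs::current_path();
    // operator/ inserts the right separator on both Windows and POSIX,
    // so no hand-built "\\" or "/" strings are needed.
    fs::path backend = search_dir / "backends" / "ggml-cuda.so";
    std::cout << backend.string() << '\n';
}
```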
b4871
metal : Cache the Metal library at the device context level (#12265)
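A minimal sketch of the caching pattern this describes: compile once, store the result on the device context, and have later lookups reuse it. The struct and function names are illustrative stand-ins, not the Metal backend API.

```cpp
// Illustrative compile-once cache keyed to a device context.
#include <iostream>
#include <memory>
#include <mutex>
#include <string>

struct compiled_library {
    std::string source; // stands in for a compiled MTLLibrary
};

struct device_context {
    std::once_flag                    lib_once;
    std::shared_ptr<compiled_library> lib; // cached after first compile
};

std::shared_ptr<compiled_library> get_library(device_context & ctx) {
    std::call_once(ctx.lib_once, [&] {
        std::cout << "compiling library (happens once)\n";
        ctx.lib = std::make_shared<compiled_library>(compiled_library{"kernels"});
    });
    return ctx.lib;
}

int main() {
    device_context ctx;
    get_library(ctx); // compiles
    get_library(ctx); // cache hit, no recompilation
}
```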
b4870
clip : bring back GPU support (#12322)
* clip : bring back GPU support
* use n_gpu_layers param
* fix double free
* ggml_backend_init_by_type
* clean up
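A hedged sketch of the selection logic the `n_gpu_layers` param implies: request a GPU backend when any offload is asked for, otherwise stay on CPU. `backend_kind` and `init_backend` are illustrative stand-ins, not the ggml backend API surface.

```cpp
// Illustrative backend selection driven by n_gpu_layers.
#include <iostream>

enum class backend_kind { cpu, gpu };

backend_kind init_backend(int n_gpu_layers) {
    // Any requested offload selects the GPU path; 0 keeps everything on CPU.
    return n_gpu_layers > 0 ? backend_kind::gpu : backend_kind::cpu;
}

int main() {
    std::cout << (init_backend(99) == backend_kind::gpu ? "gpu\n" : "cpu\n");
    std::cout << (init_backend(0)  == backend_kind::gpu ? "gpu\n" : "cpu\n");
}
```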
b4869
mat vec double buffer (#12188)
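A generic illustration of the double-buffering technique the title refers to: while one buffer is consumed, the next chunk is staged into the other, then the roles swap. This is a CPU-side sketch of the idea only, not the actual mat-vec kernel code.

```cpp
// Double buffering: stage the next chunk while consuming the current one.
#include <array>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> data(8);
    for (int i = 0; i < 8; ++i) data[i] = i;

    constexpr int CHUNK = 2;
    std::array<std::array<int, CHUNK>, 2> buf{}; // two staging buffers
    int cur = 0;

    // Prologue: stage the first chunk.
    for (int j = 0; j < CHUNK; ++j) buf[cur][j] = data[j];

    long sum = 0;
    for (size_t base = 0; base < data.size(); base += CHUNK) {
        int nxt = cur ^ 1;
        // Stage the next chunk into the idle buffer; on a GPU this load
        // would overlap with the compute below.
        if (base + CHUNK < data.size()) {
            for (int j = 0; j < CHUNK; ++j) buf[nxt][j] = data[base + CHUNK + j];
        }
        // Consume the current buffer.
        for (int j = 0; j < CHUNK; ++j) sum += buf[cur][j];
        cur = nxt; // swap roles
    }
    std::printf("sum = %ld\n", sum); // 0+1+...+7 = 28
}
```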
b4868
musa: support new arch mp_31 and update doc (#12296)
Signed-off-by: Xiaodong Ye <[email protected]>