Releases · ggml-org/llama.cpp
b4877
b4876
CUDA/HIP: Fix fattn-vec-* when device warp size is not 32 (#12315)
When fattn-wmma was ported over to warp64, various bits that also touch fattn-vec were converted to a selectable warp size. However, the fattn-vec kernels don't work with 64-wide warps for now, so we need to avoid launching them with parameters for warp64.
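A minimal host-side sketch of the gating this fix implies: only take the fattn-vec path when the device warp size is 32, otherwise fall back. The names (`dispatch_fattn`, `launch_fattn_vec`, `launch_fattn_fallback`) are illustrative stand-ins, not the actual llama.cpp symbols.

```cpp
// Illustrative sketch, not the real llama.cpp CUDA/HIP dispatch code.
#include <cstdio>

constexpr int FATTN_VEC_WARP_SIZE = 32; // fattn-vec kernels assume warp32

void launch_fattn_vec()      { std::puts("fattn-vec (warp32) path"); }
void launch_fattn_fallback() { std::puts("generic fallback path"); }

void dispatch_fattn(int device_warp_size) {
    if (device_warp_size == FATTN_VEC_WARP_SIZE) {
        launch_fattn_vec();      // launch parameters match a 32-wide warp
    } else {
        launch_fattn_fallback(); // e.g. wavefront64 on some AMD GPUs (HIP)
    }
}

int main() {
    dispatch_fattn(32); // NVIDIA-style warp
    dispatch_fattn(64); // warp64 devices must not take the vec path
}
```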
b4875
llama : Add Gemma 3 support (+ experimental vision capability) (#12343)
* llama : Add Gemma 3 text-only support
* fix python coding style
* fix compile on ubuntu
* python: fix style
* fix ubuntu compile
* fix build on ubuntu (again)
* fix ubuntu build, finally
* clip : Experimental support for Gemma 3 vision (#12344)
* clip : Experimental support for Gemma 3 vision
* fix build
* PRId64
b4874
vulkan: fix bug in coopmat1 mul_mat_id (#12316)
* tests: run mul_mat_id with a larger N
* vulkan: fix bug in coopmat1 mul_mat_id
b4873
CUDA/HIP: refactor mmqv to unify the calculation of nwarps and rows …
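A sketch of the shape of such a unification: compute the warp count and rows per block in one shared helper instead of duplicating the formula per kernel variant. The struct name, function name, and the exact formula here are illustrative assumptions, not the llama.cpp code.

```cpp
// Illustrative sketch of centralizing a launch-config calculation.
#include <cstdio>
#include <initializer_list>

struct mmqv_config {
    int nwarps;         // warps per thread block
    int rows_per_block; // output rows computed per block
};

// One shared rule used by every call site (hypothetical formula):
// narrow batches get more warps per row, wider batches get more rows.
mmqv_config calc_mmqv_config(int warp_size, int ncols) {
    mmqv_config cfg;
    cfg.nwarps         = ncols <= 4 ? 4 : 2;
    cfg.rows_per_block = (cfg.nwarps * warp_size) / 32;
    return cfg;
}

int main() {
    for (int ncols : {1, 4, 8}) {
        mmqv_config cfg = calc_mmqv_config(32, ncols);
        std::printf("ncols=%d -> nwarps=%d rows_per_block=%d\n",
                    ncols, cfg.nwarps, cfg.rows_per_block);
    }
}
```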
b4872
ggml-backend : fix backend search path (#12330)
* Fix backend search path
* replace .native() with '/'
* reverted .native()
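For context on the `.native()` vs `'/'` point: `std::filesystem::path::operator/` joins components with the platform's preferred separator, which is the portable way to build a search path. The directory and file names below are illustrative only.

```cpp
// Portable path composition with std::filesystem (illustrative names).
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

int main() {
    fs::path search_dir = fs::current_path();
    // operator/ inserts the right separator on both Windows and POSIX,
    // so no hand-built "\\" or "/" strings are needed.
    fs::path backend = search_dir / "backends" / "ggml-cuda.so";
    std::cout << backend.string() << '\n';
}
```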
b4871
metal : Cache the Metal library at the device context level (#12265)
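A minimal sketch of the caching pattern this describes: compile once, store the result on the device context, and have later lookups reuse it. The struct and function names are illustrative stand-ins, not the Metal backend API.

```cpp
// Illustrative compile-once cache keyed to a device context.
#include <iostream>
#include <memory>
#include <mutex>
#include <string>

struct compiled_library {
    std::string source; // stands in for a compiled MTLLibrary
};

struct device_context {
    std::once_flag                    lib_once;
    std::shared_ptr<compiled_library> lib; // cached after first compile
};

std::shared_ptr<compiled_library> get_library(device_context & ctx) {
    std::call_once(ctx.lib_once, [&] {
        std::cout << "compiling library (happens once)\n";
        ctx.lib = std::make_shared<compiled_library>(compiled_library{"kernels"});
    });
    return ctx.lib;
}

int main() {
    device_context ctx;
    get_library(ctx); // compiles
    get_library(ctx); // cache hit, no recompilation
}
```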
b4870
clip : bring back GPU support (#12322)
* clip : bring back GPU support
* use n_gpu_layers param
* fix double free
* ggml_backend_init_by_type
* clean up
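A hedged sketch of the selection logic the `n_gpu_layers` param implies: request a GPU backend when any offload is asked for, otherwise stay on CPU. `backend_kind` and `init_backend` are illustrative stand-ins, not the ggml backend API surface.

```cpp
// Illustrative backend selection driven by n_gpu_layers.
#include <iostream>

enum class backend_kind { cpu, gpu };

backend_kind init_backend(int n_gpu_layers) {
    // Any requested offload selects the GPU path; 0 keeps everything on CPU.
    return n_gpu_layers > 0 ? backend_kind::gpu : backend_kind::cpu;
}

int main() {
    std::cout << (init_backend(99) == backend_kind::gpu ? "gpu\n" : "cpu\n");
    std::cout << (init_backend(0)  == backend_kind::gpu ? "gpu\n" : "cpu\n");
}
```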
b4869
mat vec double buffer (#12188)
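A generic illustration of the double-buffering technique the title refers to: while one buffer is consumed, the next chunk is staged into the other, then the roles swap. This is a CPU-side sketch of the idea only, not the actual mat-vec kernel code.

```cpp
// Double buffering: stage the next chunk while consuming the current one.
#include <array>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> data(8);
    for (int i = 0; i < 8; ++i) data[i] = i;

    constexpr int CHUNK = 2;
    std::array<std::array<int, CHUNK>, 2> buf{}; // two staging buffers
    int cur = 0;

    // Prologue: stage the first chunk.
    for (int j = 0; j < CHUNK; ++j) buf[cur][j] = data[j];

    long sum = 0;
    for (size_t base = 0; base < data.size(); base += CHUNK) {
        int nxt = cur ^ 1;
        // Stage the next chunk into the idle buffer; on a GPU this load
        // would overlap with the compute below.
        if (base + CHUNK < data.size()) {
            for (int j = 0; j < CHUNK; ++j) buf[nxt][j] = data[base + CHUNK + j];
        }
        // Consume the current buffer.
        for (int j = 0; j < CHUNK; ++j) sum += buf[cur][j];
        cur = nxt; // swap roles
    }
    std::printf("sum = %ld\n", sum); // 0+1+...+7 = 28
}
```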
b4868
musa: support new arch mp_31 and update doc (#12296)
Signed-off-by: Xiaodong Ye <[email protected]>