Releases · ggml-org/llama.cpp

10 Jul 16:05

f4444d9

b3367

[SYCL] Use multi_ptr to clean up deprecated warnings (#8256)

Assets 20

10 Jul 15:09

github-actions

b3366

6b2a849

b3366

ggml : move sgemm sources to llamafile subfolder (#8394)

ggml-ci

Assets 20

10 Jul 15:09

github-actions

b3365

0f1a39f

b3365

ggml : add AArch64 optimized GEMV and GEMM Q4 kernels (#5780)

* Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization

* Arm AArch64: add optimized GEMV and GEMM asm kernels for q4_0_q8_0 quantization and refactor code to address llama.cpp pr#5780 suggestions

* Arm AArch64: add copyright claim only to ggml-aarch64.cpp and ggml-aarch64.h files

* Arm AArch64: minor code refactoring for rebase

* Arm AArch64: minor code refactoring for resolving a build issue with cmake

* Arm AArch64: minor code refactoring to split the Q4_0_AARCH64 type into three separate types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code change for resolving a build issue with server-windows

* retrigger checks

* Arm AArch64: minor code changes for rebase

* Arm AArch64: minor changes to skip the pr#7433 vec_dot code for arm cpus with SVE VL not equal to 256 bits

* Arm AArch64: remove stale LLAMA_QKK_64 from CMakeLists.txt and delete build.zig

* Arm AArch64: add reference scalar gemm and gemv, and avoid dynamic memory allocations during quantization for Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: add multithreaded quantization support for the new types: Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8

* Arm AArch64: minor code refactoring

* Arm AArch64: simplify logic for calling gemm and gemv functions in ggml_compute_forward_mul_mat

* Arm AArch64: minimize changes in ggml_compute_forward_mul_mat

* Arm AArch64: minor code refactoring, and add reference scalar code to quantize routines for new quant types

* Arm AArch64: minor code refactoring

* rebase on the latest master commit 3fd62a6 and adapt to the new directory structure

* Arm AArch64: remove a redundant comment

* Arm AArch64: add pragma in ggml-aarch64.c to turn -Woverlength-strings warning off

* Arm AArch64: use __aarch64__ check to guard 64-bit neon kernels

* Arm AArch64: update docs/build.md README to include compile-time flags for building the Q4_0_4_4 quant type
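To make the "reference scalar gemv" mentioned above concrete, here is a minimal sketch of a scalar dot product and matrix-vector product over Q4_0 weights and Q8_0 activations. It is an illustration under stated assumptions, not the code from this PR: the struct and function names (`BlockQ4`, `BlockQ8`, `dot_q4_q8`, `gemv_q4_q8`) are hypothetical, the block scale is stored as `float` rather than the half-precision value ggml uses, and the row-interleaved layout of the new Q4_0_4_4, Q4_0_4_8, and Q4_0_8_8 types is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int QK = 32; // values per quantization block

// Hypothetical simplified blocks: the scale is a float here for readability.
struct BlockQ4 { float d; std::uint8_t qs[QK / 2]; }; // two 4-bit values per byte
struct BlockQ8 { float d; std::int8_t  qs[QK];     }; // one 8-bit value per byte

// Scalar dot product of one quantized row (x) with one quantized vector (y).
float dot_q4_q8(const BlockQ4 * x, const BlockQ8 * y, std::size_t nblocks) {
    float sum = 0.0f;
    for (std::size_t b = 0; b < nblocks; ++b) {
        std::int32_t isum = 0;
        for (int j = 0; j < QK / 2; ++j) {
            // Low nibble holds element j, high nibble holds element j + 16;
            // both are stored with an offset of 8.
            const int lo = (x[b].qs[j] & 0x0F) - 8;
            const int hi = (x[b].qs[j] >> 4) - 8;
            isum += lo * y[b].qs[j] + hi * y[b].qs[j + QK / 2];
        }
        // Apply both block scales once per block.
        sum += x[b].d * y[b].d * static_cast<float>(isum);
    }
    return sum;
}

// GEMV: w holds nrows rows of nblocks blocks each, stored row after row.
std::vector<float> gemv_q4_q8(const std::vector<BlockQ4> & w, std::size_t nrows,
                              std::size_t nblocks, const std::vector<BlockQ8> & v) {
    assert(w.size() == nrows * nblocks && v.size() == nblocks);
    std::vector<float> out(nrows);
    for (std::size_t r = 0; r < nrows; ++r) {
        out[r] = dot_q4_q8(w.data() + r * nblocks, v.data(), nblocks);
    }
    return out;
}
```

The integer accumulation per block followed by a single floating-point scale is the part an optimized kernel would vectorize; the scalar version mainly serves as a correctness baseline.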

Assets 20

10 Jul 14:54

github-actions

b3363

cc61948

b3363

llama : C++20 compatibility for u8 strings (#8408)
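For context: in C++20 a `u8"..."` literal has type `const char8_t[]`, so code that assigns it to `const char *` or `std::string` stops compiling. One common workaround is a small macro that casts the literal back to `const char *` when `char8_t` is available. The sketch below illustrates that idea; the macro name `U8_LITERAL` and the helper `degree_sign` are hypothetical and are not claimed to match what this change actually added.

```cpp
#include <string>

// Hypothetical helper macro: yields a const char * for a UTF-8 literal
// both before C++20 (u8"" is const char[]) and from C++20 on (const char8_t[]).
#if defined(__cpp_char8_t)
    #define U8_LITERAL(x) (reinterpret_cast<const char *>(u8##x))
#else
    #define U8_LITERAL(x) (u8##x)
#endif

// Example use: build a std::string from a UTF-8 encoded literal.
inline std::string degree_sign() {
    return U8_LITERAL("\u00B0"); // U+00B0 DEGREE SIGN, two bytes in UTF-8
}
```

Reading `char8_t` data through a `const char *` is permitted by the aliasing rules, which is why the cast is safe in this direction.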

Assets 20

10 Jul 14:54

github-actions

b3362

7a80710

b3362

msvc : silence codecvt c++17 deprecation warnings (#8395)

Assets 20

10 Jul 14:43

github-actions

b3361

a8be1e6

b3361

llama : add assert about missing llama_encode() call (#8400)

Co-authored-by: Stanisław Szymczyk

Assets 20

09 Jul 23:20

github-actions

b3358

a59f8fd

b3358

Server: Enable setting default sampling parameters via command-line (…

Assets 20

09 Jul 17:09

github-actions

b3356

e500d61

b3356

Deprecation warning to assist with migration to new binary names (#8283)

* Add a simple program that prints a deprecation warning, to help people notice the binary name change from #7809 and migrate to the new filenames.

* Build legacy replacement binaries only if they already exist. Check for their existence every time so that they are not ignored.
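The idea of a stand-in binary that only prints a migration hint can be sketched as follows. This is an illustration, not the program added by this change: the function names are hypothetical, the `main` to `llama-cli` and `server` to `llama-server` mappings reflect the rename from #7809, and the generic `llama-` prefix for every other name is an assumption.

```cpp
#include <string>

// Strip directory components and a trailing ".exe" from argv[0].
std::string base_name(const std::string & argv0) {
    std::string name = argv0;
    const auto pos = name.find_last_of("/\\");
    if (pos != std::string::npos) {
        name = name.substr(pos + 1);
    }
    const std::string ext = ".exe";
    if (name.size() > ext.size() &&
        name.compare(name.size() - ext.size(), ext.size(), ext) == 0) {
        name.erase(name.size() - ext.size());
    }
    return name;
}

// Map a legacy binary name to its replacement.
std::string replacement_name(const std::string & old_name) {
    if (old_name == "main")   return "llama-cli";
    if (old_name == "server") return "llama-server";
    return "llama-" + old_name; // assumption: generic prefix for other tools
}

// Build the warning text a legacy stand-in binary would print.
std::string deprecation_message(const std::string & argv0) {
    const std::string old_name = base_name(argv0);
    return "WARNING: the binary '" + old_name + "' is deprecated. "
           "Please use '" + replacement_name(old_name) + "' instead.";
}
```

A stand-in executable built from this would write `deprecation_message(argv[0])` to stderr and exit with a non-zero status, so scripts that still call the old name fail loudly rather than silently doing nothing.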

Assets 19

09 Jul 17:03

github-actions

b3355

a03e8dd

b3355

make/cmake: LLAMA_NO_CCACHE -> GGML_NO_CCACHE (#8392)

Assets 20

09 Jul 15:26

github-actions

b3354

5b0b8d8

b3354

sycl : Reenabled mmvq path for the SYCL Nvidia Backend (#8372)

* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend

* Reduced verbosity of comment

Assets 20
