Releases: ggml-org/llama.cpp

b4799

02 Mar 14:36
14dec0c
main: use jinja chat template system prompt by default (#12118)

* Use jinja chat template system prompt by default

* faster conditional order

* remove nested ternary

---------

Co-authored-by: Xuan Son Nguyen <[email protected]>

b4798

01 Mar 15:06
1782cdf
main: update outdated system prompt message (followup to #12131) (#12…

b4797

01 Mar 13:45
45a8e76
common : add --system-prompt parameter, replace behavior of -p in con…

b4796

01 Mar 12:39
80c41dd
CUDA: compress mode option and default to size (#12029)

CUDA 12.8 added the option to specify stronger compression for binaries, so the build now defaults to "size".
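The compression mode above is selected at configure time. A minimal sketch of what that might look like, with the caveat that the exact CMake cache variable name (`GGML_CUDA_COMPRESSION_MODE`) is an assumption taken from the PR title, not confirmed by this release note; the underlying nvcc flag `--compress-mode` is real in CUDA 12.8+:

```shell
# nvcc >= 12.8 accepts --compress-mode={default,speed,balance,size,none};
# "size" trades decompression speed for a smaller fatbinary.
# Variable name GGML_CUDA_COMPRESSION_MODE is assumed from PR #12029's title.
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_COMPRESSION_MODE=size
cmake --build build --config Release
```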

b4793

28 Feb 14:35
70680c4
ggml : upgrade init_tensor API to return a ggml_status (#11854)

* Upgrade init_tensor API to return a ggml_status

To prepare for an 'abort-free' ggml
(ggml should not abort on OOM but return an OOM status),
as agreed with Diego in the ggml repo,
upgrade the init_tensor() and view_init() APIs
to return a ggml_status.

* misc fixes

---------

Co-authored-by: slaren <[email protected]>

b4792

28 Feb 12:25
c43a3e7
llama : add Phi-4-mini support (supersede #12099) (#12108)

* Added Phi-4-mini-instruct support

* Update regex per ngxson

* Change the vocab base to Xenova/gpt-4o

* fix conversion update script

* no need to check longrope

* minor style fix

* fix python style

---------

Co-authored-by: Nicholas Sparks <[email protected]>

b4790

28 Feb 09:32
438a839
vulkan: add specific MMV kernels for IQ2 and IQ3 quants + optimizatio…

b4789

28 Feb 09:00
9c42b17
CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (#12098)

b4788

28 Feb 08:36
05e6f5a
ggml: aarch64: implement SVE kernels for q2_k_q8_k vector dot (#12064)

* Added SVE Support for Q2_K Quantized Models

* Use 4-space indentation in the switch cases

* removed comment lines

* Remove the loop; retain the curly braces for better readability of the code

* Remove the comment line added for the q3_k_q8_k kernel

---------

Co-authored-by: vithulep <[email protected]>

b4786

28 Feb 08:17
fbeda90
vulkan: matmul dequantization improvements (#12015)

* faster dequant for old quants

* don't use unpack for iq4_nl

* vec2 unpack for q8