
Releases: ggml-org/llama.cpp

b3353

09 Jul 12:11
9925ca4
cmake : allow external ggml (#8370)

b3347

08 Jul 19:32
2ec846d
sycl : fix powf call in device code (#8368)

b3345

08 Jul 10:52
sync : ggml

ggml-ci

b3342

08 Jul 09:05
470939d
common : preallocate sampling token data vector (#8363)

Calling `emplace_back` repeatedly is slower than preallocating the vector to the vocab size and inserting the data directly. Some rudimentary profiling with `chrono` shows this change improving the block of code from ~500us/op to ~40us/op.

Overall, this slightly improves sampling performance, with a more substantial impact on the `examples/lookahead` implementation -- I see a ~10% performance boost in lookahead inference.
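
The change amounts to sizing the candidate vector up front and writing each entry in place instead of growing the vector one `emplace_back` at a time. Below is a minimal, self-contained sketch of the pattern; the struct, vocab size, logits, and timing harness are stand-ins, not the actual `llama_token_data` code in llama.cpp:

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Stand-in for llama_token_data (id, logit, p) -- illustration only,
// not the definition from llama.h.
struct token_data {
    int   id;
    float logit;
    float p;
};

int main() {
    const int n_vocab = 128000;                 // hypothetical vocab size
    std::vector<float> logits(n_vocab, 0.0f);   // dummy logits

    std::vector<token_data> cur;

    // Variant A: grow the vector one element at a time.
    auto t0 = std::chrono::steady_clock::now();
    cur.clear();
    for (int id = 0; id < n_vocab; ++id) {
        cur.emplace_back(token_data{id, logits[id], 0.0f});
    }
    auto t1 = std::chrono::steady_clock::now();

    // Variant B: size the vector once, then assign in place.
    cur.clear();
    cur.resize(n_vocab);
    for (int id = 0; id < n_vocab; ++id) {
        cur[id] = token_data{id, logits[id], 0.0f};
    }
    auto t2 = std::chrono::steady_clock::now();

    using us = std::chrono::microseconds;
    printf("emplace_back : %lld us\n", (long long) std::chrono::duration_cast<us>(t1 - t0).count());
    printf("resize+assign: %lld us\n", (long long) std::chrono::duration_cast<us>(t2 - t1).count());
    return 0;
}
```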

b3341

08 Jul 09:01
6f0dbf6
infill : assert prefix/suffix tokens + remove old space logic (#8351)

b3340

08 Jul 08:02
ffd0079
common : avoid unnecessary logits fetch (#8358)

b3334

07 Jul 15:03
f7cab35
gguf-hash: model wide and per tensor hashing using xxhash and sha1 (#…

b3333

07 Jul 14:39
905942a
llama : support glm3 and glm4 (#8031)

* add chatglm3-6b model support (Hugging Face model: https://hf-mirror.com/THUDM/chatglm3-6b)

Signed-off-by: XingXing Qiao <[email protected]>

* remove .rotary_pos_emb.inv_freq and unused code for the chatglm3 model

Signed-off-by: XingXing Qiao <[email protected]>

* fix lint error

Signed-off-by: XingXing Qiao <[email protected]>

* optimize convert-hf-to-gguf.py for chatglm model

Signed-off-by: XingXing Qiao <[email protected]>

* support glm-4-9b-chat

Signed-off-by: XingXing Qiao <[email protected]>

* fix eos tokens for glm4

* remove unused log

* add preprocess to chatglm3 and chatglm4

* add eos_id_list to llama.cpp

* fix code style

* fix code style

* fix conflicts

* fix conflicts

* Revert "add eos_id_list to llama.cpp"

This reverts commit 3a4d5790bfdc205c5b658204239f168fc21cc1a8.

* set <|endoftext|> as eos and <|user|> as eot

* fix chat template bug

* add comment to glm prefix and suffix

* fix conflicts and add rope_ratio & ChatGLMForConditionalGeneration

* fix chat template bug

* fix codestyle

* fix conflicts

* modified the general name of glm model

* fix conflicts

* remove prefix and suffix

* use normal glm4 chat template & use LLM_FFN_SWIGLU in phi3 (see the SwiGLU sketch after this change list)

* fix: resolve Flake8 errors in `convert-hf-to-gguf.py`

- Fix E302 by adding two blank lines before top-level function definitions
- Replace print statements to fix NP100
- Fix E303 by ensuring only one blank line between lines of code

* fix rope ratio to solve incorrect answers

* fix by comments

---------

Signed-off-by: XingXing Qiao <[email protected]>
Co-authored-by: XingXing Qiao <[email protected]>
Co-authored-by: Umpire2018 <[email protected]>
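
The SwiGLU feed-forward referenced by `LLM_FFN_SWIGLU` computes `down( silu(x·W_gate) ⊙ (x·W_up) )`. Below is a minimal plain-C++ sketch of that formula on dense vectors; it is an illustration only, not the ggml graph code (which may fuse the gate and up projections into a single matrix), and the dimensions and weights are made up:

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// silu(x) = x * sigmoid(x)
static float silu(float x) {
    return x / (1.0f + std::exp(-x));
}

// y = W * x, where W is (n_out x n_in) stored row-major.
static std::vector<float> matvec(const std::vector<float> & W, int n_out, int n_in,
                                 const std::vector<float> & x) {
    std::vector<float> y(n_out, 0.0f);
    for (int i = 0; i < n_out; ++i) {
        for (int j = 0; j < n_in; ++j) {
            y[i] += W[i*n_in + j] * x[j];
        }
    }
    return y;
}

// SwiGLU FFN: down( silu(gate(x)) * up(x) )
static std::vector<float> ffn_swiglu(const std::vector<float> & x,
                                     const std::vector<float> & W_gate,
                                     const std::vector<float> & W_up,
                                     const std::vector<float> & W_down,
                                     int n_embd, int n_ff) {
    std::vector<float> gate = matvec(W_gate, n_ff, n_embd, x);
    std::vector<float> up   = matvec(W_up,   n_ff, n_embd, x);
    for (int i = 0; i < n_ff; ++i) {
        gate[i] = silu(gate[i]) * up[i];   // element-wise gating
    }
    return matvec(W_down, n_embd, n_ff, gate);
}

int main() {
    const int n_embd = 4, n_ff = 8;        // toy sizes
    std::vector<float> x(n_embd, 0.5f);
    std::vector<float> W_gate(n_ff*n_embd, 0.01f);
    std::vector<float> W_up  (n_ff*n_embd, 0.02f);
    std::vector<float> W_down(n_embd*n_ff, 0.03f);

    for (float v : ffn_swiglu(x, W_gate, W_up, W_down, n_embd, n_ff)) {
        printf("%f\n", v);
    }
    return 0;
}
```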

b3332

07 Jul 14:38
b504008
llama : fix n_rot default (#8348)

ggml-ci

b3328

07 Jul 09:55
cb4d86c
server: Retrieve prompt template in /props (#8337)

* server: Retrieve prompt template in /props

This PR adds the following:
- Expose the model's Jinja2 prompt template in the /props endpoint.
- Change the log level from Error to Warning for the template-mismatch warning.

The front-end stands a better chance of executing the Jinja template correctly than the server, which is currently just guessing it.

Ideally this would live inside a JSON block that exposes the same key/value pairs as those listed during startup by the `llm_load_print_meta` function.

* Make string buffer dynamic

* Add doc and better string handling

* Using chat_template naming convention

* Use intermediate vector for string assignment (see the sketch below)
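
The "dynamic string buffer" and "intermediate vector" items above roughly correspond to querying the template's length first and then reading it into a `std::vector<char>` before building the `std::string`. A hedged sketch of that approach, assuming `llama_model_meta_val_str` returns the value length and a negative result for a missing key; the helper name below is made up and this is not the server's exact code:

```cpp
#include <string>
#include <vector>

#include "llama.h"

// Fetch the model's chat template from the GGUF metadata, sized dynamically.
static std::string get_chat_template(const struct llama_model * model) {
    const char * key = "tokenizer.chat_template";

    // First call: ask for the length only (assumed: negative means the key is missing).
    const int32_t len = llama_model_meta_val_str(model, key, nullptr, 0);
    if (len < 0) {
        return ""; // model has no embedded chat template
    }

    // Second call: read into an intermediate, dynamically sized buffer.
    std::vector<char> buf(len + 1, 0);
    llama_model_meta_val_str(model, key, buf.data(), buf.size());

    return std::string(buf.data()); // buffer is null-terminated
}
```

The resulting string can then be returned under the `chat_template` key of the `/props` JSON response, matching the naming convention adopted above.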