sync : ggml #2868

ggerganov · 2025-03-08T08:27:41Z

No description provided.

It is used by Whisper talk-llama example. Co-authored-by: Petter Reinholdtsen <[email protected]>

* Support fp16 unary operations in the CUDA backend
* cpu: increase fp16 support for unary operators in the CPU backend
* cuda: increase fp16 support for unary operators in the CUDA backend
* Add test cases for fp16 unary operators
* metal: update supports_op for unary operators that don't support fp16, to prevent test-backend-ops from failing
* metal: fix PR comments for unary op support after fp16 unary tests

…s_op (ggml/1129)

* cuda: restrict SILU_BACK to fp32, since fp16 exceeds the desired test threshold
* vulkan: specify fp32-only support for certain ops (that are now tested for fp16 as well)
* f32 sigmoid in vulkan supports op
* Revert "f32 sigmoid in vulkan supports op"

This reverts commit c6f04b3c19bf4504c2776149c6d8cd84e0b48acb.

* ggml-cpu: Fix build with sve

Signed-off-by: Molly Sophia <[email protected]>

* ggml-cpu: Remove unused variable in sve q3_k vec dot

Signed-off-by: Molly Sophia <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>

Looks like a copy/paste bug from qx_needs_dequant.

* Fix dependencies between ggml and backends

ggml backends link only to ggml-base and ggml links to all backends.

* Fix installation of ggml backends

Set up GNUInstallDirs before setting the installation directory of ggml backends

* vulkan: improve im2col performance

* faster dequant for old quants
* don't use unpack for iq4_nl
* vec2 unpack for q8

Remove unused header file that causes compilation failure on ARM platform with GCC 13.

…12064)

* Added SVE Support for Q2_K Quantized Models
* Use 4-space indentation in the switch cases
* Removed comment lines
* Remove the loop; retain the curly braces for better understanding of the code
* Remove the comment line added for q3_k_q8_k kernel

---------

Co-authored-by: vithulep <[email protected]>

…ns (llama/11595)

* vulkan: implement specialized MMV kernels for IQ2 quantizations
* vulkan: add MMV kernels for IQ3 quants
* vulkan: Increase MMV batch size and unroll IQ LUT setup
* vulkan: fix init_iq_shmem for WG sizes larger than tables
* vulkan: common batch size for all I-quants

* Upgrade init_tensor API to return a ggml_status

To prepare for an 'abort-free' ggml (ggml does not abort on OOMs but returns an OOM status), as agreed with Diego in the ggml repo, upgrade the init_tensor() and view_init() APIs to return a ggml_status.

* misc fixes

---------

Co-authored-by: slaren <[email protected]>
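The 'abort-free' idea described above can be illustrated with a minimal sketch in plain C. Everything here is hypothetical (`sketch_status`, `sketch_tensor`, `sketch_init_tensor`, and the injected allocator); it does not reproduce the ggml API, it only shows an init function that reports an allocation failure to the caller instead of aborting.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch; not copied from ggml. */
enum sketch_status {
    SKETCH_STATUS_ALLOC_FAILED = -1,
    SKETCH_STATUS_SUCCESS      = 0
};

/* Hypothetical tensor that owns a per-tensor data block. */
struct sketch_tensor {
    size_t nbytes;
    void  *data;
};

/* Allocator signature, injected so that failure can be simulated. */
typedef void *(*sketch_alloc_fn)(size_t);

/* Allocator that always fails, used to exercise the error path. */
void *sketch_failing_alloc(size_t nbytes) {
    (void) nbytes;
    return NULL;
}

/* Initialise a tensor; on OOM, return a status instead of calling abort(). */
enum sketch_status sketch_init_tensor(struct sketch_tensor *t, size_t nbytes,
                                      sketch_alloc_fn alloc) {
    t->nbytes = 0;
    t->data   = NULL;
    if (nbytes == 0) {
        return SKETCH_STATUS_SUCCESS; /* nothing to allocate */
    }
    void *p = alloc(nbytes);
    if (p == NULL) {
        return SKETCH_STATUS_ALLOC_FAILED; /* the caller decides what to do */
    }
    t->nbytes = nbytes;
    t->data   = p;
    return SKETCH_STATUS_SUCCESS;
}
```

A caller would check the returned status and, for example, fall back to a smaller buffer or propagate the error upward, rather than having the whole process terminate.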

cuda 12.8 added the option to specify stronger compression for binaries, so we now default to "size".

…12144)

… (llama/12133)

* SYCL: refactor and move cpy kernels to a separate file
* Add few missing cpy kernels
* refactor and add debug logs

The libggml API has changed, but this has not been updated.

…12032)

Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check
Adds rocWMMA support to fattn-wmma-f16

* Add include files for std::min/max and std::toupper/tolower
* win32: move _USE_MATH_DEFINES before includes to ensure M_PI is defined
* Use GGML_RESTRICT instead of "restrict" keyword everywhere, and use "__restrict" in MSVC plain C mode
* win32: only use __restrict in MSVC if C11/C17 support is not enabled

---------

Co-authored-by: Marcus Groeber <[email protected]>
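The restrict-qualifier selection described above might look roughly like the following sketch. `SKETCH_RESTRICT` and `sketch_axpy` are hypothetical names chosen for illustration; the actual `GGML_RESTRICT` definition in ggml may differ in its conditions.

```c
#include <stddef.h>

/* Hypothetical macro modelled on the description above:
 * - C++ has no `restrict` keyword, so expand to nothing;
 * - MSVC without C11/C17 support only knows `__restrict`;
 * - otherwise use the standard C99 `restrict` keyword. */
#if defined(__cplusplus)
#  define SKETCH_RESTRICT
#elif defined(_MSC_VER) && (!defined(__STDC_VERSION__) || __STDC_VERSION__ < 201112L)
#  define SKETCH_RESTRICT __restrict
#else
#  define SKETCH_RESTRICT restrict
#endif

/* y[i] += a * x[i]; the qualifier promises that x and y do not overlap. */
void sketch_axpy(size_t n, float a,
                 const float *SKETCH_RESTRICT x, float *SKETCH_RESTRICT y) {
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Routing every use through one macro keeps the compiler-specific spelling in a single place, which is the point of replacing the bare keyword throughout the code base.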

ggml-ci

…1118)

* ggml_compute_forward_concat() for arbitrary tensor type
* Check that tensors' types match
* ggml-cpu.c: check type of source tensors
* ggml-cpu.c: move tensor type check to ggml_compute_forward_concat()
* ggml.c: check concatenated tensor type
* Remove tensor type check from ggml_compute_forward_concat() in ggml-cpu.c

..., as it was moved to ggml.c.

-- it might happen if ggml is loaded from 2 separate libraries since each one of them will expose the class. This is more of a guard since we want to use only Metal as embedded library and don't care about the other case.

…201)

…ma/12154)

* ggml-cpu: Faster IQ1 mul_mat_vec on AVX2 using BMI2 instructions
* cmake: Add GGML_BMI2 build option
* ggml: enable BMI2 on relevant CPU variants
* ggml-cpu: include BMI2 in backend score
* ggml-cpu: register BMI2 in ggml_backend_cpu_get_features
* ggml-cpu: add __BMI2__ define when using MSVC

Co-authored-by: ubuntu <[email protected]>

…/12174) ... which left garbage bits in the upper half of the kernel args. This caused segmentation faults when running PoCL.

Fix the following error:

```
ggml-alloc.c:99: not enough space in the buffer
ggml_tallocr_alloc: not enough space in the buffer to allocate blk.17.ffn_down.weight (needed 27525120, available 27521024)
```

which occurs when `ggml_backend_opencl_context::alignment` is larger than `cl_ptr_base` (hard-coded to `0x1000`).

Also fix `ggml_backend_opencl_context::alignment`, which was set to `CL_DEVICE_MEM_BASE_ADDR_ALIGN` and treated as bytes even though the value is reported in bits.
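The bits-versus-bytes distinction above can be sketched in plain C. The helper names (`sketch_align_bits_to_bytes`, `sketch_align_up`) are hypothetical and the example does not call OpenCL; it only assumes, as the commit message states, that the device reports its base address alignment in bits.

```c
#include <stddef.h>

/* Convert an alignment reported in bits to bytes.
 * Assumes the reported value is a multiple of 8. */
size_t sketch_align_bits_to_bytes(unsigned int align_bits) {
    return (size_t) align_bits / 8;
}

/* Round `offset` up to the next multiple of `align_bytes`.
 * Assumes `align_bytes` is a non-zero power of two. */
size_t sketch_align_up(size_t offset, size_t align_bytes) {
    return (offset + align_bytes - 1) & ~(align_bytes - 1);
}
```

With an illustrative reported value of 1024 bits, the correct alignment is 128 bytes. Treating the same number as bytes would pad every allocation to a multiple of 1024, which wastes space and can make a tightly sized buffer appear too small.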

…replacing it. (llama/12209) This avoids conflict with the internal memory management behavior of the CUDA/HIP runtimes.

…12092) (llama/12094)

Signed-off-by: Ray Lee <[email protected]>
Co-authored-by: Ray Lee <[email protected]>

… (llama/12217)

* opencl: support noncontiguous `norm`
* opencl: support noncontiguous `rms_norm`
* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`

This commit updates the custom command that builds the default.metallib file to use the correct path to ../ggml-common.h by using the variable METALLIB_COMMON.

The motivation for this change is that, when building with GGML_METAL_EMBED_LIBRARY=OFF, the following error is currently generated:

```console
[ 11%] Linking CXX shared library ../../bin/libggml.dylib
[ 11%] Built target ggml
make[2]: *** No rule to make target `ggml/src/ggml-metal/ggml-common.h', needed by `bin/default.metallib'. Stop.
make[1]: *** [ggml/src/ggml-metal/CMakeFiles/ggml-metal-lib.dir/all] Error 2
```

With the above change the build could progress, but there was a follow-on error about not being able to find the ggml-common.h file in ggml-metal.metal, where it was included as a relative path:

```console
[ 11%] Compiling Metal kernels
/Users/danbev/work/llama.cpp/build/bin/ggml-metal.metal:6:10: error: '../ggml-common.h' file not found, did you mean 'ggml-common.h'?
         ^~~~~~~~~~~~~~~~~~
         "ggml-common.h"
1 error generated.
```

Removing the relative path then allowed the build to complete successfully.

…2194)

* metal : refactor im2col parameters into a struct
* metal: Change im2col offset types from int32_t to uint64_t to support larger memory offsets
* metal : refactor sum_rows parameters into a struct
* metal : refactor soft_max parameters into a struct
* metal : refactor diag_mask_inf parameters into a struct
* metal : refactor ssm_conv parameters into a struct
* metal : refactor ssm_scan parameters into a struct
* metal : refactor get_rows parameters into a struct
* metal : refactor group_norm parameters into a struct
* metal : refactor conv_transpose_1d parameters into a struct
* metal : refactor upscale parameters into a struct
* metal : refactor pad parameters into a struct
* metal : refactor pad_reflect_1d parameters into a struct
* metal : refactor arange parameters into a struct
* metal : refactor timestep_embedding parameters into a struct
* metal : refactor argsort parameters into a struct
* metal : refactor leaky_relu parameters into a struct
* metal : refactor pool_2d parameters into a struct
* metal : fix trailing whitespace

---------

Co-authored-by: alexju <[email protected]>
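As a rough illustration of the refactoring pattern listed above, the sketch below groups kernel parameters into a single struct and uses 64-bit offsets. The struct `sketch_im2col_args`, its field names, and `sketch_src_offset` are all hypothetical and written in plain C; the real Metal argument structs are not shown in this PR text.

```c
#include <stdint.h>

/* Hypothetical argument block: one struct instead of many loose parameters. */
struct sketch_im2col_args {
    uint64_t ofs0; /* element offset between batches; 64-bit to avoid overflow */
    uint64_t ofs1; /* element offset between channels */
    int32_t  IW;   /* input width */
    int32_t  IH;   /* input height */
};

/* Compute a source offset entirely in 64-bit arithmetic, so that large
 * tensors do not wrap around the way an int32_t offset would. */
uint64_t sketch_src_offset(const struct sketch_im2col_args *args,
                           uint32_t batch, uint32_t channel) {
    return (uint64_t) batch * args->ofs0 + (uint64_t) channel * args->ofs1;
}
```

Passing one struct also means the host side and the kernel side share a single definition of the parameter layout, so adding a field touches one place rather than every call site.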

petterreinholdtsen and others added 30 commits March 8, 2025 10:25

Told cmake to install ggml-cpp.h as a public header file. (ggml/1126)

e1df78f

It is used by Whisper talk-llama example. Co-authored-by: Petter Reinholdtsen <[email protected]>

cuda: unary ops as float + de-duplicate (ggml/1130)

0c02f62

ggml-cpu: Fix build with sve (llama/12059)

0894863

* ggml-cpu: Fix build with sve

Signed-off-by: Molly Sophia <[email protected]>

* ggml-cpu: Remove unused variable in sve q3_k vec dot

Signed-off-by: Molly Sophia <[email protected]>

---------

Signed-off-by: Molly Sophia <[email protected]>

vulkan: fix assertion when qy_needs_dequant (llama/12068)

efe4017

Looks like a copy/paste bug from qx_needs_dequant.

vulkan: improve im2col (llama/11826)

c270cea

* vulkan: improve im2col performance

vulkan: matmul dequantization improvements (llama/12015)

cd143b8

* faster dequant for old quants
* don't use unpack for iq4_nl
* vec2 unpack for q8

CANN: Fix build error with GCC 13 (llama/11990)

a252113

Remove unused header file that causes compilation failure on ARM platform with GCC 13.

CUDA: fix logic for V100 + GGML_CUDA_FORCE_MMQ (llama/12098)

00b059e

CUDA: compress mode option and default to size (llama/12029)

85e0461

cuda 12.8 added the option to specify stronger compression for binaries, so we now default to "size".

ggml-backend : keep paths in native string type when possible (llama/…

aa27e01

…12144)

SYCL: Move CPY kernels to a separate file and add few missing kernels…

5167979

… (llama/12133)

* SYCL: refactor and move cpy kernels to a separate file
* Add few missing cpy kernels
* refactor and add debug logs

ggml : fix kleidiai build (llama/12159)

7914670

The libggml API has changed, but this has not been updated.

HIP: implement FlashAttention via rocWMMA for CDNA and RDNA3+ (llama/…

9a517d6

…12032)

Adds GGML_HIP_ROCWMMA_FATTN and rocwmma header check
Adds rocWMMA support to fattn-wmma-f16

vulkan : sync (llama/0)

66e42fa

ggml-ci

ggml : fix GGMLMetalClass ODR (llama/12200)

6979ba8

-- it might happen if ggml is loaded from 2 separate libraries since each one of them will expose the class. This is more of a guard since we want to use only Metal as embedded library and don't care about the other case.

SYCL: Disable f16 Unary OPs as not supported by the kernels (llama/12…

c19d8eb

…201)

opencl : fix profile-related errors (llama/12095)

dccfba4

Co-authored-by: ubuntu <[email protected]>

opencl : fix ulong kernel args were set from int variables (llama…

03b31a0

…/12174) ... which left garbage bits in the upper half of the kernel args. This caused segmentation faults when running PoCL.
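The failure mode described above can be reproduced safely with a small sketch. `sketch_copy_arg` is a hypothetical stand-in for an API that copies a caller-specified number of bytes of argument data; it is not the OpenCL call itself. The buggy variant places a 32-bit value next to unrelated data and copies 8 bytes, while the fixed variant widens the value first.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an argument setter: copies `size` bytes
 * from `src` into a 64-bit slot. */
static uint64_t sketch_copy_arg(const void *src, size_t size) {
    uint64_t slot = 0;
    memcpy(&slot, src, size);
    return slot;
}

/* Buggy pattern: a 32-bit value is read as if it were 64 bits wide, so the
 * neighbouring 4 bytes (modelled here by `neighbour`) leak into the slot. */
uint64_t sketch_arg_buggy(int32_t value, int32_t neighbour) {
    int32_t mem[2] = { value, neighbour };
    return sketch_copy_arg(mem, sizeof(uint64_t));
}

/* Fixed pattern: widen to 64 bits first, then pass a pointer to the
 * widened variable together with its own size. */
uint64_t sketch_arg_fixed(int32_t value) {
    uint64_t widened = (uint64_t) value;
    return sketch_copy_arg(&widened, sizeof(widened));
}
```

In the buggy variant the result depends on whatever happens to sit next to the variable in memory, which matches the description of garbage in the upper half of the argument.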

HIP/CUDA: set the parameter value in maintain_cuda_graph instead of …

8a3a93e

…replacing it. (llama/12209) This avoids conflict with the internal memory management behavior of the CUDA/HIP runtimes.

CUDA: fix FA logic for PTX 7.0 and CC >= 7.5 (llama/12222)

425cefd

hbuxiaofei and others added 7 commits March 8, 2025 10:25

cmake : fix undefined reference errors for std::filesystem in ggml (#…

f755166

…12092) (llama/12094)

Signed-off-by: Ray Lee <[email protected]>
Co-authored-by: Ray Lee <[email protected]>

opencl: Noncontiguous norm, rms_norm, disable fp16 for some ops…

dbf9384

… (llama/12217)

* opencl: support noncontiguous `norm`
* opencl: support noncontiguous `rms_norm`
* opencl: disable fp16 for `ADD`, `MUL`, `SCALE`, `RELU`, `GELU`, `SILU`, `CLAMP`

ggml-cpu: faster AVX2 variant for IQ1_M (llama/12216)

16754cd

sync : ggml

0b4956d

cmake : fix ggml-config (ggml/0)

ed5c494

ggerganov mentioned this pull request Mar 8, 2025

whisper-stream exits on startup using Vulkan on Windows with AMD 7900 XT #2867

Open

objc : fix build, tmp remove GPU support, use C++17

209e1f3

ggerganov force-pushed the sync-ggml-25-03-08 branch from a102b03 to 209e1f3 on March 8, 2025 09:01

ggerganov merged commit 7d14005 into master Mar 8, 2025
44 checks passed

ggerganov deleted the sync-ggml-25-03-08 branch March 8, 2025 13:13
