I have a different Meta-Llama-3-8B-Instruct.Q5_K_M.llamafile, which I can start in embedding mode by passing `--embedding`, `--nobrowser`, etc. I can also test whether embedding is working via:

```shell
curl -X POST http://localhost:8080/embedding \
  -H "Content-Type: application/json" \
  -d '{ "content": "This is a test sentence to generate embeddings for." }'
```
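The same check can be scripted. This is a minimal sketch using only the Python standard library; the URL and the `{"content": ...}` payload shape follow the curl command above, while the `"embedding"` response key is what llama.cpp-style servers return and is an assumption worth verifying against your llamafile version.

```python
import json
import urllib.request

def build_embedding_request(content: str,
                            url: str = "http://localhost:8080/embedding") -> urllib.request.Request:
    # Build the same POST request the curl command sends
    payload = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def fetch_embedding(content: str) -> list:
    # POST to a running llamafile server (started with --embedding)
    with urllib.request.urlopen(build_embedding_request(content)) as resp:
        # assumed response shape: {"embedding": [...]} as in llama.cpp's server
        return json.load(resp)["embedding"]

# Usage against a live server, e.g.:
#   vec = fetch_embedding("This is a test sentence to generate embeddings for.")
#   len(vec) should equal the model's n_embd
```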
```
$ ./Meta-Llama-3-8B-Instruct.Q5_K_M.llamafile --server --embedding --nobrowser
extracting /zip/llama.cpp/ggml.h to /Users/harikt/.llamafile/v/0.8.9/ggml.h
extracting /zip/llamafile/llamafile.h to /Users/harikt/.llamafile/v/0.8.9/llamafile.h
extracting /zip/llama.cpp/ggml-impl.h to /Users/harikt/.llamafile/v/0.8.9/ggml-impl.h
extracting /zip/llama.cpp/ggml-metal.h to /Users/harikt/.llamafile/v/0.8.9/ggml-metal.h
extracting /zip/llama.cpp/ggml-alloc.h to /Users/harikt/.llamafile/v/0.8.9/ggml-alloc.h
extracting /zip/llama.cpp/ggml-common.h to /Users/harikt/.llamafile/v/0.8.9/ggml-common.h
extracting /zip/llama.cpp/ggml-quants.h to /Users/harikt/.llamafile/v/0.8.9/ggml-quants.h
extracting /zip/llama.cpp/ggml-backend.h to /Users/harikt/.llamafile/v/0.8.9/ggml-backend.h
extracting /zip/llama.cpp/ggml-metal.metal to /Users/harikt/.llamafile/v/0.8.9/ggml-metal.metal
extracting /zip/llama.cpp/ggml-backend-impl.h to /Users/harikt/.llamafile/v/0.8.9/ggml-backend-impl.h
extracting /zip/llama.cpp/ggml-metal.m to /Users/harikt/.llamafile/v/0.8.9/ggml-metal.m
building ggml-metal.dylib with xcode...
llamafile_log_command: cc -I. -O3 -fPIC -shared -pthread -DNDEBUG -ffixed-x28 -DTARGET_OS_OSX -DGGML_MULTIPLATFORM /Users/harikt/.llamafile/v/0.8.9/ggml-metal.m -o /Users/harikt/.llamafile/v/0.8.9/ggml-metal.dylib.5snynm -framework Foundation -framework Metal -framework MetalKit
In file included from /Users/harikt/.llamafile/v/0.8.9/ggml-metal.m:3:
In file included from /Users/harikt/.llamafile/v/0.8.9/ggml-metal.h:22:
In file included from /Users/harikt/.llamafile/v/0.8.9/ggml.h:219:
In file included from /usr/local/include/stdio.h:64:
/usr/local/include/_stdio.h:93:16: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
        unsigned char *_base;
/usr/local/include/_stdio.h:93:16: note: insert '_Nullable' if the pointer may be null
/usr/local/include/_stdio.h:93:16: note: insert '_Nonnull' if the pointer should never be null
/usr/local/include/_stdio.h:138:32: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
        int (* _Nullable _read) (void *, char *, int);
/usr/local/include/_stdio.h:138:32: note: insert '_Nullable' if the pointer may be null
/usr/local/include/_stdio.h:138:32: note: insert '_Nonnull' if the pointer should never be null
/usr/local/include/_stdio.h:138:40: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
        int (* _Nullable _read) (void *, char *, int);
/usr/local/include/_stdio.h:138:40: note: insert '_Nullable' if the pointer may be null
/usr/local/include/_stdio.h:138:40: note: insert '_Nonnull' if the pointer should never be null
/usr/local/include/_stdio.h:139:35: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
        fpos_t (* _Nullable _seek) (void *, fpos_t, int);
/usr/local/include/_stdio.h:139:35: note: insert '_Nullable' if the pointer may be null
/usr/local/include/_stdio.h:139:35: note: insert '_Nonnull' if the pointer should never be null
/usr/local/include/_stdio.h:140:32: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
        int (* _Nullable _write)(void *, const char *, int);
/usr/local/include/_stdio.h:140:32: note: insert '_Nullable' if the pointer may be null
/usr/local/include/_stdio.h:140:32: note: insert '_Nonnull' if the pointer should never be null
/usr/local/include/_stdio.h:140:46: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
        int (* _Nullable _write)(void *, const char *, int);
/usr/local/include/_stdio.h:140:46: note: insert '_Nullable' if the pointer may be null
/usr/local/include/_stdio.h:140:46: note: insert '_Nonnull' if the pointer should never be null
/usr/local/include/_stdio.h:144:18: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
        struct __sFILEX *_extra; /* additions to FILE to not break ABI */
/usr/local/include/_stdio.h:144:18: note: insert '_Nullable' if the pointer may be null
/usr/local/include/_stdio.h:144:18: note: insert '_Nonnull' if the pointer should never be null
In file included from /Users/harikt/.llamafile/v/0.8.9/ggml-metal.m:3:
In file included from /Users/harikt/.llamafile/v/0.8.9/ggml-metal.h:22:
In file included from /Users/harikt/.llamafile/v/0.8.9/ggml.h:219:
/usr/local/include/stdio.h:67:13: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
extern FILE *__stdinp;
/usr/local/include/stdio.h:67:13: note: insert '_Nullable' if the pointer may be null
/usr/local/include/stdio.h:67:13: note: insert '_Nonnull' if the pointer should never be null
/usr/local/include/stdio.h:395:41: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
        int (* _Nullable)(void *, const char *, int),
/usr/local/include/stdio.h:395:41: note: insert '_Nullable' if the pointer may be null
/usr/local/include/stdio.h:395:41: note: insert '_Nonnull' if the pointer should never be null
/usr/local/include/stdio.h:395:55: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
        int (* _Nullable)(void *, const char *, int),
/usr/local/include/stdio.h:395:55: note: insert '_Nullable' if the pointer may be null
/usr/local/include/stdio.h:395:55: note: insert '_Nonnull' if the pointer should never be null
/usr/local/include/stdio.h:396:44: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
        fpos_t (* _Nullable)(void *, fpos_t, int),
/usr/local/include/stdio.h:396:44: note: insert '_Nullable' if the pointer may be null
/usr/local/include/stdio.h:396:44: note: insert '_Nonnull' if the pointer should never be null
/usr/local/include/stdio.h:397:41: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
        int (* _Nullable)(void *));
/usr/local/include/stdio.h:397:41: note: insert '_Nullable' if the pointer may be null
/usr/local/include/stdio.h:397:41: note: insert '_Nonnull' if the pointer should never be null
/usr/local/include/stdio.h:393:6: warning: pointer is missing a nullability type specifier (_Nonnull, _Nullable, or _Null_unspecified) [-Wnullability-completeness]
FILE *funopen(const void *,
/usr/local/include/stdio.h:393:6: note: insert '_Nullable' if the pointer may be null
/usr/local/include/stdio.h:393:6: note: insert '_Nonnull' if the pointer should never be null
13 warnings generated.
Apple Metal GPU support successfully loaded
{"build":1500,"commit":"a30b324","function":"server_cli","level":"INFO","line":2869,"msg":"build info","tid":"34364426944","timestamp":1740806526}
{"function":"server_cli","level":"INFO","line":2872,"msg":"system info","n_threads":4,"n_threads_batch":-1,"system_info":"AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | ","tid":"34364426944","timestamp":1740806526,"total_threads":8}
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from Meta-Llama-3-8B-Instruct.Q5_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0: general.architecture                    str = llama
llama_model_loader: - kv   1: llama.block_count                       u32 = 32
llama_model_loader: - kv   2: llama.context_length                    u32 = 8192
llama_model_loader: - kv   3: llama.embedding_length                  u32 = 4096
llama_model_loader: - kv   4: llama.feed_forward_length               u32 = 14336
llama_model_loader: - kv   5: llama.attention.head_count              u32 = 32
llama_model_loader: - kv   6: llama.attention.head_count_kv           u32 = 8
llama_model_loader: - kv   7: llama.rope.freq_base                    f32 = 500000.000000
llama_model_loader: - kv   8: llama.attention.layer_norm_rms_epsilon  f32 = 0.000010
llama_model_loader: - kv   9: general.file_type                       u32 = 17
llama_model_loader: - kv  10: llama.vocab_size                        u32 = 128256
llama_model_loader: - kv  11: llama.rope.dimension_count              u32 = 128
llama_model_loader: - kv  12: tokenizer.ggml.model                    str = gpt2
llama_model_loader: - kv  13: tokenizer.ggml.pre                      str = llama-bpe
llama_model_loader: - kv  14: tokenizer.ggml.tokens                   arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  15: tokenizer.ggml.token_type               arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  16: tokenizer.ggml.merges                   arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv  17: tokenizer.ggml.bos_token_id             u32 = 128000
llama_model_loader: - kv  18: tokenizer.ggml.eos_token_id             u32 = 128009
llama_model_loader: - kv  19: tokenizer.chat_template                 str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv  20: general.quantization_version            u32 = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q5_K:  193 tensors
llama_model_loader: - type q6_K:   33 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = BPE
llm_load_print_meta: n_vocab          = 128256
llm_load_print_meta: n_merges         = 280147
llm_load_print_meta: n_ctx_train      = 8192
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 8
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 4
llm_load_print_meta: n_embd_k_gqa     = 1024
llm_load_print_meta: n_embd_v_gqa     = 1024
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 14336
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 8192
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q5_K - Medium
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 5.33 GiB (5.70 BPW)
llm_load_print_meta: general.name     = n/a
llm_load_print_meta: BOS token        = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token        = 128009 '<|eot_id|>'
llm_load_print_meta: LF token         = 128 'Ä'
llm_load_print_meta: EOT token        = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size = 0.34 MiB
ggml_backend_metal_log_allocated_size: allocated buffer, size = 5115.48 MiB, ( 5115.55 / 10922.67)
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloaded 32/33 layers to GPU
llm_load_tensors: Metal buffer size = 5115.48 MiB
llm_load_tensors: CPU buffer size = 5459.93 MiB
........................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: n_batch    = 2048
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 500000.0
llama_new_context_with_model: freq_scale = 1
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M1
ggml_metal_init: picking default device: Apple M1
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading '/Users/harikt/.llamafile/v/0.8.9/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M1
ggml_metal_init: GPU family: MTLGPUFamilyApple7  (1007)
ggml_metal_init: GPU family: MTLGPUFamilyCommon3 (3003)
ggml_metal_init: GPU family: MTLGPUFamilyMetal3  (5001)
ggml_metal_init: simdgroup reduction support   = true
ggml_metal_init: simdgroup matrix mul. support = true
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
llama_kv_cache_init: Metal KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.50 MiB
llama_new_context_with_model: Metal compute buffer size = 164.00 MiB
llama_new_context_with_model: CPU compute buffer size = 258.50 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 3
{"function":"initialize","level":"INFO","line":489,"msg":"initializing slots","n_slots":1,"tid":"34364426944","timestamp":1740806532}
{"function":"initialize","level":"INFO","line":498,"msg":"new slot","n_ctx_slot":2048,"slot_id":0,"tid":"34364426944","timestamp":1740806532}
{"function":"server_cli","level":"INFO","line":3090,"msg":"model loaded","tid":"34364426944","timestamp":1740806532}
llama server listening at http://127.0.0.1:8080
```
One question: do all llamafiles have embedding support, or only some of them? (I know embedding is turned off by default and only works if we pass --embedding; I am asking whether passing --embedding to any llamafile model works or not.) I noticed many of the models are crashing on my end.
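When comparing models that do return vectors, a quick sanity check is to confirm the embeddings are usable (not zeros or garbage from a half-crashed run): two similar sentences should score clearly higher than two unrelated ones under cosine similarity. The sketch below is plain Python and assumes nothing about llamafile itself; feed it any two vectors returned by the /embedding endpoint.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|); 1.0 = same direction, 0.0 = orthogonal
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # an all-zero vector is a red flag that the endpoint returned no real embedding
        return 0.0
    return dot / (norm_a * norm_b)

# e.g. pass in two vectors from the /embedding endpoint and expect a
# noticeably higher score for semantically similar sentences
```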
Version
Meta-Llama-3-8B-Instruct.Q5_K_M.llamafile (llamafile v0.8.9): working fine.
Llama-3.2-1B-Instruct.Q6_K.llamafile (llamafile v0.9.0): has the bug.
What operating system are you seeing the problem on?
Mac
Relevant log output
I am using an M1. I can provide more details if needed.