
AutoGGUFEmbeddings with nomic-embed-text-v1.5.Q8_0.gguf not able to achieve high context length #14530

Answered by DevinTDHa
pwyang123 asked this question in Q&A

Hi @pwyang123,

I was able to reproduce it and I'm working on a fix. There are some issues with the error handling; it shouldn't fail silently. I'll update this discussion when the fix is ready. Thanks for reporting!

In the meantime, can you try the following:

from sparknlp.annotator import AutoGGUFEmbeddings

autoGGUFModel = (
    AutoGGUFEmbeddings.loadSavedModel("path_to/nomic-embed-text-v1.5.Q8_0.gguf", spark)
    .setInputCols("document")
    .setOutputCol("embeddings")
    .setBatchSize(4)
    .setNGpuLayers(99)
    .setNCtx(8192)       # context length
    .setNBatch(2048)     # logical batch size
    .setNUbatch(2048)    # physical batch size
)

Explanation (for reference, see this discussion):

llama.cpp allows setting the (1) logical and (2) physical batch sizes…
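
For context, here is a minimal end-to-end sketch of how the configured annotator could be plugged into a Spark NLP pipeline. This is an illustration, not a verbatim part of the answer: the GGUF file path and sample text are placeholders, and it assumes a recent Spark NLP version with AutoGGUFEmbeddings available; the setter names and values are taken from the snippet above.

import sparknlp
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import AutoGGUFEmbeddings

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

documentAssembler = (
    DocumentAssembler()
    .setInputCol("text")
    .setOutputCol("document")
)

autoGGUFModel = (
    AutoGGUFEmbeddings.loadSavedModel("path_to/nomic-embed-text-v1.5.Q8_0.gguf", spark)
    .setInputCols("document")
    .setOutputCol("embeddings")
    .setNCtx(8192)       # context length
    .setNBatch(2048)     # logical batch size
    .setNUbatch(2048)    # physical batch size
)

pipeline = Pipeline().setStages([documentAssembler, autoGGUFModel])

# Placeholder input; replace with a long document to exercise the larger context
data = spark.createDataFrame([["A long document ..."]]).toDF("text")
result = pipeline.fit(data).transform(data)

# Each annotation in the output column carries an "embeddings" field
result.select("embeddings.embeddings").show(truncate=False)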

Answer selected by DevinTDHa