How do you load a lora model? #639
Comments
Model Field = Your base GGUF source model before any modifications. To answer your question, yes, both models are required to load a LoRA.
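For reference, the same setup on the llama.cpp command line looks roughly like the sketch below. The file names are placeholders and flag spellings vary a little between llama.cpp versions, but the idea matches the fields above: the base GGUF goes to `-m`/`--model` and the adapter goes to `--lora`.

```sh
# Rough sketch (placeholder file names; run ./main --help on your build to confirm flags).
# The base GGUF model is loaded with -m, and the LoRA adapter is applied on top with --lora.
./main \
  -m ./models/mistral-7b-base.Q4_K_M.gguf \
  --lora ./loras/lora-adapter.bin \
  -p "Quick prompt to confirm the adapter loaded"
```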
My issue is that I don't know whether my LoRA fine-tuning of Mistral 7B with llama.cpp failed because only Llama models are supported, or whether I'm just having trouble loading the quantized model together with the LoRA adapter file. I fine-tuned on a quantized Mistral 7B model, and it seemed to train and reduce the loss with no errors. When fine-tuning finishes I get a .bin file and a GGUF file. With my quantized model set as the 'model', trying the GGUF file that fine-tuning produced as the LoRA gives me the first error below, and trying the .bin file as the LoRA gives me the second. The first error says "bad file magic".
The second error, with the .bin file, says: "Error: the simultaneous use of LoRAs and GPU acceleration is only supported for f16 models".
Since that says the .bin file (adapter) needs an f16 model, I loaded the f16 model instead, and that gives me yet another error.
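If that message is the same constraint llama.cpp enforces, the usual workaround (assuming the GUI fields map onto the same flags) is to keep the quantized model as the main model and also point `--lora-base` at the f16 GGUF, so the adapter deltas are applied against full-precision weights. A rough sketch with placeholder file names:

```sh
# Sketch only: placeholder file names; flag names may differ across llama.cpp versions.
# Quantized model for inference, f16 model as the base that the LoRA deltas are applied to.
./main \
  -m ./models/mistral-7b.Q4_K_M.gguf \
  --lora ./loras/lora-adapter.bin \
  --lora-base ./models/mistral-7b-f16.gguf \
  -p "Quick check that the adapter applies cleanly"
```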
I can't really provide much assistance on LoRAs as I don't really use them myself. Maybe you can try generating the composite model with llama.cpp instead?
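For what it's worth, llama.cpp also ships an export-lora example that can bake the adapter into the base model, so you end up with a single merged GGUF and nothing to put in the LoRA field at all. Roughly (placeholder file names, and the exact flag names depend on the llama.cpp version you built):

```sh
# Sketch: merge a LoRA adapter into an f16 base model with llama.cpp's export-lora example.
# Flag names are from a 2023-era build and may differ in newer versions.
./export-lora \
  --model-base ./models/mistral-7b-f16.gguf \
  --lora ./loras/lora-adapter.bin \
  --model-out ./models/mistral-7b-f16-merged.gguf
# The merged GGUF can then be quantized and loaded on its own, with no LoRA needed.
```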
I did LoRA fine-tuning over in llama.cpp, and when it finished it created two GGUF files and one .bin file. I'm pretty sure the .bin file is the LoRA base, but what goes in the LoRA field and what goes in the Model field? Does the original model get loaded? Do the GGUF files need to follow a specific naming format?