
updating code to match llamacpp tag b4689 #93

Merged · 52 commits into kherud:master from vaiju1981:b4689 · Mar 9, 2025

Conversation

@vaiju1981 commented Feb 12, 2025

This PR copies the code from #92

@vaiju1981 mentioned this pull request Feb 12, 2025
@vaiju1981 (Author) commented Feb 12, 2025

Some concerns, all resolved (a usage sketch follows below):

  1. Without setting setNPredict, completion sometimes just hangs and does not return. - not anymore
  2. With newer models, especially non-llama models, a prompt like "What is 2+2?" might return a repeated answer such as "2+2=4. 2+2=4. 2+2=4. ..." -- this is caused by not applying the proper chat template.
  3. Sometimes close() does not release the lock and just hangs, so the calling code needed a System.exit -- this was resolved by removing the premature release of the task id when stream was set to true.
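
For reference, here is a minimal Java sketch of how the first two concerns surface on the caller side. It assumes the binding's existing LlamaModel / ModelParameters / InferenceParameters API; exact setter names may differ between versions, and the model path is only a placeholder:

import de.kherud.llama.InferenceParameters;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.ModelParameters;

public class CompletionSketch {
    public static void main(String[] args) {
        // Placeholder path; point this at a local GGUF model.
        ModelParameters modelParams = new ModelParameters()
                .setModelFilePath("models/model.gguf");

        try (LlamaModel model = new LlamaModel(modelParams)) {
            InferenceParameters inferParams = new InferenceParameters("What is 2+2?")
                    // Concern 1: cap the number of generated tokens so the
                    // completion cannot run (or hang) indefinitely.
                    .setNPredict(64)
                    // Concern 2: without the proper chat template, non-llama models
                    // may repeat "2+2=4." endlessly; a stop string is a crude mitigation.
                    .setStopStrings("\n\n");
            System.out.println(model.complete(inferParams));
        } // Concern 3: close() must release the native task without hanging.
    }
}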

@vaiju1981 (Author) commented:

@kherud

Can we enable the pipeline to see whether it builds normally on other architectures? I only have the ability to test on Mac.

@kherud (Owner) commented Feb 13, 2025

Hey, thank you very much for this! Unfortunately I won't have the time until the weekend to look at it, but I'll try to approve the pipelines until then. Don't worry if they don't run yet, though, the workflows are quite brittle.

@vaiju1981 mentioned this pull request Feb 14, 2025
@kherud (Owner) commented Feb 16, 2025

So far everything looks great! I'm now trying to fix the GitHub workflows. I previously didn't compile the shared libraries with curl support because I didn't find an easy way to statically link libcurl. The other solution is to dynamically link it, but this requires users to have libcurl installed, which I wanted to avoid (particularly for Windows users this might cause problems). For now, we can dynamically link it, though, and find a solution later.

@vaiju1981 (Author) commented:

The libcurl option is mostly for my use cases. If it's hard, we can remove it from the workflow altogether.

@vaiju1981 (Author) commented:

@kherud were you able to check the workflow, and is it working?

@kherud (Owner) commented Feb 21, 2025

Hey @vaiju1981 I did and I fixed the libcurl problems for Linux/Windows, but now there are other problems. I'll try to continue as soon as possible. Can you see the workflow results here?

@vaiju1981 (Author) commented:

I have updated the tests and moved the code to match the latest llama.cpp version. Can you enable the workflow? I think the only issue is with the Windows build (but I don't have a Windows machine to test on).

@vaiju1981 (Author) commented:

@kherud can you try now? I have updated the code and could verify it on Mac and Unix; I don't have access to Windows, so I can't test there.

@vaiju1981 (Author) commented Mar 6, 2025

I am able to get the Ubuntu and macOS builds to pass, but Windows is failing. I think the issue might be with architecture identification:

Windows error: java.lang.UnsatisfiedLinkError: No native library found for os.name=Windows, os.arch=x86_64

The rest passed.
Run for the build: https://github.com/vaiju1981/java-llama.cpp/actions/runs/13691068423
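
For context, this error is thrown by the native-library lookup, which maps os.name/os.arch to a bundled resource. A simplified, self-contained sketch of that pattern (not the binding's actual LlamaLoader code; the resource layout here is an assumption) shows where the message originates:

// Simplified illustration of OS/arch-based native library lookup.
// The real LlamaLoader is more involved; paths and names here are assumptions.
public class LoaderSketch {
    static void loadNativeLibrary() {
        String os = System.getProperty("os.name");   // e.g. "Windows 11"
        String arch = System.getProperty("os.arch"); // e.g. "amd64", often normalized to "x86_64"
        String name = System.mapLibraryName("jllama"); // "jllama.dll" on Windows
        String resource = "/" + os.split(" ")[0] + "/" + arch + "/" + name;

        java.io.InputStream in = LoaderSketch.class.getResourceAsStream(resource);
        if (in == null) {
            // If no bundled library matches this OS/arch combination (e.g. the
            // Windows x86_64 artifact was not packaged), this error is raised.
            throw new UnsatisfiedLinkError(
                    "No native library found for os.name=" + os + ", os.arch=" + arch);
        }
        // ... otherwise extract to a temporary file and call System.load(...) ...
    }

    public static void main(String[] args) {
        loadNativeLibrary();
    }
}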

@kherud (Owner) commented Mar 8, 2025

I'm currently looking into it and it seems like in the new llama.cpp version there are additional shared libraries "ggml-base.dll" and "ggml-cpu.dll" which are missing in the Java binding and probably cause the UnsatisfiedLinkError. There are multiple solutions to this:

  • One option would be to try statically compiling all dependencies into "jllama.dll", so we just have to load the single library. I think this would cause the least headache for us, but I avoided this in the past because I wanted users of the binding to be able to easily swap their "llama.dll" for an individually compiled version (e.g. with GPU support). We would lose this advantage, but users could still always compile the java-llama.cpp project to get a custom version.
  • We could adapt LlamaLoader.java to also load the missing libraries. I'm not a fan of this solution, though, since "ggml-cpu.dll" implies that the required libraries depend on the specific options used for compilation (e.g. is there also something like "ggml-cuda.dll"?). This would make the library loader complex and brittle (a rough sketch of this option follows below).

I think we should go with option 1 for now and look how things turn out. I'll try to implement this and report back.
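
For illustration, option 2 would roughly amount to loading the extra libraries in dependency order before jllama, along these lines (library names are taken from the comment above; whether this order and set is sufficient for every build configuration is exactly the open question):

public class ExplicitLoadSketch {
    // Sketch of option 2: load the new ggml libraries explicitly before jllama.
    // The required set depends on compile options (a CUDA build might add more),
    // which is what makes this approach brittle.
    static void loadAll() {
        String[] nativeDeps = {"ggml-base", "ggml-cpu", "ggml", "llama", "jllama"};
        for (String lib : nativeDeps) {
            System.loadLibrary(lib); // resolves e.g. ggml-base.dll or libggml-base.so
        }
    }
}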

@vaiju1981 (Author) commented:

Hi @kherud, I like option 1; it looks like the cleanest one.

One thing we can do is have two builds for Windows, one without CUDA and one with CUDA. That way, if users have a GPU and want CUDA support, they can use that dependency. (This would work for other platforms like Unix as well.)

@kherud (Owner) commented Mar 8, 2025

Yes, I agree. The binding doesn't offer Windows CUDA builds yet, and I have always tried to avoid it since providing pre-built libraries really is a pain, as you've seen, but maybe in the future.

I'm also disabling curl for now since I can't figure out how to statically link it (it's tricky due to dependencies on system libraries). Users can always manually download models or compile the bindings themselves.

I think we should get a basic version for all major platforms working for now and finally merge the pull request.

@vaiju1981 (Author) commented:

Amazing work @kherud, it looks like the Windows build passed.

@kherud (Owner) commented Mar 8, 2025

@vaiju1981 Sadly not, it's only working in a CMake debug configuration (which has much worse performance). The library is now loaded without an UnsatisfiedLinkError, but a segmentation fault happens when loading it. It only happens in release mode (where the compiler heavily optimizes the code). So far I don't really have a clue about the problem. I tried running the address and undefined-behavior sanitizers on Linux, but they didn't report any problems.

@@ -291,8 +326,12 @@ JNIEXPORT jint JNICALL JNI_OnLoad(JavaVM *vm, void *reserved)
goto error;
}

printf("loaded JNI symbols\n"); fflush(stdout);

llama_backend_init();
@kherud (Owner) commented:


After inserting some debug statements, the problem seems to appear when initializing the llama backend here.

@toystorynova commented Mar 8, 2025

Is this the same error as #83? If so, Windows support is broken on previous versions too, so a build against the latest version would be appreciated regardless.

@kherud (Owner) commented Mar 9, 2025

@toystorynova Yes, good spot, this is likely the same issue. The weird thing is that it works correctly when I build it on my Windows machine. I think I traced it down to this statement being called (via JNI_OnLoad -> llama_backend_init() -> ggml_init() -> ggml_critical_section_start()):

https://github.com/ggml-org/llama.cpp/blob/0fd7ca7a210bd4abc995cd728491043491dbdef7/ggml/src/ggml-threading.cpp#L7

It's likely a race condition or multi-threading issue that leads to ggml_critical_section_mutex being uninitialized.

@vaiju1981 (Author) commented:

What is the impact of debug vs. release for Windows? Is it that debug runs some factor slower than release mode, or needs more memory? Since the debug mode worked for Windows, I'm wondering whether we can go with the debug option for Windows if the impact of debug vs. release is not high.

@kherud (Owner) commented Mar 9, 2025

Yeah, on the one hand it's unusably slow, I think; on the other hand, we didn't really solve the underlying issue and it might surface again later. I'll look for more insight today. If I can't find anything, we can release a debug build for now. It's a better option than releasing no library at all, I guess.

@kherud (Owner) commented Mar 9, 2025

It seems to work now 🎉 thank you again for the continued effort! It's ready to merge.

The next steps are:

  • I'll also merge "Expose json schema to grammar conversion method" (#94); it seems like a reasonable change and shouldn't cause any problems
  • I think we should release a new major version (i.e. 4.0.0)
  • Hopefully the release workflow will work; it pre-compiles more shared libraries than the CI workflow does

@kherud kherud merged commit fd1b062 into kherud:master Mar 9, 2025
4 checks passed
@vaiju1981 deleted the b4689 branch March 9, 2025 15:53