update to llama.cpp b5688 #115

arnej27959 · 2025-06-20T19:37:26Z

extract updated glue code to server.hpp and utils.cpp
adapt native code in jllama.cpp to track API changes
update tags and adapt CMakeLists.txt

note: All the code in utils.hpp and server.hpp is just lifted directly from llama.cpp, so I don't really have a good understanding of all the changes that happened there. But I've looked closely at the changes that needed to be done in jllama.cpp to track the API changes, and done a bit of testing of the resulting binary.

longer-term it would be better to get the server code in llama.cpp split so it these pieces can be used as a library directly; we are considering writing a different server wrapper with protobuf/rpc replacing JSON/http, which could use the same library.

- extract updated glue code to server.hpp and utils.cpp - adapt native code in jllama.cpp to track API changes - update tags and adapt CMakeLists.txt

kherud · 2025-06-20T20:18:17Z

Awesome work 👍 though it'll take some time for me to review.

longer-term it would be better to get the server code in llama.cpp split so it these pieces can be used as a library directly;

Yeah, if you can get the llama.cpp team to do this, it would make things much easier!

we are considering writing a different server wrapper with protobuf/rpc replacing JSON/http, which could use the same library.

I'm a big fan of protobuf/rpc, so I'll look forward to that!

comment says: // if the assistant message appears at the end of list, we do not add end-of-turn token so it seems the changed output should be is expected.

arnej27959 · 2025-07-11T13:13:29Z

any progress on looking at these changes?

arnej27959 added 3 commits June 20, 2025 19:07

update to llama.cpp b5688

a4efc99

- extract updated glue code to server.hpp and utils.cpp - adapt native code in jllama.cpp to track API changes - update tags and adapt CMakeLists.txt

fix my mistake regarding '--reranking' option

decfd30

let LLAMA_CURL default to OFF for now

a3dfdc6

track code in oaicompat_chat_params_parse

8c609e3

comment says: // if the assistant message appears at the end of list, we do not add end-of-turn token so it seems the changed output should be is expected.

DevinTDHa mentioned this pull request Jul 17, 2025

Fix passing of embedding pooling type #118

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

update to llama.cpp b5688 #115

update to llama.cpp b5688 #115

Uh oh!

arnej27959 commented Jun 20, 2025

Uh oh!

kherud commented Jun 20, 2025

Uh oh!

arnej27959 commented Jul 11, 2025

Uh oh!

Uh oh!

update to llama.cpp b5688 #115

Are you sure you want to change the base?

update to llama.cpp b5688 #115

Uh oh!

Conversation

arnej27959 commented Jun 20, 2025

Uh oh!

kherud commented Jun 20, 2025

Uh oh!

arnej27959 commented Jul 11, 2025

Uh oh!

Uh oh!