feat(server): Add tool call support to WebUI (LLama Server) #13501
Conversation
You may want to avoid enlarging the scope any further in this PR, it's already quite large. Further changes should go in a follow-up PR.
Sure! I'm done with the changes I wanted. I can move the JS REPL tool into a separate PR if needed.
I have to do some more testing now that streaming tool calls are supported.
I did the final fixes and also updated the demo conversation to include a chained message with a tool call. This should now be ready to review.
I'll work on the PR this week. Will need to move your commits to a new PR, otherwise I can't push to your master branch.
Thank you! Can I help you somehow? I can also rename the branch and open a new PR if that's the problem.
@ngxson is there any help I can provide to move this further?
@ngxson will need to tell :). If nothing else, probably resolving the conflicts 😝
I didn't merge this PR because the stability of the frontend was a much more important concern. Now that we have moved to the new SvelteKit-based UI, tool calling is something I already brought up in one of my recent discussions with @allozaur. We will take our time to plan this, as there is already a long backlog of other features that need to be implemented.
That's an absolutely amazing piece of work; really impressive engineering. But I have to point out one concern: doing tool-calling execution entirely on the client side can lead us straight back into the same kind of fragmentation we saw during the early 'thinking/CoT' phase. Each model ends up needing its own quirks handled in the frontend, which means model-specific bugs, technical debt, and constant maintenance inside the Svelte WebUI. That's not something we usually want to encourage in a project like llama.cpp, where stability and generic design are key. It's a brilliant idea, but not a generic one; the browser sandbox, CORS restrictions, and the fact that the user must keep the tab open all make it fragile.

A better long-term approach would be to build a clean parser or handler that lives independently of the UI, so the frontend just displays results while the tool-execution logic stays modular and testable. For example, I've implemented a Node.js proxy that sits transparently in the SSE stream. It detects tool calls and forwards them to an HTTP hook, so any model can use tools without changing the frontend. Behind that, you can sandbox whatever you want, even control a headless browser to let the LLM 'browse' the web safely.

That said, my Node.js proxy is yet another intermediate layer and custom language component, and ideally this logic should live in the backend itself, as a configurable and generic tool-calling interface, something to be designed together with the core developers according to their vision for the project. That would encourage a cleaner modularization of the parser, where there's still a lot of important work to be done, and help push the community to contribute everything in one place, improving both llama.cpp and its parser. That kind of separation keeps llama.cpp safe and generic, while still allowing all the experimentation you want on the tooling side.

And honestly, every geek or nerd who runs llama.cpp locally will prefer to have server-side hooks anyway: to control their smart home, experiment with 'Jarvis'-like automations, or integrate private APIs. In my case, I run my setup at work through a shared Svelte UI at serveurperso.com/ia (and a STT/TTS Telegram bot!).

TL;DR: Client-side tool calling is clever but risky: it leads to fragmentation, browser limits, and model-specific maintenance. A backend-based, generic, and configurable tool-calling interface should be developed collaboratively by the community and core devs, centralizing contributions to improve both llama.cpp and its parser. This keeps the project clean, secure, and extensible while allowing server-side innovation for local power users.
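For reference, here is a minimal sketch of what such a pass-through proxy could look like. The addresses, ports, and the `TOOL_HOOK_URL` endpoint are illustrative assumptions, not the actual serveurperso.com setup; the point is only that the SSE stream is forwarded untouched while completed tool calls are handed to an external hook.

```typescript
// Minimal sketch of an SSE pass-through proxy with a tool-call hook.
// Assumptions: llama-server on 127.0.0.1:8080, tool executor on TOOL_HOOK_URL.
import http from "node:http";

const UPSTREAM = "http://127.0.0.1:8080";            // assumed llama-server address
const TOOL_HOOK_URL = "http://127.0.0.1:9090/hook";  // assumed tool-executor endpoint

http.createServer(async (req, res) => {
  // Forward the request body to llama-server unchanged.
  const body: Buffer[] = [];
  for await (const chunk of req) body.push(chunk as Buffer);
  const upstream = await fetch(UPSTREAM + (req.url ?? "/"), {
    method: req.method,
    headers: { "content-type": "application/json" },
    body: body.length ? Buffer.concat(body) : undefined,
  });

  res.writeHead(upstream.status, {
    "content-type": upstream.headers.get("content-type") ?? "text/event-stream",
  });

  // Per-request accumulator for streamed tool-call fragments, keyed by index.
  const pending: Record<number, { name: string; args: string }> = {};
  const decoder = new TextDecoder();
  let buffered = "";

  for await (const chunk of upstream.body ?? []) {
    const text = decoder.decode(chunk as Uint8Array, { stream: true });
    res.write(text); // pass the SSE stream through to the client untouched
    buffered += text;

    // Process only complete SSE events (separated by a blank line).
    const cut = buffered.lastIndexOf("\n\n");
    if (cut === -1) continue;
    const events = buffered.slice(0, cut).split("\n\n");
    buffered = buffered.slice(cut + 2);

    for (const event of events) {
      const data = event.replace(/^data: ?/, "").trim();
      if (!data || data === "[DONE]") continue;
      let parsed: any;
      try { parsed = JSON.parse(data); } catch { continue; }
      const choice = parsed.choices?.[0];
      for (const tc of choice?.delta?.tool_calls ?? []) {
        const slot = (pending[tc.index] ??= { name: "", args: "" });
        if (tc.function?.name) slot.name = tc.function.name;
        if (tc.function?.arguments) slot.args += tc.function.arguments;
      }
      if (choice?.finish_reason === "tool_calls") {
        // Hand the completed calls to the external sandbox; a hook failure
        // must never break the stream the client is reading.
        fetch(TOOL_HOOK_URL, {
          method: "POST",
          headers: { "content-type": "application/json" },
          body: JSON.stringify(Object.values(pending)),
        }).catch(() => {});
      }
    }
  }
  res.end();
}).listen(3000); // assumed proxy port
```

Because the proxy only reads the OpenAI-compatible chunk format, the frontend and the models never have to know the hook exists.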
Thanks for explaining!
@ServeurpersoCom I'm not sure implementing tool calling in the backend is the right call. If we consider a typical use case where the same llama.cpp server is used by many people: 1) these people might have their own tools they want to call that are not available on the backend, and 2) from a security viewpoint, things like code execution are difficult to implement safely on the backend side. Neither of these issues surfaces if the tool calling is on the browser side.

It seems to me that backend tool calling would be mostly relevant for people self-hosting a model for themselves, not for production environments with multiple users. Should someone require backend tool calling anyway, a proxy such as the one you have implemented is a better choice, as in production systems the inference engine is very often hosted on a different node than the tools (e.g. the inference engine is on a GPU node and the tools are on a CPU-only node). If the server implements the OpenAI API correctly, including its tool-calling format, the proxy can be made model-independent.

For those who use llama.cpp locally for one person only, backend tool calling would work better, but there are already many OSS projects catering to those people, so, in my opinion, they are not a priority.
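To make the model-independence point concrete: assuming the server emits standard OpenAI-compatible streaming chunks, the only shape a proxy (or the WebUI) needs to understand is roughly the following. This is a sketch of the public API format, not actual llama.cpp code.

```typescript
// Streamed tool-call fragments in the OpenAI-compatible chat completions API.
// A consumer only needs this shape, not any model-specific chat template,
// which is what makes a proxy or client model-independent.
interface ToolCallDelta {
  index: number;           // which tool call this fragment belongs to
  id?: string;             // present on the first fragment of a call
  type?: "function";
  function?: {
    name?: string;         // usually complete in the first fragment
    arguments?: string;    // a JSON string, streamed in pieces
  };
}

interface StreamChunk {
  choices: {
    delta: { content?: string; tool_calls?: ToolCallDelta[] };
    finish_reason: "stop" | "tool_calls" | null; // "tool_calls" marks completion
  }[];
}
```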
Obviously, that's already how it works today: tool calls are exposed through the API, and each client is free to handle them however it wants. The real issue is that, depending on the model, the tool-call payload format can change. There's an actual inventory to do here, and before anything else, the server needs a proper refactor (see the open issue about splitting core/http: server.cpp has become too large). That kind of work is a higher priority; implementing tool support directly in the Svelte WebUI would surface tons of inconsistent bugs depending on the model output format.

In practice, this could easily be achieved by providing a small standalone binary, similar to my experimental proxy. This approach scales from local experimentation to industrial-grade setups: you could attach isolated execution environments to each model instance, just like ChatGPT internally routes tool calls to separate subsystems. For enthusiasts and self-hosters, the same binary could also act as a personal automation bridge, letting anyone build their own "Jarvis"-style setup and trigger smart-home commands directly through the LLM. It's far more powerful (and fun) than being confined to a single browser tab for the same amount of work.

Real Firefox-ESR browser (non-headless) running in a remote, disposable, read-only environment; no instruction prompt, only tool-call documentation (tool name and parameters):
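To illustrate what "only tool-call documentation" means here, this is a hypothetical example of the standard `tools` field in an OpenAI-compatible /v1/chat/completions request; the `browse` tool name and its parameters are purely illustrative, not an existing llama.cpp feature.

```typescript
// Hypothetical tool documentation: only a name, a description, and
// JSON-schema parameters are sent to the model, no instruction prompt.
const tools = [
  {
    type: "function",
    function: {
      name: "browse",
      description: "Open a URL in the remote disposable Firefox instance and return the page text.",
      parameters: {
        type: "object",
        properties: {
          url: { type: "string", description: "Absolute URL to open" },
        },
        required: ["url"],
      },
    },
  },
];
```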
This PR actually gave me an idea. What we can safely do, though, is add a simple checkbox in the WebUI Settings that, similar to how reasoning_content (thinking blocks) works, enables the display of OpenAI-compatible tool-call chunks inside a similar type of block. This would remain consistent with the OpenAI-compatible API logic, be very useful for developers building larger systems on top of llama.cpp as both a built-in debugger and a reference example, and follow the core principle that "the client should be able to see every chunk any model is capable of emitting." It would have no runtime impact, introduce no security or parsing risks over time, and be highly educational.
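A rough sketch of what that display path could look like; all names here are illustrative and none of this exists in the current WebUI. The key property is that tool-call chunks are only rendered, never parsed for execution.

```typescript
// Illustrative sketch only: when the proposed setting is enabled, raw
// OpenAI-compatible tool_call deltas become a display block, analogous to
// how reasoning_content (thinking blocks) is shown today.
type ChatDelta = {
  content?: string;
  tool_calls?: unknown[];
};

type DisplayBlock = {
  kind: "content" | "tool_call";
  text: string;
};

function toDisplayBlocks(delta: ChatDelta, showToolCallChunks: boolean): DisplayBlock[] {
  const blocks: DisplayBlock[] = [];
  if (delta.content) {
    blocks.push({ kind: "content", text: delta.content });
  }
  if (showToolCallChunks && delta.tool_calls?.length) {
    // Show the chunk verbatim so developers can see exactly what the model emitted.
    blocks.push({
      kind: "tool_call",
      text: JSON.stringify(delta.tool_calls, null, 2),
    });
  }
  return blocks;
}
```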




Hi there!
I added tool calling support to the llama-server WebUI frontend.
There are still some things to fix, but I'm opening this early to get some feedback. Currently there is only one tool (a basic JavaScript interpreter with sandboxed iframe code eval), but the code structure supports adding more in the future.
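For context, here is a minimal sketch of the sandboxed-iframe eval approach described above; it is illustrative only, not the PR's actual implementation.

```typescript
// Illustrative sketch of sandboxed iframe code eval (not the PR's actual code).
// The snippet runs inside <iframe sandbox="allow-scripts">, which gives it an
// opaque origin: no access to the parent page, its cookies, or the server
// origin. Results come back to the WebUI via postMessage.
function runInSandbox(code: string): Promise<string> {
  return new Promise((resolve) => {
    const iframe = document.createElement("iframe");
    iframe.setAttribute("sandbox", "allow-scripts"); // deliberately no allow-same-origin
    iframe.style.display = "none";
    iframe.srcdoc = `<script>
      try {
        const result = eval(${JSON.stringify(code)});
        parent.postMessage({ ok: true, output: String(result) }, "*");
      } catch (err) {
        parent.postMessage({ ok: false, output: String(err) }, "*");
      }
    <\/script>`;

    const onMessage = (ev: MessageEvent) => {
      if (ev.source !== iframe.contentWindow) return; // ignore unrelated messages
      window.removeEventListener("message", onMessage);
      iframe.remove();
      resolve(ev.data.output);
    };
    window.addEventListener("message", onMessage);
    document.body.appendChild(iframe);
  });
}
```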