Description
Note: This issue was copied from ggml-org#16487
Original Author: @ngxson
Original Issue Number: ggml-org#16487
Created: 2025-10-09T14:30:41Z
I'm extracting this discussion from ggml-org#13367, mainly to make it easier to plan tasks around it.
Allow loading / unloading a model via API: in server.cpp, we can add a kind of "super" main() function that wraps around the current main(). The new main will spawn an "interim" HTTP server that exposes the API to load a model. Of course, this functionality will be restricted to local deployments to avoid any security issues.
This idea was demoed in ggml-org#13400, but the implementation is still far from usable. It actually requires a refactoring of the server.
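To make the wrapper idea a bit more concrete, here is a rough sketch of the control logic such a "super" main() could own. It is not the code from ggml-org#13400 and not an existing llama.cpp API: ModelSupervisor, ControlRequest, ControlResponse, StartFn and StopFn are all hypothetical names, the HTTP layer is left out entirely, and the two hooks only stand in for starting and stopping the current main(). The loopback check is a simplistic placeholder for the "local deployment only" restriction.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical control request, as the interim HTTP server might decode it.
struct ControlRequest {
    std::string remote_addr; // peer address of the caller
    std::string action;      // "load" or "unload"
    std::string model_path;  // only used by "load"
};

// Hypothetical response: an HTTP-style status code plus a short message.
struct ControlResponse {
    int         status;
    std::string body;
};

// Hypothetical hooks standing in for starting / stopping the existing main().
using StartFn = std::function<bool(const std::string & model_path)>;
using StopFn  = std::function<void()>;

// State the "super" main() would keep between load / unload requests.
class ModelSupervisor {
public:
    ModelSupervisor(StartFn start, StopFn stop)
        : start_(std::move(start)), stop_(std::move(stop)) {
        assert(start_ && stop_); // both hooks are required
    }

    ControlResponse handle(const ControlRequest & req) {
        // Assumption: "local deployment" is approximated by a loopback peer.
        if (!is_loopback(req.remote_addr)) {
            return {403, "control API is local-only"};
        }
        if (req.action == "load") {
            if (req.model_path.empty())  return {400, "missing model path"};
            if (!current_.empty())       return {409, "a model is already loaded"};
            if (!start_(req.model_path)) return {500, "failed to start server"};
            current_ = req.model_path;
            return {200, "loaded"};
        }
        if (req.action == "unload") {
            if (current_.empty()) return {409, "no model loaded"};
            stop_();
            current_.clear();
            return {200, "unloaded"};
        }
        return {400, "unknown action"};
    }

    // Empty string means no model is currently loaded.
    const std::string & current_model() const { return current_; }

private:
    static bool is_loopback(const std::string & addr) {
        // Matches "::1" and anything starting with "127."; deliberately naive.
        return addr == "::1" || addr.rfind("127.", 0) == 0;
    }

    StartFn     start_;
    StopFn      stop_;
    std::string current_;
};
```

In a real wrapper, the start hook would presumably launch the existing server loop in a thread or child process and the stop hook would shut it down cleanly, which is exactly the part that needs the refactoring.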
While alternative methods for hot-swapping models already exist, I think refactoring the server.cpp code can still benefit long-term development quite a lot. Therefore, this feature could be a suitable goal for the refactoring efforts.