
Feature request: allow load/unload models on server #70

@jakexcosme

Description


Note: This issue was copied from ggml-org#16487

Original Author: @ngxson
Original Issue Number: ggml-org#16487
Created: 2025-10-09T14:30:41Z


I extracted this discussion from ggml-org#13367, mainly to make it easier to plan the tasks around it.

Allow loading/unloading a model via the API: in server.cpp, we can add a kind of "super" main() function that wraps the current main(). The new main() will spawn an "interim" HTTP server that exposes an API for loading a model. Of course, this functionality will be restricted to local deployments to avoid security issues.
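A minimal C++ sketch of the wrapper idea, under stated assumptions: inner_main() is a hypothetical stand-in for the current server main(), and ModelSupervisor, load(), unload() and is_local_host() are illustrative names, not existing llama.cpp APIs. The interim HTTP layer is left out; its load/unload handlers would call these methods. Loading a new model first stops and joins the thread running the previous one, so only one model is resident at a time.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <memory>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for the existing server main(): the real one would
// load the model and serve requests; here it just runs until told to stop.
int inner_main(const std::string & model_path, const std::atomic<bool> & stop) {
    (void) model_path;
    while (!stop.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }
    return 0;
}

// Hypothetical "super main" state: owns the thread running inner_main().
class ModelSupervisor {
public:
    // Only loopback addresses are accepted, mirroring "restricted to local deployment".
    static bool is_local_host(const std::string & host) {
        return host == "127.0.0.1" || host == "localhost" || host == "::1";
    }

    // Unloads any current model, then starts inner_main() for the new one.
    bool load(const std::string & model_path) {
        if (model_path.empty()) {
            return false;
        }
        std::lock_guard<std::mutex> lock(mu_);
        stop_locked();
        stop_    = std::make_unique<std::atomic<bool>>(false);
        current_ = model_path;
        std::atomic<bool> * flag = stop_.get();
        worker_ = std::thread([model_path, flag] { inner_main(model_path, *flag); });
        return true;
    }

    void unload() {
        std::lock_guard<std::mutex> lock(mu_);
        stop_locked();
    }

    // Path of the currently loaded model, or "" if none.
    std::string current() {
        std::lock_guard<std::mutex> lock(mu_);
        return current_;
    }

    ~ModelSupervisor() { unload(); }

private:
    // Caller must hold mu_. Signals the worker and waits for it to exit.
    void stop_locked() {
        assert(!worker_.joinable() || stop_);  // a running worker always has a stop flag
        if (worker_.joinable()) {
            stop_->store(true);
            worker_.join();
        }
        current_.clear();
    }

    std::mutex                         mu_;
    std::thread                        worker_;
    std::unique_ptr<std::atomic<bool>> stop_;
    std::string                        current_;
};
```

Usage: after ModelSupervisor sup; a call to sup.load("model-a.gguf") starts a model, a second sup.load("model-b.gguf") swaps it, and sup.unload() frees the slot. Reusing the existing main() as the per-model worker keeps today's server code mostly untouched, which is the appeal of the wrapper approach; it does not by itself address draining in-flight requests before a swap.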

This idea was demoed in ggml-org#13400, but that implementation is still far from usable; doing it properly requires refactoring the server.

While alternative methods for hot-swapping models already exist, I think refactoring the server.cpp code would still benefit long-term development considerably. Therefore, this feature could be a suitable goal for the refactoring effort.

Metadata

Labels: enhancement (New feature or request)
