Seems like server dropped multimodal support in [5882](https://github.com/ggerganov/llama.cpp/pull/5882). It would be great if it can come back with proper implementation. 🙏