Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Make a single rpc-server instance able to handle multiple devices (GPUs), so that it can pass data directly from one device to the next when applicable.
Motivation
Goal: fully utilize the interconnect (PCIe) available on the rpc-server machine.
As of now (version b6084), you must launch one rpc-server instance per device (typically a GPU) on your inference server.
If you have two or more devices, this results in sub-optimal usage of the available interconnects (e.g. PCIe).
As far as I can see, even when devices on the rpc-server machine host contiguous model layers, all communication between them still goes through the network, to the client (llama-cli) and back.
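To make the current setup concrete, here is a sketch of what a dual-GPU machine requires today (a CUDA build is assumed; the IP address and ports are illustrative):

```sh
# Inference server: one rpc-server instance per GPU,
# each pinned to a single device via CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=0 ./rpc-server --host 0.0.0.0 --port 50052 &
CUDA_VISIBLE_DEVICES=1 ./rpc-server --host 0.0.0.0 --port 50053 &

# Client: both instances are separate network endpoints, so a tensor
# handed off between GPU 0 and GPU 1 makes a round trip through
# llama-cli instead of using the local PCIe link
./llama-cli -m model.gguf -ngl 99 \
    --rpc 192.168.1.10:50052,192.168.1.10:50053
```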
I guess this was done for the sake of simplicity and ease of implementation.
There are therefore performance gains to be made by making a single rpc-server instance able to handle all devices on a machine.
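A single instance could then expose both GPUs and copy tensors between them locally. Purely as a hypothetical sketch (the `--device` argument below does not exist; the flag name and syntax would be up to the implementation):

```sh
# Hypothetical: one rpc-server instance serving both GPUs, free to
# move tensors between them over PCIe without a network round trip
./rpc-server --host 0.0.0.0 --port 50052 --device CUDA0,CUDA1
```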
P.S. Thank you so much to everyone who put in the effort to build the rpc-server. It is such a great feature!
Possible Implementation
No response