Support for vLLM Data parallel #1663
Pods are composed of one or more containers. Will each vLLM engine instance run as a separate container in the vLLM pod?
```go
if p.ModelServerMetricsPort == 0 {
	p.ModelServerMetricsPort = targetPortNumber
}
```
I see that GetMetricsPort() does not implement this default behavior, which makes sense now that targetPortNumber is a list. We need to note this as a breaking change in the PR description.
There is no breaking change. See the code in pkg/epp/datastore/datastore.go lines 242-244. If there is only one targetPort in the InferencePool and the ModelServerMetricsPort from the command line is not zero, it will be used to fill the metricsPort in the PodInfo struct. The function GetMetricsPort() simply returns what was placed in the struct earlier.
@shmuelk please create a tracker issue for adding a conformance test that includes multiple InferencePool targetPorts.
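The default-fill behavior described above can be sketched as follows. This is an illustrative simplification, not the actual datastore code; the `PodInfo` fields and the `fillMetricsPort` function are assumed names:

```go
package main

import "fmt"

// PodInfo is a simplified stand-in for the datastore's per-endpoint record.
type PodInfo struct {
	Port        int32 // inference port for this engine instance
	MetricsPort int32 // port used for metrics scraping
}

// fillMetricsPort mirrors the behavior described above: each target port
// becomes one entry, and when there is exactly one target port, a non-zero
// command-line metrics port overrides the default (scraping on the
// inference port itself).
func fillMetricsPort(targetPorts []int32, modelServerMetricsPort int32) []PodInfo {
	pods := make([]PodInfo, 0, len(targetPorts))
	for _, p := range targetPorts {
		metricsPort := p
		if len(targetPorts) == 1 && modelServerMetricsPort != 0 {
			metricsPort = modelServerMetricsPort
		}
		pods = append(pods, PodInfo{Port: p, MetricsPort: metricsPort})
	}
	return pods
}

func main() {
	fmt.Println(fillMetricsPort([]int32{8000}, 9090))       // single port: override applies
	fmt.Println(fillMetricsPort([]int32{8000, 8001}, 9090)) // multiple ports: override ignored
}
```

With multiple target ports the command-line override is ignored, which is why no breaking change arises for existing single-port pools.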
As far as I know, they run as separate processes in the same container. Data Parallelism in a real vLLM deployment is enabled by a launch parameter passed to vLLM, and I don't think a parameter can add containers to the pod.
Force-pushed from fec83a2 to 885d81c
pkg/epp/datastore/datastore.go (Outdated)
```go
	}
}

func (ds *datastore) PodRemove(podName string) {
```
Can we add this to the PodDelete function? It should be the same behavior as it was for one engine per pod.
Done
pkg/epp/backend/metrics/metrics.go (Outdated)
```go
// was: return fmt.Sprintf("%s://%s:%d%s", p.ModelServerMetricsScheme, pod.Address, p.ModelServerMetricsPort, p.ModelServerMetricsPath)
func (p *PodMetricsClientImpl) getMetricEndpoint(pod *backend.Pod) string {
	return fmt.Sprintf("%s://%s:%d%s", p.ModelServerMetricsScheme, pod.GetIPAddress(), pod.GetMetricsPort(), p.ModelServerMetricsPath)
}
```
Just noticed this, but since this is called in FetchMetrics, which is a highly trafficked function, can we cache this endpoint string instead of calling Sprintf? I don't really see a case in which it is mutated within the pod/endpoint lifecycle.
Understood that it was this way before this PR, but this would be a nice cleanup.
PodInfo now has the metrics endpoint pre-built. This code is now a simple concatenation of strings.
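A minimal sketch of this pattern, using assumed names (the real PodInfo and client types differ): the host:port portion is formatted once when the PodInfo is constructed, so the per-scrape call is only cheap concatenation:

```go
package main

import "fmt"

// PodInfo is an illustrative stand-in; MetricsHost is pre-built once
// at construction time rather than on every metrics fetch.
type PodInfo struct {
	Address     string
	MetricsHost string // pre-built "address:metricsPort"
}

// NewPodInfo does the one-time Sprintf.
func NewPodInfo(address string, metricsPort int32) *PodInfo {
	return &PodInfo{
		Address:     address,
		MetricsHost: fmt.Sprintf("%s:%d", address, metricsPort),
	}
}

// getMetricEndpoint only concatenates strings on the hot path.
func getMetricEndpoint(scheme, path string, pod *PodInfo) string {
	return scheme + "://" + pod.MetricsHost + path
}

func main() {
	pod := NewPodInfo("10.0.0.7", 9090)
	fmt.Println(getMetricEndpoint("http", "/metrics", pod)) // http://10.0.0.7:9090/metrics
}
```

This assumes, as the reviewer notes, that the address and metrics port never change within an endpoint's lifecycle.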
```go
// removed: return fmt.Errorf("expected 1 target port, got %d", len(pool.Spec.TargetPorts))
// was: updated, err := pm.pmc.FetchMetrics(ctx, pm.GetPod(), pm.GetMetrics(), int32(pool.Spec.TargetPorts[0].Number))
updated, err := pm.pmc.FetchMetrics(ctx, pm.GetPod(), pm.GetMetrics())
```
This is much cleaner, I like it.
```go
// PodInfo represents the relevant Kubernetes Pod state of an inference server.
type PodInfo struct {
	NamespacedName types.NamespacedName
	PodName        string
```
We have an `Endpoint` struct also defined in this package; would it make more sense to define that as the object that the rest of the system uses?
Endpoint is actually an interface. It is more related to the data gathering done in the Datalayer.
I think refactoring from Pod to Endpoint across the codebase is not something to be done in this PR. I also think it needs to be done in phases due to the size of the effort. Some potential steps:
- Rename the current Endpoint interface to something like EndpointMetrics
- Replace backendmetrics.PodMetrics with datalayer.EndpointMetrics everywhere. In reality backendmetrics.PodMetrics is simply a reference to datalayer.EndPoint, which is somewhat confusing.
- Appropriately rename the pod-related structs, functions, and variables to Endpoint-related names.
SGTM. Can we open an issue to track this work?

> In reality backendmetrics.PodMetrics is simply a reference to datalayer.EndPoint, which is somewhat confusing.

Agreed, we should get rid of excess layers where we can.
```go
	pm.UpdatePod(pod)
	return ok
```

```go
	pods := []*datalayer.PodInfo{}
```
Yeah, I think using the nomenclature `pods` when referring to a specific rank/engine makes this PR more difficult to read, and subsequently I think that will impact the codebase. Suggest using Endpoint.
See my comment above
pkg/epp/datastore/datastore.go (Outdated)
```go
		Labels: labels,
	})
}
if len(pods) == 1 && ds.modelServerMetricsPort != 0 {
```
nit: we can check the length of ds.pool.Spec.TargetPorts before iterating and assembling the endpoint list
Done
pkg/epp/datalayer/podinfo.go (Outdated)
```go
	NamespacedName types.NamespacedName
	PodName        string
	Address        string
	Port           int32
```
Wondering if we want to make this a string (it would prevent Itoa calls); do we ever use it like a number?
Done
Signed-off-by: Shmuel Kallner <[email protected]>
Force-pushed from 885d81c to 6c061ec
/lgtm Looks good to me, just holding to give others a chance to review if they want. If we don't hear anything after an hour or two we can unhold and keep moving.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kfswain, shmuelk The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/unhold
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds support for the vLLM Data Parallel feature. The vLLM Data Parallel feature causes the vLLM "launcher" to launch many vLLM instances in the same Pod, each listening on a different port.
The InferencePool CRD has already been changed to support this by allowing up to eight TargetPorts to be specified. It is assumed that all pods in the InferencePool have been configured the same way with respect to Data Parallelism.
To minimize the amount of change to the code, the datastore has been modified to create "virtual pods" from the real pods found by the pod reconciler. These virtual pods are given names consisting of the real pod's name concatenated with the string "-rank-N", where N is a number from zero to seven. The term rank is used because that is what each of the separate vLLM "servers" in a Data Parallel configuration is called.
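The naming scheme described above can be sketched as follows; the function name here is hypothetical, not the actual datastore helper:

```go
package main

import "fmt"

// virtualPodNames produces one virtual-pod name per configured target
// port, following the "<podName>-rank-<N>" scheme described above, where
// N is the engine's rank (its index in the target-port list).
func virtualPodNames(podName string, targetPorts []int32) []string {
	names := make([]string, 0, len(targetPorts))
	for rank := range targetPorts {
		names = append(names, fmt.Sprintf("%s-rank-%d", podName, rank))
	}
	return names
}

func main() {
	fmt.Println(virtualPodNames("vllm-0", []int32{8000, 8001, 8002}))
	// [vllm-0-rank-0 vllm-0-rank-1 vllm-0-rank-2]
}
```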
The previous code had a notion of globally known ports for inference and metrics scraping. This has been eliminated; instead, inference port and metrics port fields have been added to the PodInfo struct. In addition, a PodName field was added that contains the name of the real pod used to create the "virtual pods".
Lastly, the API of the PreRequest extension point has been changed, removing the inference port parameter. Any PreRequest extension must get the inference port of the pod(s) in question from the PodInfo's GetPort() API.
Which issue(s) this PR fixes:
Fixes #1519
Does this PR introduce a user-facing change?: