docs: Add multimodal documentation vllm, sglang, and trtllm backends #4510

indrajit96 · 2025-11-20T19:27:48Z

Overview:

Add comprehensive multimodal guides for the vLLM, SGLang, and TRT-LLM backends, documenting architectures, deployment modes, input formats, and known limitations.

Details:

- New: docs/backends/vllm/multimodal_vllm_guide.md - Complete vLLM multimodal reference
- New: docs/backends/trtllm/multimodal_trtllm_guide.md - Complete TRT-LLM multimodal reference
- New: dynamo/docs/backends/sglang/multimodal_sglang_guide.md - Complete SGLang multimodal reference

Signed-off-by: Indrajit Bhosale <[email protected]>

rmccorm4

Can you update https://github.com/ai-dynamo/dynamo/blob/main/docs/multimodal/multimodal_intro.md by adding links at the bottom to each of the backend-specific docs, so that it serves as a central location?

rmccorm4

re: https://github.com/ai-dynamo/dynamo/actions/runs/19580969130/job/56078496698?pr=4510

#15 5.473 checking consistency... /workspace/dynamo/docs/backends/sglang/multimodal_sglang_guide.md: WARNING: document isn't included in any toctree [toc.not_included]
#15 5.474 /workspace/dynamo/docs/backends/trtllm/multimodal_trtllm_guide.md: WARNING: document isn't included in any toctree [toc.not_included]
#15 5.474 /workspace/dynamo/docs/backends/vllm/multimodal_vllm_guide.md: WARNING: document isn't included in any toctree [toc.not_included]

These files need to be added to docs/hidden_toctree.rst.
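A rough way to check this locally before pushing is sketched below. It is only an illustration: the helper names (`toctree_entries`, `missing_from_toctree`) are hypothetical and not part of Sphinx or this repo, and it assumes the toctree lists documents as paths relative to `docs/`, with or without the `.md` extension.

```python
import re
from pathlib import PurePosixPath


def _normalize(path: str) -> str:
    """Drop a trailing .md/.rst so 'a/b.md' and 'a/b' compare equal."""
    p = PurePosixPath(path)
    return str(p.with_suffix("")) if p.suffix in {".md", ".rst"} else str(p)


def toctree_entries(rst_text: str) -> set[str]:
    """Collect document names listed under every `.. toctree::` directive."""
    entries: set[str] = set()
    in_toctree = False
    for line in rst_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. toctree::"):
            in_toctree = True
            continue
        if not in_toctree or not stripped:
            continue  # outside a toctree, or a blank separator line
        if not line[0].isspace():
            in_toctree = False  # unindented text ends the directive body
            continue
        if stripped.startswith(":"):
            continue  # directive option such as :hidden:
        # Entries may also be written as "Title <target>".
        match = re.fullmatch(r".*<(.+)>", stripped)
        entries.add(_normalize(match.group(1) if match else stripped))
    return entries


def missing_from_toctree(rst_text: str, doc_paths: list[str]) -> list[str]:
    """Return the docs (paths relative to docs/) that no toctree lists."""
    listed = toctree_entries(rst_text)
    return [doc for doc in doc_paths if _normalize(doc) not in listed]


if __name__ == "__main__":
    # Assumed layout: run from the repo root, paths relative to docs/.
    new_docs = [
        "backends/vllm/multimodal_vllm_guide.md",
        "backends/trtllm/multimodal_trtllm_guide.md",
        "backends/sglang/multimodal_sglang_guide.md",
    ]
    try:
        with open("docs/hidden_toctree.rst", encoding="utf-8") as handle:
            print(missing_from_toctree(handle.read(), new_docs))
    except FileNotFoundError:
        print("docs/hidden_toctree.rst not found; run from the repo root")
```

Any path it prints would still trigger the `toc.not_included` warning shown in the CI log above.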

rmccorm4 · 2025-11-21T21:07:10Z

@krishung5 can you help review the docs here? The main point is to clearly document what each backend supports today for multimodality, and to highlight at least one key example or model for each.

docs/backends/trtllm/multimodal_trtllm_guide.md

krishung5 · 2025-11-21T22:10:49Z

docs/backends/sglang/multimodal_sglang_guide.md

+
+DISAGGREGATED (E->P->D):
+  Client → Frontend → Processor → Encoder [NIXL] → Prefill [bootstrap] → Decode → Response
+  • 4 components • Vision encoder + KV sharing • Bootstrap coordination


What does "KV sharing" mean here? Does it just mean PD disaggregation?

krishung5

Thanks for putting up this doc, great work! One minor comment: we already have multimodal docs for all three frameworks, so maybe we can link them from the guides here somehow? I.e., vllm, trtllm here and here, and sglang.

krishung5 · 2025-11-21T22:38:30Z

docs/backends/trtllm/multimodal_trtllm_guide.md

+```
+SIMPLE AGGREGATED (agg.sh):
+  Client → Frontend (Rust) → Worker [image load, encode, P+D] → Response
+  • 2 components • --modality multimodal • Easiest setup


I think `--modality multimodal` could be a bit confusing here, since users might not be familiar with the launch scripts. I understand we want to keep the bullet points short, but maybe we can do something like this?

SIMPLE AGGREGATED (agg.sh):
  Client → Frontend (Rust) → Worker [image load, encode, P+D] → Response
  • 2 components • worker flag `--modality multimodal` • Easiest setup

krishung5 · 2025-11-21T22:40:26Z

docs/backends/trtllm/multimodal_trtllm_guide.md

+
+### Launch Script
+
+Example: `examples/backends/trtllm/launch/agg.sh`


Maybe add the actual link here?

krishung5 · 2025-11-21T22:43:34Z

docs/backends/trtllm/multimodal_trtllm_guide.md

+| **Frontend → Prefill** | Request with image URL or embedding path | No |
+| **Encode → Prefill (Precomputed Embeddings)** | NIXL metadata (pre-computed embeddings) | Yes (Embeddings tensor) |
+| **Encode → Prefill (Image URL) (WIP)** | Disaggregated params with multimodal handles | No (Handles via params) |
+| **Prefill → Decode** | Disaggregated params | Yes/No (KV cache - UCX or NIXL) |


Quick question:

Yes/No (KV cache - UCX or NIXL)

Does this mean:
Yes (KV cache transfer using NIXL)
No (KV cache transfer using UCX)

krishung5 · 2025-11-21T22:51:16Z

docs/backends/vllm/multimodal_vllm_guide.md

+
+```
+SIMPLE AGGREGATED (agg_multimodal.sh):
+  Client → Frontend (Rust) → Worker [image load, encode, P+D] → Response


Do we want to highlight the Rust processor?

Suggested change

Client → Frontend (Rust) → Worker [image load, encode, P+D] → Response

Client → Frontend (Rust processor) → Worker [image load, encode, P+D] → Response

krishung5 · 2025-11-21T22:53:29Z

docs/backends/vllm/multimodal_vllm_guide.md

+| **Data URL** | `data:image/jpeg;base64,/9j/4AAQ...` | Base64-encoded inline data | ✅ |
+
+
+## Aggregated Mode (PD)


Could we clarify here that this is the EPD flow, not the simple aggregated one? Do we want to add a section for the simple aggregated workflow? One point of confusion I frequently hear from people is EPD vs. simple/traditional. It would be very helpful if we could align on the wording.
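As a side note on the Data URL row quoted in the hunk above: such a value is just the MIME type followed by the Base64-encoded image bytes. A minimal Python sketch, assuming a hypothetical helper `to_data_url` that is not part of any backend API:

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Wrap raw image bytes in a Base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The first six bytes of a JFIF-style JPEG header, used here as stand-in data.
jpeg_prefix = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10])
print(to_data_url(jpeg_prefix))  # data:image/jpeg;base64,/9j/4AAQ
```

This is why the example in the table starts with `/9j/4AAQ`: that is how the leading bytes of a typical JPEG file encode in Base64.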

krishung5 · 2025-11-21T22:58:22Z

docs/backends/vllm/multimodal_vllm_guide.md

+
+### Launch Script
+
+Example: `examples/backends/vllm/launch/disagg_multimodal_llama.sh`


Same here: we can add the actual link to the script.

2/3 Done

94ddb67

pull-request-size bot added the size/XL label Nov 20, 2025

rmccorm4 reviewed Nov 20, 2025

3/3 Done

cd8ed73

pull-request-size bot added size/XXL and removed size/XL labels Nov 21, 2025

rmccorm4 changed the title ~~2/3 Done~~ docs: Add multimodal documentation vllm, sglang, and trtllm backends Nov 21, 2025

github-actions bot added the docs label Nov 21, 2025

rmccorm4 reviewed Nov 21, 2025

rmccorm4 requested review from GuanLuo and krishung5 November 21, 2025 21:06

rmccorm4 reviewed Nov 21, 2025

krishung5 reviewed Nov 21, 2025


4 participants

