
Integrated VLM benchmark code for Eagle2 #3698


Open
wants to merge 5 commits into base: main

Conversation


@chohk88 chohk88 commented Jul 21, 2025

Description

The previous pull request (#3652) was closed due to rebase difficulties with the main branch. This new PR resubmits the same changes for the VLM benchmark framework, now cleanly rebased on the latest main branch, and incorporates all feedback from the original review.

  1. Integrated VLM benchmark framework
    • Currently supports Eagle2 and Qwen 2.5-VL
    • Planned support: PaliGemma and other models
  2. Added a custom token-generation function for multi-modal (MM) models

Type of change

Please delete options that are not relevant and/or add your own.

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

@chohk88 chohk88 requested review from peri044 and zewenli98 July 21, 2025 16:27
@chohk88 chohk88 self-assigned this Jul 21, 2025
@chohk88 chohk88 added the component: conversion and component: dynamo labels Jul 21, 2025
@meta-cla meta-cla bot added the cla signed label Jul 21, 2025
@github-actions github-actions bot removed the component: conversion and component: dynamo labels Jul 21, 2025

peri044 commented Aug 6, 2025

Qwen model: the command I used:
python run_vlm.py

Error:

File "/work/TensorRT/tools/llm/run_vlm.py", line 448, in <module>
    inputs = load_inputs(args, processor, DEVICE)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/TensorRT/tools/llm/run_vlm.py", line 188, in load_inputs
    from qwen_vl_utils import process_vision_info
ModuleNotFoundError: No module named 'qwen_vl_utils'
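
This looks like a missing optional dependency. As a sketch (assuming the helper ships as the qwen-vl-utils package on PyPI), the import could be guarded so the failure is more actionable:

# Hedged sketch: guard the optional dependency with an actionable error message.
try:
    from qwen_vl_utils import process_vision_info
except ImportError as exc:  # only needed for the Qwen 2.5-VL path
    raise ImportError(
        "The Qwen 2.5-VL benchmark requires the optional 'qwen-vl-utils' "
        "package (e.g. pip install qwen-vl-utils)."
    ) from exc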


peri044 commented Aug 6, 2025

When I tried the Eagle2 model, it shows:

Traceback (most recent call last):
  File "/work/TensorRT/tools/llm/run_vlm.py", line 443, in <module>
    model, processor, emb_layer = load_model(args.model, DEVICE, dtype)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/TensorRT/tools/llm/run_vlm.py", line 141, in load_model
    return _load_eagle2(device, torch_dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/TensorRT/tools/llm/run_vlm.py", line 101, in _load_eagle2
    AutoModel.from_pretrained(
  File "/root/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 279, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4336, in from_pretrained
    config = cls._autoset_attn_implementation(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2109, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/root/.pyenv/versions/3.11.13/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2252, in _check_and_enable_flash_attn_2
    raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.

@peri044 peri044 left a comment

Please update docs and add these models to the list of supported models.

# This patch is global for the script's execution context.
import transformers.models.qwen2.modeling_qwen2 as mq

mq.ALL_ATTENTION_FUNCTIONS["flash_attention_2"] = mq.ALL_ATTENTION_FUNCTIONS["sdpa"]

Did you try this instead? Do you think the following will work?

model.config._attn_implementation = "sdpa"
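
As a point of comparison, a minimal sketch of requesting SDPA at load time (assuming the Eagle2 checkpoint accepts the standard transformers attn_implementation override), which would avoid the global patch:

# Hedged sketch: ask for SDPA when loading instead of patching the attention registry.
model = AutoModel.from_pretrained(
    model_id,                    # hypothetical variable holding the Eagle2 checkpoint name
    torch_dtype=torch_dtype,     # hypothetical dtype from the surrounding loader
    attn_implementation="sdpa",  # sidesteps the flash_attn requirement
)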

Comment on lines +157 to +158
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
image = Image.open(requests.get(url, stream=True).raw)

This can be the default, but can you also add an image_path argument so a user can provide a path to an image on their local system?
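
A minimal sketch of what that could look like (flag name and wiring are illustrative):

# Hedged sketch: optional --image_path flag that falls back to the default sample URL.
parser.add_argument(
    "--image_path",
    type=str,
    default=None,
    help="Path to a local image; the default sample URL is used when unset.",
)

# Later, where the image is loaded (e.g. in load_inputs):
if args.image_path:
    image = Image.open(args.image_path)
else:
    url = "https://www.ilankelman.org/stopsigns/australia.jpg"
    image = Image.open(requests.get(url, stream=True).raw)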

]

# --- Model-specific vision processing ---
if "qwen" in args.model.lower():

Minor comment: consider matching the model name exactly here, since there can be multiple variants with similar naming (e.g. qwen2, qwen3, etc.).
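
For illustration, a sketch of exact matching (the identifier set is a placeholder):

# Hedged sketch: dispatch on the exact model identifier instead of a substring.
QWEN_2_5_VL_MODELS = {"Qwen/Qwen2.5-VL-3B-Instruct"}  # placeholder set of supported IDs

# --- Model-specific vision processing ---
if args.model in QWEN_2_5_VL_MODELS:
    ...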

max_seq_len = input_embeds.shape[1] + args.num_tokens

seq_len = torch.export.Dim("seq", min=1, max=max_seq_len)
position_ids = torch.arange(input_embeds.shape[1]).unsqueeze(0).to(DEVICE)

Let's make the device an argument as well.
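
A sketch of how that could be threaded through (assumes DEVICE is currently a module-level constant and that parser/args come from the script's existing argument parser):

# Hedged sketch: expose the device as a CLI flag (flag name is illustrative).
parser.add_argument(
    "--device",
    type=str,
    default="cuda:0",
    help="Device used for model loading, export, and benchmarking.",
)
DEVICE = torch.device(args.device)
position_ids = torch.arange(input_embeds.shape[1]).unsqueeze(0).to(DEVICE)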

Comment on lines +267 to +270
disable_tf32=True,
use_python_runtime=True,
debug=args.debug,
offload_module_to_cpu=True,

Could you please make the other arguments (disable_tf32, use_python_runtime, offload_module_to_cpu) configurable as well?
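
One possible shape for this, as a sketch (defaults mirror the current hard-coded values; flag names are illustrative):

# Hedged sketch: boolean CLI flags with automatic --no-* counterparts (Python 3.9+).
import argparse

parser.add_argument("--disable_tf32", action=argparse.BooleanOptionalAction, default=True)
parser.add_argument("--use_python_runtime", action=argparse.BooleanOptionalAction, default=True)
parser.add_argument("--offload_module_to_cpu", action=argparse.BooleanOptionalAction, default=True)

# Then forward them to the compile call:
#   disable_tf32=args.disable_tf32,
#   use_python_runtime=args.use_python_runtime,
#   offload_module_to_cpu=args.offload_module_to_cpu,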

Comment on lines +712 to +732
image_embeds = None
if pixel_values is not None:
    image_embeds = model.visual(pixel_values, image_grid_thw)

# 2. Create initial sequence embeddings
seq_tokens = input_ids.clone()
seq_embeds = emb_layer(seq_tokens)

# 3. Insert image embeddings at image token positions
if image_embeds is not None:
    mask = seq_tokens == model.config.image_token_id
    num_image_tokens = mask.sum().item()
    if num_image_tokens != image_embeds.shape[0]:
        raise ValueError(
            f"Number of image tokens ({num_image_tokens}) does not match number of image embeddings ({image_embeds.shape[0]})."
        )
    mask_expanded = mask.unsqueeze(-1).expand_as(seq_embeds)
    seq_embeds = seq_embeds.masked_scatter(
        mask_expanded, image_embeds.to(seq_embeds.dtype)
    )


Please add a similar section for Qwen2 describing which parts of the graph are optimized and which are not.

hidden_states, kv_cache = outputs_and_kv[0], outputs_and_kv[1:]

# Use logit_pos to get the correct logit based on whether we padded or not.
logits = model.lm_head(hidden_states[:, -1, :])

Do we not optimize lm_head?

Comment on lines +863 to +869
def generate_mm_paligemma(
    model,
    pixel_values: torch.Tensor | None,
    input_ids: torch.Tensor,
    max_output_seq_length: int,
    eos_token_id: int,
    emb_layer: torch.nn.Embedding,

Can you add a docstring to this function? Also mention in the docstring that PaliGemma is currently under development if you want to keep this function here.
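
A possible starting point, as a sketch (wording is only a suggestion; the signature is taken from the diff):

def generate_mm_paligemma(
    model,
    pixel_values: torch.Tensor | None,
    input_ids: torch.Tensor,
    max_output_seq_length: int,
    eos_token_id: int,
    emb_layer: torch.nn.Embedding,
    device: str = "cuda:0",
) -> torch.LongTensor:
    """Custom token generation for PaliGemma multi-modal inputs.

    Note: PaliGemma support is currently under development; this code path is
    experimental and not yet part of the supported model list.
    """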

    emb_layer: torch.nn.Embedding,
    device: str = "cuda:0",
) -> torch.LongTensor:
    vit_embeds = None

Similar comment as above: can you add a docstring to this function? Also mention in the docstring that PaliGemma is currently under development if you want to keep this function here.



@torch.inference_mode()
def generate_mm_qwen2_5_vl_with_timing(

Can we reuse the code from generate_mm_qwen2_5_vl here?
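
For illustration, one way the timing variant could wrap the plain path instead of duplicating it (sketch only; assumes both functions share the same signature):

# Hedged sketch: reuse the plain generation path and add only CUDA-event timing.
@torch.inference_mode()
def generate_mm_qwen2_5_vl_with_timing(*args, **kwargs):
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    output = generate_mm_qwen2_5_vl(*args, **kwargs)
    end.record()
    torch.cuda.synchronize()
    return output, start.elapsed_time(end)  # elapsed time in milliseconds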
