✨[Feature] Resource aware Graph partitioner #3906

@narendasan

Description

Problem

Large models often fail or stall during TensorRT engine building because compilation can consume up to 5x the model's size in host CPU memory, exceeding the available RAM. When this happens, compilation either freezes or is OOM-killed by the OS. Even with some optimizations, a big model sometimes still cannot fit into limited CPU RAM.

Solution

We insert a resource-aware graph partitioning pass after capability-based partitioning. It refines the capability-based split by further dividing oversized accelerated subgraphs so that each resulting TRT engine fits within a conservative CPU memory budget. The pass should:

  • Reconstruct accelerated/non-accelerated subgraphs on the original torch.fx.GraphModule, preserving fusion groups and graph topological order.

  • Estimate per-subgraph “size” by traversing reachable get_attr weights and summing tensor bytes, deduplicating shared parameters (see the first sketch after this list).

  • Automatically determine the per-subgraph budget from the available CPU memory or a user-defined CPU memory limit.

  • Iteratively split any accelerated subgraph that exceeds the budget by moving nodes from the front into a new subgraph, validating partition correctness, and never breaking fusion groups (see the second sketch after this list).

  • Produce a roughly equal split of parameter size across the whole graph.
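
A minimal sketch of the size estimate and budget heuristic from the bullets above, assuming a torch.fx.GraphModule and psutil for querying free host memory; the function names, the `seen` set, and the 0.25 safety factor are illustrative placeholders, not the actual implementation.

```python
import psutil  # assumed available for querying free host memory
import torch
import torch.fx


def estimate_subgraph_bytes(gm: torch.fx.GraphModule, subgraph_nodes, seen: set) -> int:
    """Sum tensor bytes reachable through the subgraph's get_attr nodes,
    deduplicating parameters already counted for another subgraph."""
    total = 0
    for node in subgraph_nodes:
        if node.op != "get_attr" or node.target in seen:
            continue
        seen.add(node.target)
        attr = gm
        for atom in node.target.split("."):  # resolve dotted get_attr targets
            attr = getattr(attr, atom)
        if isinstance(attr, torch.Tensor):
            total += attr.numel() * attr.element_size()
    return total


def cpu_memory_budget_bytes(user_limit_bytes=None, safety_factor=0.25) -> int:
    """Derive a conservative per-subgraph budget from the available CPU memory
    (or a user-defined limit). The 0.25 safety factor is only illustrative."""
    available = user_limit_bytes if user_limit_bytes is not None else psutil.virtual_memory().available
    return int(available * safety_factor)
```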

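And a minimal sketch of the greedy splitting loop from the last two bullets. Here `size_of`, `same_fusion_group`, and `is_valid_partition` stand in for the pass's own size, fusion-group, and validation bookkeeping; only the front-to-back strategy described above is assumed.

```python
def split_oversized_subgraph(subgraph_nodes, budget_bytes, size_of,
                             same_fusion_group, is_valid_partition):
    """Peel nodes off the front of an oversized accelerated subgraph into a new
    subgraph until the budget is reached, never cutting inside a fusion group."""
    new_part = []
    remainder = list(subgraph_nodes)
    while remainder:
        node = remainder[0]
        over_budget = size_of(new_part + [node]) > budget_bytes
        # Stop once adding the next node would exceed the budget, unless cutting
        # here would separate nodes of the same fusion group; in that case keep
        # pulling the fused nodes across.
        if over_budget and new_part and not same_fusion_group(new_part[-1], node):
            break
        new_part.append(remainder.pop(0))
    if not remainder:
        return [new_part]                      # nothing left to split off
    if not (is_valid_partition(new_part) and is_valid_partition(remainder)):
        return [list(subgraph_nodes)]          # fall back to the original subgraph
    return [new_part, remainder]
```
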
Alternatives

  • We tried splitting solely at certain nodes (e.g., splitting right after the SDPA node), roughly as sketched below. This did not achieve a roughly equal split of parameter sizes. However, we noticed a performance boost for the split graph compared to the original whole graph, which is counterintuitive and worth further investigation.
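
For reference, the node-based alternative amounts to picking split points by operator identity, roughly as below; the function name and the aten SDPA target check are illustrative, not the code that was actually evaluated.

```python
import torch
import torch.fx


def sdpa_split_points(gm: torch.fx.GraphModule):
    """Collect the nodes after which the graph would be cut when splitting
    solely on SDPA boundaries."""
    return [
        node
        for node in gm.graph.nodes
        if node.op == "call_function"
        and node.target is torch.ops.aten.scaled_dot_product_attention.default
    ]
```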
