[Roadmap]: H2 timelines and key focus areas

Hi Dynamo developers! We wanted to present the following **key timelines and focus areas** for Dynamo in H2 2025.

### 📅 Timeline
We plan to ship 3 major releases (0.5 - 0.7) before we reach GA at the end of this year. Dynamo will continue to be released on an approximately biweekly cadence, and the target dates are shown below:

| v0.4.1 | v0.5.0 | v0.5.1 | v0.6.0 | v0.6.1 | v0.7.0 |
| :-------: | :------: | :-------: | :-------: | :------: | :-------: |
| 8/27     | 9/17     | 10/8    | 10/22   | 11/5  | 11/19   |

We will likely ship another minor release (0.7.1) in December.
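
To make the cadence concrete, here is a small sketch that computes the number of days between consecutive target dates. The dates come from the table above; the year 2025 is assumed from the "H2 2025" framing, and the `RELEASES` and `release_gaps` names are illustrative only.

```python
from datetime import date

# Target dates from the timeline table; the year 2025 is an assumption
# based on the "H2 2025" framing of this roadmap.
RELEASES = {
    "v0.4.1": date(2025, 8, 27),
    "v0.5.0": date(2025, 9, 17),
    "v0.5.1": date(2025, 10, 8),
    "v0.6.0": date(2025, 10, 22),
    "v0.6.1": date(2025, 11, 5),
    "v0.7.0": date(2025, 11, 19),
}


def release_gaps(releases):
    """Return (previous, next, days_between) for each pair of consecutive releases."""
    items = list(releases.items())
    return [
        (prev, nxt, (nxt_date - prev_date).days)
        for (prev, prev_date), (nxt, nxt_date) in zip(items, items[1:])
    ]


for prev, nxt, days in release_gaps(RELEASES):
    print(f"{prev} -> {nxt}: {days} days")
```

With these dates, the first two gaps are three weeks (21 days) and the remaining three are two weeks (14 days).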

To improve our ability to track progress, we'll create a detailed roadmap for each major and minor release, such as 0.4.1 and 0.5.0. Each roadmap will include linked issues and pull requests, offering a clearer view of our development plans. This is in addition to the general H2 plan outlined here.

### 🎯 H2 focus areas
We will focus on making Dynamo work seamlessly across 5 key areas:

1. Performance
2. Fault tolerance
3. K8 deployment
4. KV cache management and transfer
5. Scheduling with smart router and planner 

These are our key focus areas, but we are working on other features as well (e.g., multi-LoRA and tool calling). As mentioned earlier, a more detailed roadmap for each release will be available in a separate issue.

#### Here is our progress in the 5 areas as of the v0.4.0 release:
* **Performance** 
    * Support for both aggregated and disaggregated serving with all inference engines (SGLang, TRT-LLM, and vLLM)
    * [AIConfigurator](https://github.com/ai-dynamo/aiconfigurator) to estimate and compare the performance of aggregated and disaggregated serving for a given model, config, and hardware, and to recommend the best configs.
    * [Functional] R1 disaggregated wideEP support with SGLang and TRT-LLM
        * TRT-LLM wideEP E2E tested with EP16  
        * SGLang wideEP decode perf tested and verified. Collaborating to improve prefill perf.
    * [Functional] Multimodal E/P/D disaggregation with TRT-LLM and vLLM. 
* **Fault tolerance**
    * Request migration
    * Instance fault detection and tracking
    * K8 health check, base metrics for request handler, and structured logging
* **K8 deployment**
    * New component model: Moving from the Dynamo SDK to direct interaction with the Dynamo Runtime. Use Python commands and args for component deployments.
    * CRD-based deployment: Shifting from the Dynamo CLI to a direct single-CRD approach.
    * Pre-built NGC images ([SGLang, TRT-LLM, vLLM](https://catalog.ngc.nvidia.com/containers?filters=&orderBy=scoreDESC&query=dynamo&page=&pageSize=))
    * Deployment guides ([EKS, AKS](https://github.com/ai-dynamo/dynamo/tree/main/examples))
* **KV cache management and transfer**
    * NIXL plugins and benchmarks
        * Networking plugins: UCX and Mooncake
        * Storage plugins: GDS, POSIX, S3, 3FS
        * GPU-initiated transfer plugins: DOCA GPUNetIO and UCX-initiated comms
        * NIXLBench and NIXLKVBench
    * KV Block Manager using vLLM with GPU, host memory, and local disk (PR expected on 8/18; to be included in v0.4.1 on 8/27)
* **Scheduling with smart routing and planning** 
    * Customizable smart router leveraging KV cache hit rate and KV cache load
    * SLA (TTFT & ITL) based planner with vLLM to dynamically allocate prefill and decode workers
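
To illustrate the two scheduling ideas above, here is a minimal sketch, not Dynamo's actual router or planner. It assumes a router that scores each worker by KV cache hit rate minus a weighted KV cache load, and a planner that scales a worker pool in proportion to how far an observed latency (TTFT for prefill, ITL for decode) is from its SLA target. All names (`Worker`, `pick_worker`, `plan_replicas`) and the scoring and scaling formulas are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    cached_prefix_blocks: int  # blocks of the request's prefix already cached on this worker
    kv_load: float             # fraction of the worker's KV cache in use, 0.0 to 1.0


def pick_worker(workers, request_blocks, load_weight=1.0):
    """Pick the worker with the highest (hit rate - load_weight * load) score.

    This scoring rule is an assumed heuristic for illustration only.
    """
    def score(w):
        hit_rate = min(w.cached_prefix_blocks, request_blocks) / max(request_blocks, 1)
        return hit_rate - load_weight * w.kv_load

    return max(workers, key=score)


def plan_replicas(current, observed_ms, target_ms):
    """Scale a pool proportionally to observed latency vs. its SLA target.

    Proportional scaling is an assumed, simplified policy; at least one
    replica is always kept.
    """
    return max(1, math.ceil(current * observed_ms / target_ms))


workers = [
    Worker("worker-a", cached_prefix_blocks=8, kv_load=0.9),
    Worker("worker-b", cached_prefix_blocks=4, kv_load=0.2),
]
chosen = pick_worker(workers, request_blocks=10)
prefill_replicas = plan_replicas(current=2, observed_ms=600, target_ms=400)  # TTFT over target
decode_replicas = plan_replicas(current=4, observed_ms=20, target_ms=25)     # ITL under target
print(chosen.name, prefill_replicas, decode_replicas)
```

In this example the router prefers the lightly loaded worker even though the other one holds more of the prefix, and setting `load_weight=0.0` would make it route purely on cache hit rate.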

#### Below is a list of our major features leading up to GA

* **Performance** 
    * Benchmarks for top 4 models across all inference engines
        * Performance validation and 30-day stress test before GA
    * Disaggregated serving with pre-computed KV cache
    * Attention-FFN disaggregated serving (AFD)
* **Fault tolerance**
    * Request cancellation & rejection
    * GPU and engine health monitoring 
    * High availability and recovery
* **K8 deployment**
    * Additional high-quality docs and examples (GKE, ECS, etc.)
    * Grove (+KAI scheduler) integration
    * Inference gateway integration
    * SGLang OME integration
* **KV cache management and transfer**
    * NIXL 
        * Elastic EP and wideEP fault tolerance
        * AWS Libfabric support
        * NIXL plugin support for top 10 storage vendors
        * Elastic EP and fault tolerance
        * Network optimizations for large-scale inference (congestion, QoS, etc.)
    * KVBM (KV Block Manager)
        * LMCache integration
        * Modular KV Block Manager
        * SGLang and TRT-LLM support
        * Custom eviction policies
* **Scheduling with smart routing and planning**
    * Composable frontend (API server, Processor, Router)
    * Standalone router and planner
    * Hierarchical router
    * SLA-based planner for SGLang and TRT-LLM
    * [AIConfigurator](https://github.com/ai-dynamo/aiconfigurator) integration with the planner

If there are any additional features that need to be considered or prioritized, please let us know in the comments. Thank you so much for your ongoing feedback; we will do our best to incorporate it into the Dynamo GA release in December 🙏.

