
Conversation

@nathaliellenaa commented Sep 26, 2025

Description

Add a new agent execute stream API as an experimental feature to support agent streaming.

Supported agent and model combinations for this agent execute stream API:

  • Conversational agent - OpenAI Chat Completion model
  • Conversational agent - Bedrock Converse Stream model

Note: This PR depends on the predict stream implementation #4187

API Endpoint:

POST /_plugins/_ml/agents/{agent_id}/_execute/stream

Sample workflow:

// Enable feature flag
PUT /_cluster/settings
{
    "persistent": {
        "plugins.ml_commons.stream_enabled": true
    }
}

// Register OpenAI chat completion model
POST /_plugins/_ml/models/_register
{
    "name": "openai gpt 3.5 turbo",
    "function_name": "remote",
    "description": "openai model",
    "connector": {
        "name": "OpenAI Chat Connector",
        "description": "The connector to public OpenAI model service for GPT 3.5",
        "version": 1,
        "protocol": "http",
        "parameters": {
            "endpoint": "api.openai.com",
            "model": "gpt-3.5-turbo"
        },
        "credential": {
            "openAI_key": "<your_api_key>"
        },
        "actions": [{
            "action_type": "predict",
            "method": "POST",
            "url": "https://${parameters.endpoint}/v1/chat/completions",
            "headers": {
                "Authorization": "Bearer ${credential.openAI_key}"
            },
            "request_body": "{ \"model\": \"${parameters.model}\", \"messages\": [{\"role\":\"developer\",\"content\":\"${parameters.system_prompt}\"},${parameters._chat_history:-}{\"role\":\"user\",\"content\":\"${parameters.prompt}\"}${parameters._interactions:-}]${parameters.tool_configs:-} }"
        }]
    }
}

// Register conversational agent using OpenAI model
POST /_plugins/_ml/agents/_register
{
    "name": "Chat Agent with RAG",
    "type": "conversational",
    "description": "this is a test agent",
    "llm": {
        "model_id": "<model id created in previous step>",
        "parameters": {
            "max_iteration": 5,
            "system_prompt": "You are a helpful assistant. You are able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics.\nIf the question is complex, you will split it into several smaller questions, and solve them one by one. For example, the original question is:\nhow many orders in last three month? Which month has highest?\nYou will spit into several smaller questions:\n1.Calculate total orders of last three month.\n2.Calculate monthly total order of last three month and calculate which months order is highest. You MUST use the available tools everytime to answer the question",
            "prompt": "${parameters.question}"
        }
    },
    "memory": {
        "type": "conversation_index"
    },
    "parameters": {
        "_llm_interface": "openai/v1/chat/completions"
    },
    "tools": [
        {
            "type": "IndexMappingTool",
            "name": "DemoIndexMappingTool",
            "parameters": {
                "index": "${parameters.index}",
                "input": "${parameters.question}"
            }
        },
        {
            "type": "ListIndexTool",
            "name": "RetrieveIndexMetaTool",
            "description": "Use this tool to get OpenSearch index information: (health, status, index, uuid, primary count, replica count, docs.count, docs.deleted, store.size, primary.store.size)."
        }
    ],
    "app_type": "chat_with_rag"
}

// Run agent execute stream API
POST /_plugins/_ml/agents/evCIh5kB66VN-aC_0aNf/_execute/stream
{
    "parameters": {
        "question": "How many indices are in my cluster?"
    }
}

// Sample response
data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":"[{\"index\":0.0,\"id\":\"call_HjpbrbdQFHK0omPYa6m2DCot\",\"type\":\"function\",\"function\":{\"name\":\"RetrieveIndexMetaTool\",\"arguments\":\"\"}}]","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":"[{\"index\":0.0,\"function\":{\"arguments\":\"{}\"}}]","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":"{\"choices\":[{\"message\":{\"tool_calls\":[{\"type\":\"function\",\"function\":{\"name\":\"RetrieveIndexMetaTool\",\"arguments\":\"{}\"},\"id\":\"call_HjpbrbdQFHK0omPYa6m2DCot\"}]},\"finish_reason\":\"tool_calls\"}]}","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":"","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":"row,health,status,index,uuid,pri(number of primary shards),rep(number of replica shards),docs.count(number of available documents),docs.deleted(number of deleted documents),store.size(store size of primary and replica shards),pri.store.size(store size of primary shards)\n1,green,open,.plugins-ml-model-group,Msb1Y4W5QeiLs5yUQi-VRg,1,1,2,0,17.1kb,5.9kb\n2,green,open,.plugins-ml-memory-message,1IWd1HPeSWmM29qE6rcj_A,1,1,658,0,636.4kb,313.5kb\n3,green,open,.plugins-ml-memory-meta,OETb21fqQJa3Y2hGQbknCQ,1,1,267,7,188kb,93.9kb\n4,green,open,.plugins-ml-config,0mnOWX5gSX2s-yP27zPFNw,1,1,1,0,8.1kb,4kb\n5,green,open,.plugins-ml-model,evYOOKN4QPqtmUjxsDwJYA,1,1,5,5,421.5kb,210.7kb\n6,green,open,.plugins-ml-agent,I0SpBovjT3C6NABCBzGiiQ,1,1,6,0,205.5kb,111.3kb\n7,green,open,.plugins-ml-task,_Urzn9gdSuCRqUaYAFaD_Q,1,1,100,4,136.1kb,45.3kb\n8,green,open,top_queries-2025.09.26-00444,jb7Q1FiLSl-wTxjdSUKs_w,1,1,1736,126,1.8mb,988kb\n9,green,open,.plugins-ml-connector,YaJORo4jT0Ksp24L5cW1uA,1,1,2,0,97.8kb,48.9kb\n","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":"There","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":" are","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":" ","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":"9","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":" indices","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":" in","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":" your","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":" cluster","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":".","is_last":false}}]}]}

data: {"inference_results":[{"output":[{"name":"memory_id","result":"LvU1iJkBCzHrriq5hXbN"},{"name":"parent_interaction_id","result":"L_U1iJkBCzHrriq5hXbs"},{"name":"response","dataAsMap":{"content":"","is_last":true}}]}]}

Error handling:
[In progress]

Related Issues

Resolves #3630 ([RFC] Remote Model Inference Streaming)

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dhrubo-os

Resolve the conflict. Rebase from main?

@nathaliellenaa

Will push another commit to address comments from the predict stream PR. I just cleaned up this PR to only include the commits related to agent streaming.

@pyek-bot commented Oct 1, 2025

@nathaliellenaa Cool, it's a separate API anyway. In the future we should default to streaming for both predict and agents; in that case we can expose two flags.

Signed-off-by: Nathalie Jonathan <[email protected]>

@nathaliellenaa

The failing CI checks seem unrelated; I ran both ITs locally and they pass:

RestCohereInferenceIT > test_cohereInference_withDifferent_postProcessFunction FAILED
    java.lang.AssertionError: failed to run test with test name: connector.post_process.cohere_v2.embedding.float_test
        at __randomizedtesting.SeedInfo.seed([EA9F7EF19FCC2013:1911E683B8CE553B]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.opensearch.ml.rest.RestCohereInferenceIT.test_cohereInference_withDifferent_postProcessFunction(RestCohereInferenceIT.java:77)


RestMLRAGSearchProcessorIT > testBM25WithBedrockConverse FAILED
    org.opensearch.client.ResponseException: method [POST], host [http://[::1]:33109], URI [/test/_search?size=5&search_pipeline=pipeline_test], status line [HTTP/1.1 429 Too Many Requests]
    {"error":{"root_cause":[{"type":"remote_connector_throttling_exception","reason":"Error from remote service: The request was denied due to remote server throttling. To change the retry policy and behavior, please update the connector client_config."}],"type":"remote_connector_throttling_exception","reason":"Error from remote service: The request was denied due to remote server throttling. To change the retry policy and behavior, please update the connector client_config."},"status":429}

@dhrubo-os

Approving to merge the PR by 2:00 PM.

@ylwu-amzn merged commit b9b5687 into opensearch-project:main on Oct 1, 2025
9 of 17 checks passed
