Support for creating asynchronous endpoints with my own model and inference code #2619

@mrtj

Description

Describe the feature you'd like
I would like to create an asynchronous inference endpoint with my own model, preprocessing, and inference code using the SageMaker mid-level framework model classes (PyTorchModel, TensorFlowModel, MXNetModel, etc.). Please provide documentation on how the custom preprocessing, inference, and postprocessing code (in the custom script specified by the entry_point parameter of these classes) is called by SageMaker in the case of an asynchronous invocation.
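For context, this is a minimal sketch of the handler contract that the SageMaker inference toolkit uses for the framework containers; my assumption (which this issue asks to have confirmed for async endpoints) is that the same `model_fn`/`input_fn`/`predict_fn`/`output_fn` hooks are invoked per request regardless of invocation mode. `DummyModel` and the `frames` payload shape are illustrative placeholders, not part of any real API.

```python
# inference.py -- sketch of the entry_point script hooks.
# Assumption: the toolkit calls these in order model_fn (once at startup),
# then input_fn -> predict_fn -> output_fn per request.
import json


class DummyModel:
    """Stand-in for a real model that would be loaded from model_dir."""

    def __call__(self, frame):
        # A real model would score the frame; this returns a fixed result.
        return {"label": "example", "score": 0.5}


def model_fn(model_dir):
    # Load model artifacts from model_dir (e.g. torch.load) -- stubbed here.
    return DummyModel()


def input_fn(request_body, content_type):
    # Preprocessing: parse the request payload into model inputs.
    if content_type == "application/json":
        payload = json.loads(request_body)
        return payload["frames"]  # hypothetical payload shape
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(frames, model):
    # Inference: run the model on every extracted frame, keeping timestamps.
    return [{"timestamp": f["timestamp"], "result": model(f)} for f in frames]


def output_fn(predictions, accept):
    # Postprocessing: serialize the timestamp -> model-response structure.
    return json.dumps(predictions)
```

The open question is whether the async container routes the payload fetched from S3 through these same hooks, or whether a different mechanism applies.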

How would this feature be used? Please describe.
The new asynchronous inference endpoint is very interesting for some of our use cases. For example, we need to run our custom model and inference code on relatively short video files. In the preprocessing script, we plan to extract frames from the video, send each frame to our custom model, and pack the inference results into a frame-timestamp-to-model-response structure.

Describe alternatives you've considered
We are investigating SageMaker processing jobs, AWS Batch, and ECS for this use case, but SageMaker asynchronous inference would clearly be the easiest and most suitable service, with the least code overhead on our side.

Additional context
Currently, the documentation and the examples only show how to create an async endpoint using the SageMaker-provided inference images, without showing how to implement the preprocess/inference/postprocess hooks.
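For reference, this is roughly what we would expect to write, combining a custom `entry_point` script with `AsyncInferenceConfig` from the SageMaker Python SDK; the bucket paths, role, and framework versions are placeholders, and whether the custom hooks are actually honored in async mode is exactly what we are asking about. This cannot run outside an AWS account with the appropriate role and S3 artifacts.

```python
# Sketch: deploying a custom-code async endpoint (untested assumption).
from sagemaker.pytorch import PyTorchModel
from sagemaker.async_inference import AsyncInferenceConfig

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",   # placeholder artifact path
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    entry_point="inference.py",                 # custom pre/post-processing script
    source_dir="code",
    framework_version="1.9",
    py_version="py38",
)

async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-output/",
    max_concurrent_invocations_per_instance=2,
)

predictor = model.deploy(
    instance_type="ml.g4dn.xlarge",
    initial_instance_count=1,
    async_inference_config=async_config,
)

# Async invocation: the payload is read from S3 rather than sent inline.
response = predictor.predict_async(input_path="s3://my-bucket/input/video-payload.json")
```

Documentation confirming (or correcting) this pattern is what the feature request is asking for.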
