Support for creating asynchronous endpoints with my own model and inference code #2619

@mrtj

Description

Describe the feature you'd like
I would like to create an asynchronous inference endpoint with my own model, preprocessing, and inference code using the SageMaker mid-level framework model classes (PyTorchModel, TensorFlowModel, MXNetModel, etc.). Please provide documentation on how the custom preprocessing, inference, and postprocessing code (in the custom script specified by the entry_point parameter of these classes) is called by SageMaker in the case of an asynchronous invocation.
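For context, this is a minimal sketch of the handler contract that the SageMaker inference toolkit uses for the framework containers; my assumption (which this issue asks to have confirmed for async endpoints) is that the same `model_fn`/`input_fn`/`predict_fn`/`output_fn` hooks are invoked per request regardless of invocation mode. `DummyModel` and the `frames` payload shape are illustrative placeholders, not part of any real API.

```python
# inference.py -- sketch of the entry_point script hooks.
# Assumption: the toolkit calls these in order model_fn (once at startup),
# then input_fn -> predict_fn -> output_fn per request.
import json


class DummyModel:
    """Stand-in for a real model that would be loaded from model_dir."""

    def __call__(self, frame):
        # A real model would score the frame; this returns a fixed result.
        return {"label": "example", "score": 0.5}


def model_fn(model_dir):
    # Load model artifacts from model_dir (e.g. torch.load) -- stubbed here.
    return DummyModel()


def input_fn(request_body, content_type):
    # Preprocessing: parse the request payload into model inputs.
    if content_type == "application/json":
        payload = json.loads(request_body)
        return payload["frames"]  # hypothetical payload shape
    raise ValueError(f"Unsupported content type: {content_type}")


def predict_fn(frames, model):
    # Inference: run the model on every extracted frame, keeping timestamps.
    return [{"timestamp": f["timestamp"], "result": model(f)} for f in frames]


def output_fn(predictions, accept):
    # Postprocessing: serialize the timestamp -> model-response structure.
    return json.dumps(predictions)
```

The open question is whether the async container routes the payload fetched from S3 through these same hooks, or whether a different mechanism applies.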

How would this feature be used? Please describe.
The new asynchronous inference endpoint is very interesting for some of our use cases. For example, we need to run our custom model and inference code on relatively short video files. In the preprocessing script, we plan to extract frames from the video, send each frame to our custom model, and pack the inference results into a frame-timestamp-to-model-response structure.

Describe alternatives you've considered
We are investigating SageMaker processing jobs, AWS Batch, and ECS for this use case, but SageMaker asynchronous inference would clearly be the easiest and most suitable service, with the least code overhead on our side.

Additional context
Currently, the documentation and the examples only show how to create an async endpoint using the SageMaker-provided inference images, without showing how to implement the preprocess/inference/postprocess hooks.
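For reference, this is roughly what we would expect to write, combining a custom `entry_point` script with `AsyncInferenceConfig` from the SageMaker Python SDK; the bucket paths, role, and framework versions are placeholders, and whether the custom hooks are actually honored in async mode is exactly what we are asking about. This cannot run outside an AWS account with the appropriate role and S3 artifacts.

```python
# Sketch: deploying a custom-code async endpoint (untested assumption).
from sagemaker.pytorch import PyTorchModel
from sagemaker.async_inference import AsyncInferenceConfig

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",   # placeholder artifact path
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    entry_point="inference.py",                 # custom pre/post-processing script
    source_dir="code",
    framework_version="1.9",
    py_version="py38",
)

async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-output/",
    max_concurrent_invocations_per_instance=2,
)

predictor = model.deploy(
    instance_type="ml.g4dn.xlarge",
    initial_instance_count=1,
    async_inference_config=async_config,
)

# Async invocation: the payload is read from S3 rather than sent inline.
response = predictor.predict_async(input_path="s3://my-bucket/input/video-payload.json")
```

Documentation confirming (or correcting) this pattern is what the feature request is asking for.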
