[Enhancement] SageMaker async inference #8

@athewsey

Description

To host the model, this sample currently deploys a real-time SageMaker endpoint backed by a GPU instance - which may suit high-volume use cases, but is likely resource-intensive (and costly) for many others.

Since the end-to-end workflow here is asynchronous anyway (it may include a human review step), it's probably a good fit for the new SageMaker asynchronous inference feature, which supports scaling down to zero instances when demand is low.
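For context on what "consuming" an async endpoint looks like: instead of sending the payload inline, the caller stages it in S3 and invokes `invoke_endpoint_async`, then polls (or gets an SNS notification) for the result at the returned `OutputLocation`. A minimal sketch - the endpoint and S3 URIs below are illustrative, not from this repo:

```python
def async_invoke_request(
    endpoint_name: str,
    input_s3_uri: str,
    content_type: str = "application/json",
) -> dict:
    """Build the kwargs for SageMaker Runtime's invoke_endpoint_async.

    Unlike invoke_endpoint, the request body is NOT sent inline: the payload
    must already be uploaded to S3, and the response only tells you where the
    output will eventually land.
    """
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,
        "ContentType": content_type,
    }


# Usage (requires AWS credentials; names are hypothetical):
# import boto3
# smr = boto3.client("sagemaker-runtime")
# resp = smr.invoke_endpoint_async(
#     **async_invoke_request("my-async-endpoint", "s3://my-bucket/inputs/doc1.json")
# )
# resp["OutputLocation"]  # S3 URI where the result will appear when ready
```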

  • Wait until async inference is supported in the SageMaker Python SDK, rather than cluttering the notebook with boto3 endpoint setup (tracking their issue and pull request)
  • Update notebook 2 to create an async endpoint with scale-to-zero capability
  • Update pipeline to correctly consume async endpoints
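The steps above could look roughly like the sketch below, assuming the SDK support lands as tracked: deploy with an `AsyncInferenceConfig`, then register the variant with Application Auto Scaling using `MinCapacity=0` and a target-tracking policy on the async backlog metric. Bucket name, instance type, capacity limits, and cooldowns are all illustrative assumptions, not settings from this sample:

```python
def backlog_scaling_config(endpoint_name: str, target_backlog: float = 5.0):
    """Build the Application Auto Scaling payloads for an async endpoint.

    MinCapacity=0 is what enables scale-to-zero: when the request backlog
    drains, the endpoint can release all instances. The policy tracks the
    ApproximateBacklogSizePerInstance CloudWatch metric that SageMaker emits
    for async endpoints. Target/cooldown values here are placeholders.
    """
    resource_id = f"endpoint/{endpoint_name}/variant/AllTraffic"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,  # allow scale-to-zero
        "MaxCapacity": 2,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-backlog-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_backlog,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 600,
            "ScaleOutCooldown": 300,
        },
    }
    return target, policy


def deploy_async_endpoint(model, output_s3_uri: str):
    """Deploy `model` (a sagemaker.model.Model built earlier) asynchronously."""
    from sagemaker.async_inference import AsyncInferenceConfig

    return model.deploy(
        initial_instance_count=1,
        instance_type="ml.g4dn.xlarge",  # illustrative GPU instance type
        async_inference_config=AsyncInferenceConfig(
            output_path=output_s3_uri,
            max_concurrent_invocations_per_instance=4,
        ),
    )


# Usage (requires AWS credentials and a built `model`):
# import boto3
# predictor = deploy_async_endpoint(model, "s3://my-bucket/async-outputs/")
# aas = boto3.client("application-autoscaling")
# target, policy = backlog_scaling_config(predictor.endpoint_name)
# aas.register_scalable_target(**target)
# aas.put_scaling_policy(**policy)
```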

TBD: Do we need to retain a real-time deployment option for anyone who wants to optimize for low latency? It seems unnecessary to me at the moment.


Labels: enhancement (New feature or request)
