generated from amazon-archives/__template_MIT-0
Closed
Labels: enhancement (New feature or request)
Description
To host the model, this sample currently deploys a real-time SageMaker endpoint backed by a GPU instance. That may be fine for high-volume use cases, but it is likely too resource-intensive (and costly) for many users.
Since the end-to-end workflow here is asynchronous anyway (it may include a human review component), this seems like a good fit for the new SageMaker asynchronous inference feature, which supports scaling down to zero instances when demand is low.
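For context on what the pipeline change involves: an async endpoint is invoked by pointing it at an input object in S3 rather than posting an inline payload, and the call returns immediately with the S3 URI where the output will eventually land. A minimal sketch of the request shape (the endpoint name and S3 URIs below are placeholders, not from this repo):

```python
def async_invoke_request(endpoint_name: str, input_s3_uri: str) -> dict:
    # Shape of a sagemaker-runtime invoke_endpoint_async call: the input
    # is referenced by S3 URI instead of being sent inline in the request
    # body as with invoke_endpoint.
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,
        "ContentType": "application/json",
    }

# Placeholder names for illustration only:
req = async_invoke_request("my-async-endpoint", "s3://my-bucket/inputs/doc-1.json")
```

Passing this to `boto3.client("sagemaker-runtime").invoke_endpoint_async(**req)` returns an `OutputLocation` immediately; the pipeline would then wait for the result to appear in S3 (or subscribe to the success/error notifications) instead of blocking on a synchronous response.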
- Wait until async inference is supported in the SageMaker Python SDK, rather than cluttering the notebook with boto3 endpoint setup (tracking their issue and pull request)
- Update notebook 2 to create an async endpoint with scale-to-zero capability
- Update pipeline to correctly consume async endpoints
TBD: Do we need to retain a real-time deployment option for anybody who wants to optimize for low latency? It seems unnecessary to me at the moment.
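For the notebook 2 change, scale-to-zero on an async endpoint amounts to registering the variant with Application Auto Scaling with `MinCapacity=0` and attaching a target-tracking policy on the per-instance backlog metric. A rough sketch of the request payloads only (endpoint/variant names, capacities, and the target value are all illustrative, and actually applying them requires AWS credentials):

```python
def scale_to_zero_autoscaling(endpoint_name: str, variant_name: str = "AllTraffic"):
    """Build the Application Auto Scaling requests that let an async
    endpoint variant scale down to zero instances when idle.
    All names and numbers here are illustrative placeholders."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    # MinCapacity=0 is the key difference: real-time endpoints cannot
    # scale to zero, async endpoints can.
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,
        "MaxCapacity": 2,
    }
    # Track the queued-requests-per-instance metric so the endpoint scales
    # up when work accumulates and back down as the queue drains.
    policy = {
        "PolicyName": "backlog-per-instance",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": 5.0,  # illustrative: ~5 queued requests per instance
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
        },
    }
    return target, policy

target, policy = scale_to_zero_autoscaling("my-async-endpoint")
```

These would be passed to `boto3.client("application-autoscaling")` via `register_scalable_target(**target)` and `put_scaling_policy(**policy)`; once the SageMaker Python SDK supports async inference, the notebook can hopefully hide most of this boilerplate.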