[Enhancement] SageMaker async inference #8

@athewsey

Description

To host the model, this sample currently deploys a real-time SageMaker endpoint backed by a GPU instance - which may suit high-volume use cases, but is likely resource-intensive (and costly) for many others.

Since the end-to-end workflow here is asynchronous anyway (it may include a human review step), it's probably a good fit for the new SageMaker asynchronous inference feature, which supports scaling down to zero instances when demand is low.
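For context on what "consuming" an async endpoint looks like: instead of sending the payload inline, the caller stages it in S3 and invokes `invoke_endpoint_async`, then polls (or gets an SNS notification) for the result at the returned `OutputLocation`. A minimal sketch - the endpoint and S3 URIs below are illustrative, not from this repo:

```python
def async_invoke_request(
    endpoint_name: str,
    input_s3_uri: str,
    content_type: str = "application/json",
) -> dict:
    """Build the kwargs for SageMaker Runtime's invoke_endpoint_async.

    Unlike invoke_endpoint, the request body is NOT sent inline: the payload
    must already be uploaded to S3, and the response only tells you where the
    output will eventually land.
    """
    return {
        "EndpointName": endpoint_name,
        "InputLocation": input_s3_uri,
        "ContentType": content_type,
    }


# Usage (requires AWS credentials; names are hypothetical):
# import boto3
# smr = boto3.client("sagemaker-runtime")
# resp = smr.invoke_endpoint_async(
#     **async_invoke_request("my-async-endpoint", "s3://my-bucket/inputs/doc1.json")
# )
# resp["OutputLocation"]  # S3 URI where the result will appear when ready
```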

  • Wait until async inference is supported in the SageMaker Python SDK, rather than cluttering the notebook with boto3 endpoint setup (tracking their issue and pull request)
  • Update notebook 2 to create an async endpoint with scale-to-zero capability
  • Update pipeline to correctly consume async endpoints
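The steps above could look roughly like the sketch below, assuming the SDK support lands as tracked: deploy with an `AsyncInferenceConfig`, then register the variant with Application Auto Scaling using `MinCapacity=0` and a target-tracking policy on the async backlog metric. Bucket name, instance type, capacity limits, and cooldowns are all illustrative assumptions, not settings from this sample:

```python
def backlog_scaling_config(endpoint_name: str, target_backlog: float = 5.0):
    """Build the Application Auto Scaling payloads for an async endpoint.

    MinCapacity=0 is what enables scale-to-zero: when the request backlog
    drains, the endpoint can release all instances. The policy tracks the
    ApproximateBacklogSizePerInstance CloudWatch metric that SageMaker emits
    for async endpoints. Target/cooldown values here are placeholders.
    """
    resource_id = f"endpoint/{endpoint_name}/variant/AllTraffic"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,  # allow scale-to-zero
        "MaxCapacity": 2,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-backlog-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_backlog,
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
                "Statistic": "Average",
            },
            "ScaleInCooldown": 600,
            "ScaleOutCooldown": 300,
        },
    }
    return target, policy


def deploy_async_endpoint(model, output_s3_uri: str):
    """Deploy `model` (a sagemaker.model.Model built earlier) asynchronously."""
    from sagemaker.async_inference import AsyncInferenceConfig

    return model.deploy(
        initial_instance_count=1,
        instance_type="ml.g4dn.xlarge",  # illustrative GPU instance type
        async_inference_config=AsyncInferenceConfig(
            output_path=output_s3_uri,
            max_concurrent_invocations_per_instance=4,
        ),
    )


# Usage (requires AWS credentials and a built `model`):
# import boto3
# predictor = deploy_async_endpoint(model, "s3://my-bucket/async-outputs/")
# aas = boto3.client("application-autoscaling")
# target, policy = backlog_scaling_config(predictor.endpoint_name)
# aas.register_scalable_target(**target)
# aas.put_scaling_policy(**policy)
```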

TBD: Do we need to retain a real-time deployment option for anyone who wants to optimize for low latency? It seems unnecessary to me at the moment.


Labels: enhancement (New feature or request)
