For more detailed explanations of the classes that this library provides for automatic model tuning, see:
- `API docs for HyperparameterTuner and parameter range classes <https://sagemaker.readthedocs.io/en/stable/tuner.html>`__
- `API docs for analytics classes <https://sagemaker.readthedocs.io/en/stable/analytics.html>`__

**********************************
SageMaker Asynchronous Inference
**********************************
Amazon SageMaker Asynchronous Inference is a capability in SageMaker that queues incoming requests and processes them asynchronously.
This option is ideal for requests with large payload sizes (up to 1 GB), long processing times, and near-real-time latency requirements.
Asynchronous Inference enables you to save on costs by autoscaling the instance count to zero when there are no requests to process,
so you only pay when your endpoint is processing requests. More information about
SageMaker Asynchronous Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html>`__.

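As an illustration of the scale-to-zero behavior described above, the following is a hypothetical sketch (separate from this SDK's deploy flow) that registers an endpoint variant with Application Auto Scaling through ``boto3``; the endpoint name and capacity limits are placeholder values:

.. code:: python

    # Hypothetical sketch: allow an async endpoint to scale down to zero
    # instances when idle. The endpoint name and capacities are placeholders.
    import boto3

    endpoint_name = "my-async-endpoint"
    resource_id = f"endpoint/{endpoint_name}/variant/AllTraffic"

    autoscaling = boto3.client("application-autoscaling")
    autoscaling.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=0,  # asynchronous endpoints may scale to zero instances
        MaxCapacity=5,
    )

A scaling policy (for example, target tracking on a queue-backlog metric) would still be needed for SageMaker to scale the variant automatically.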
To deploy an asynchronous inference endpoint, you need to create an ``AsyncInferenceConfig`` object.
If you create an ``AsyncInferenceConfig`` without specifying any of its arguments, the default ``S3OutputPath`` will
be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-outputs/{UNIQUE-JOB-NAME}``:

.. code:: python

    from sagemaker.async_inference import AsyncInferenceConfig

    # Create an empty AsyncInferenceConfig object to use default values
    async_config = AsyncInferenceConfig()

Alternatively, you can specify configurations in ``AsyncInferenceConfig`` as you like. All of these configuration parameters
are optional, but if you don't specify ``output_path``, Amazon SageMaker will use the default ``S3OutputPath``
mentioned above:

.. code:: python

    # Specify S3OutputPath, MaxConcurrentInvocationsPerInstance and
    # NotificationConfig in the async config object
    async_config = AsyncInferenceConfig(
        output_path="s3://{s3_bucket}/{bucket_prefix}/output",
        max_concurrent_invocations_per_instance=10,
        notification_config={
            "SuccessTopic": "arn:aws:sns:aws-region:account-id:topic-name",
            "ErrorTopic": "arn:aws:sns:aws-region:account-id:topic-name",
        },
    )

Then use the ``AsyncInferenceConfig`` in the estimator's ``deploy()`` method to deploy an asynchronous inference endpoint:

.. code:: python

    # Deploys the model that was generated by fit() to a SageMaker asynchronous inference endpoint
    async_predictor = estimator.deploy(async_inference_config=async_config)

After deployment is complete, ``deploy()`` returns an ``AsyncPredictor`` object. To perform asynchronous inference, you first
need to upload data to S3 and then call the ``predict_async()`` method with the S3 URI as the input. It returns an
``AsyncInferenceResponse`` object:

.. code:: python

    # Upload data to an S3 bucket, then use that as input
    async_response = async_predictor.predict_async(input_path=input_s3_path)

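If your input data starts out on the local filesystem, one way to stage it in S3 first is the SDK's ``S3Uploader`` utility. In this sketch, the local file name and bucket are placeholder values:

.. code:: python

    # Hypothetical sketch: stage a local payload in S3, then invoke the
    # endpoint with the resulting S3 URI. File name and bucket are placeholders.
    from sagemaker.s3 import S3Uploader

    input_s3_path = S3Uploader.upload(
        local_path="input.csv",
        desired_s3_uri="s3://my-bucket/async-endpoint-inputs",
    )
    async_response = async_predictor.predict_async(input_path=input_s3_path)
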
The Amazon SageMaker SDK also enables you to serialize the data and pass the payload data directly to the
``predict_async()`` method. For this pattern of invocation, the Amazon SageMaker SDK will upload the data to an Amazon
S3 bucket under ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-inputs/``.

.. code:: python

    # Serializes data and makes a prediction request to the SageMaker asynchronous endpoint
    async_response = async_predictor.predict_async(data=data)

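The exact bytes that get uploaded depend on the predictor's serializer. As a sketch, you can swap in a different serializer before calling ``predict_async()``; ``CSVSerializer`` is one option shipped with the SDK, and the sample data here is made up:

.. code:: python

    # Hypothetical sketch: serialize the payload as CSV before it is uploaded.
    # The sample data is a placeholder.
    from sagemaker.serializers import CSVSerializer

    async_predictor.serializer = CSVSerializer()
    async_response = async_predictor.predict_async(data=[[1.0, 2.0, 3.0]])
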
You can then continue with other work while you wait for the inference to complete. After it has completed, check
the result using ``AsyncInferenceResponse``:

.. code:: python

    # Switch back to check the result
    result = async_response.get_result()

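If the output object has not been written to S3 yet, ``get_result()`` may fail rather than wait. To block inside ``get_result()`` itself, a ``WaiterConfig`` can be passed in; the polling values below are arbitrary placeholder choices:

.. code:: python

    # Hypothetical sketch: poll S3 inside get_result() instead of checking once.
    # The delay and attempt counts are placeholder values.
    from sagemaker.async_inference.waiter_config import WaiterConfig

    waiter = WaiterConfig(max_attempts=60, delay=15)  # poll every 15 s, up to 60 times
    result = async_response.get_result(waiter_config=waiter)
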
Alternatively, if you would like to poll periodically for the result and return it once it has been generated in the
output Amazon S3 path, use the ``predict()`` method:

.. code:: python

    # Use predict() to wait for the result
    response = async_predictor.predict(data=data)

    # Or use an Amazon S3 input path
    response = async_predictor.predict(input_path=input_s3_path)

Clean up the endpoint and model if needed after inference:

.. code:: python

    # Tears down the SageMaker endpoint and endpoint configuration
    async_predictor.delete_endpoint()

    # Deletes the SageMaker model
    async_predictor.delete_model()

For more details about Asynchronous Inference,
see the API docs for `Asynchronous Inference <https://sagemaker.readthedocs.io/en/stable/api/inference/async_inference.html>`__.

*******************************
SageMaker Serverless Inference
*******************************