For more detailed explanations of the classes that this library provides for automatic model tuning, see:

- `API docs for HyperparameterTuner and parameter range classes <https://sagemaker.readthedocs.io/en/stable/tuner.html>`__
- `API docs for analytics classes <https://sagemaker.readthedocs.io/en/stable/analytics.html>`__

**********************************
SageMaker Asynchronous Inference
**********************************
Amazon SageMaker Asynchronous Inference is a capability in SageMaker that queues incoming requests and processes them asynchronously.
This option is ideal for requests with large payload sizes (up to 1GB), long processing times, and near real-time latency requirements.
Asynchronous Inference enables you to save on costs by autoscaling the instance count to zero when there are no requests to process,
so you only pay when your endpoint is processing requests. More information about
SageMaker Asynchronous Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html>`__.

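Scaling the instance count down to zero is configured through Application Auto Scaling rather than through this SDK. The sketch below illustrates one way to register an endpoint variant for scale-to-zero; the endpoint name, variant name, and capacity bound are assumptions for illustration, not values from this document:

.. code:: python

    def variant_resource_id(endpoint_name, variant_name="AllTraffic"):
        """Build the Application Auto Scaling resource ID for an endpoint variant."""
        return f"endpoint/{endpoint_name}/variant/{variant_name}"


    def enable_scale_to_zero(endpoint_name, max_instances=2):
        """Allow the variant's instance count to scale between 0 and max_instances."""
        import boto3  # deferred import: only needed when actually registering

        autoscaling = boto3.client("application-autoscaling")
        autoscaling.register_scalable_target(
            ServiceNamespace="sagemaker",
            ResourceId=variant_resource_id(endpoint_name),
            ScalableDimension="sagemaker:variant:DesiredInstanceCount",
            MinCapacity=0,  # scale in to zero instances when there is no work
            MaxCapacity=max_instances,
        )

A scaling policy (for example, target tracking on the ``ApproximateBacklogSizePerInstance`` metric) can then be attached to the same resource to drive the scaling decisions.
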
To deploy an asynchronous endpoint, you will need to create an ``AsyncInferenceConfig`` object.
If you create an ``AsyncInferenceConfig`` without specifying any arguments, the default ``S3OutputPath`` will
be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-output/{UNIQUE-JOB-NAME}``:

.. code:: python

    from sagemaker.async_inference import AsyncInferenceConfig

    # Create an empty AsyncInferenceConfig object to use default values
    async_config = AsyncInferenceConfig()

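With the default configuration above, the placeholders in the output path expand as follows (the region, account ID, and job name below are illustrative values, not SDK defaults):

.. code:: python

    region = "us-west-2"           # illustrative value
    account_id = "123456789012"    # illustrative value
    job_name = "my-inference-job"  # illustrative value

    default_output_path = f"s3://sagemaker-{region}-{account_id}/async-output/{job_name}"
    print(default_output_path)
    # s3://sagemaker-us-west-2-123456789012/async-output/my-inference-job
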
Or you can specify configurations in ``AsyncInferenceConfig`` as needed:

.. code:: python

    # Specify S3OutputPath, MaxConcurrentInvocationsPerInstance and NotificationConfig in the async config object
    async_config = AsyncInferenceConfig(
        output_path="s3://{s3_bucket}/{bucket_prefix}/output",
        max_concurrent_invocations_per_instance=10,
        notification_config={
            "SuccessTopic": "arn:aws:sns:aws-region:account-id:topic-name",
            "ErrorTopic": "arn:aws:sns:aws-region:account-id:topic-name",
        },
    )

Then use the ``AsyncInferenceConfig`` in the estimator's ``deploy()`` method to deploy an asynchronous endpoint:

.. code:: python

    # Deploys the model that was generated by fit() to a SageMaker asynchronous endpoint
    async_predictor = estimator.deploy(async_inference_config=async_config)

After deployment is complete, ``deploy()`` returns an ``AsyncPredictor``. You can use it to perform asynchronous inference
with ``predict_async()`` and retrieve the result later. For input data, you can upload the data to an S3 bucket
and pass its path:

.. code:: python

    # Upload data to S3 bucket then use that as input
    async_response = async_predictor.predict_async(input_path=input_s3_path)

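If the input is not yet in S3, it can be staged there first. The helper below is a hypothetical sketch using boto3 (the bucket and prefix names are assumptions); its return value is the kind of URI that can be passed as ``input_path``:

.. code:: python

    def input_s3_uri(bucket, key):
        """Build the S3 URI to pass as input_path."""
        return f"s3://{bucket}/{key}"


    def upload_input(local_file, bucket, prefix="async-inputs"):
        """Upload a local file to S3 and return its URI."""
        import boto3  # deferred import: only needed for the actual upload

        key = f"{prefix}/{local_file.rsplit('/', 1)[-1]}"
        boto3.client("s3").upload_file(local_file, bucket, key)
        return input_s3_uri(bucket, key)
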
Or you can pass serialized data directly, just as with real-time inference. In that case, the SageMaker Python SDK
uploads the data to an Amazon S3 bucket under ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-input/``:

.. code:: python

    # Serializes data and makes a prediction request to the SageMaker asynchronous endpoint
    async_response = async_predictor.predict_async(data=data)

You can then work on other tasks while the inference completes. Once it has completed, check
the result:

.. code:: python

    # Switch back to check the result
    result = async_response.get_result()

If you want to wait for the result right away, use the ``predict()`` method instead. It checks for the result
periodically and returns it once it appears in the output Amazon S3 path:

.. code:: python

    # Use predict() to wait for the result
    response = async_predictor.predict(data=data)

    # Or use an Amazon S3 input path
    response = async_predictor.predict(input_path=input_s3_path)

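Conceptually, ``predict()`` wraps a polling loop like the one below. This is a simplified illustration, not the SDK's actual implementation; ``fetch_output`` is a hypothetical stand-in for checking the output Amazon S3 path:

.. code:: python

    import time


    def wait_for_result(fetch_output, delay=15, max_attempts=60):
        """Poll fetch_output() until it returns a result or attempts run out."""
        for _ in range(max_attempts):
            result = fetch_output()
            if result is not None:
                return result
            time.sleep(delay)
        raise TimeoutError("result did not appear in the output path in time")


    # Usage with a stand-in fetcher that succeeds on its third call:
    calls = {"count": 0}

    def fake_fetch():
        calls["count"] += 1
        return "prediction" if calls["count"] >= 3 else None

    print(wait_for_result(fake_fetch, delay=0))
    # prediction
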
Clean up the endpoint and model if needed after inference:

.. code:: python

    # Tears down the SageMaker endpoint and endpoint configuration
    async_predictor.delete_endpoint()

    # Deletes the SageMaker model
    async_predictor.delete_model()

For more details about Asynchronous Inference,
see the API docs for `Asynchronous Inference <https://sagemaker.readthedocs.io/en/stable/api/inference/async_inference.html>`__.

*******************************
SageMaker Serverless Inference
*******************************