Note that the library does not support ``torch.compile`` in this release.
**New Features**

* Using sharded data parallelism with tensor parallelism together is now
  available for PyTorch 1.13.1. It allows you to train with smaller global batch
  sizes while scaling up to large clusters. For more information, see `Sharded
  data parallelism with tensor parallelism <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html#model-parallel-extended-features-pytorch-sharded-data-parallelism-with-tensor-parallelism>`_
  in the *Amazon SageMaker Developer Guide*. A launch-configuration sketch
  follows this list.

* Added support for saving and loading full model checkpoints when using sharded
  data parallelism. This is enabled by using the standard checkpointing API,
  ``smp.save_checkpoint`` with ``partial=False``. Before, full checkpoints needed
  to be created by merging partial checkpoint files. See the sketch after this
  list.
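The combined feature in the first item is driven entirely by the training job's
launch configuration. The following is a minimal sketch of a PyTorch estimator
with both sharded data parallelism and tensor parallelism turned on; the entry
point, instance settings, and degree values are illustrative assumptions, not
recommendations.

.. code:: python

    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(
        entry_point="train.py",            # assumed SMP-enabled training script
        role="my_sagemaker_role",
        instance_type="ml.p4d.24xlarge",   # assumed instance type (8 GPUs each)
        instance_count=2,
        framework_version="1.13.1",
        py_version="py39",
        distribution={
            "mpi": {"enabled": True, "processes_per_host": 8},
            "smdistributed": {
                "modelparallel": {
                    "enabled": True,
                    "parameters": {
                        "sharded_data_parallel_degree": 8,  # shard states across 8 ranks
                        "tensor_parallel_degree": 2,        # 2-way tensor parallelism
                        "ddp": True,
                    },
                }
            },
        },
    )
    estimator.fit("s3://my_bucket/my_training_data/")

Note that the two degrees multiply: 8 sharded-data-parallel ranks times 2-way
tensor parallelism consumes all 16 GPUs across the two assumed instances.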
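For the full-checkpoint support in the second item, here is a minimal sketch of
the call, assuming a training script where ``model`` has already been wrapped
with ``smp.DistributedModel``; the output path and tag are placeholders.

.. code:: python

    import smdistributed.modelparallel.torch as smp

    # Write a single merged checkpoint instead of per-rank partial shards.
    smp.save_checkpoint(
        path="/opt/ml/checkpoints",  # placeholder output location
        tag="final_model",           # placeholder checkpoint tag
        partial=False,               # request a full, merged model checkpoint
        model=model,
    )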
doc/frameworks/djl/using_djl.rst (6 additions, 6 deletions)
@@ -31,7 +31,7 @@ You can either deploy your model using DeepSpeed or HuggingFace Accelerate, or l
 djl_model = DJLModel(
     "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
     "my_sagemaker_role",
-    data_type="fp16",
+    dtype="fp16",
     task="text-generation",
     number_of_partitions=2 # number of gpus to partition the model across
 )
@@ -48,7 +48,7 @@ If you want to use a specific backend, then you can create an instance of the co
 deepspeed_model = DeepSpeedModel(
     "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
     "my_sagemaker_role",
-    data_type="bf16",
+    dtype="bf16",
     task="text-generation",
     tensor_parallel_degree=2, # number of gpus to partition the model across using tensor parallelism
 )
@@ -58,7 +58,7 @@ If you want to use a specific backend, then you can create an instance of the co
 hf_accelerate_model = HuggingFaceAccelerateModel(
     "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
     "my_sagemaker_role",
-    data_type="fp16",
+    dtype="fp16",
     task="text-generation",
     number_of_partitions=2, # number of gpus to partition the model across
 )
@@ -109,7 +109,7 @@ For example, you can deploy the EleutherAI gpt-j-6B model like this:
 model = DJLModel(
     "EleutherAI/gpt-j-6B",
     "my_sagemaker_role",
-    data_type="fp16",
+    dtype="fp16",
     number_of_partitions=2
 )

@@ -142,7 +142,7 @@ You would then pass "s3://my_bucket/gpt-j-6B" as ``model_id`` to the ``DJLModel`
 model = DJLModel(
     "s3://my_bucket/gpt-j-6B",
     "my_sagemaker_role",
-    data_type="fp16",
+    dtype="fp16",
     number_of_partitions=2
 )

@@ -213,7 +213,7 @@ For more information about DJL Serving, see the `DJL Serving documentation. <htt
 SageMaker DJL Classes
 ***********************

-For information about the different DJL Serving related classes in the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/sagemaker.djl_inference.html.
+For information about the different DJL Serving related classes in the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/frameworks/djl/sagemaker.djl_inference.html.
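The constructor examples in this diff stop before deployment. As a brief
illustration of the next step, the sketch below deploys one of the models and
runs a prediction; the instance type and request payload are assumptions for a
text-generation task, not fixed requirements.

.. code:: python

    # Deploy to a multi-GPU endpoint; the model is partitioned across the
    # instance's GPUs per number_of_partitions / tensor_parallel_degree.
    predictor = model.deploy("ml.g5.12xlarge")

    # Text-generation style request; the exact payload schema depends on the task.
    outputs = predictor.predict({"inputs": "Deep learning is"})
    print(outputs)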