
Commit 41d3de8

Merge branch 'master' into async-backward-compatability
2 parents cd9469d + 8da92f7 commit 41d3de8

File tree

30 files changed: +925 additions, -155 deletions


CHANGELOG.md

Lines changed: 21 additions & 0 deletions
@@ -1,5 +1,26 @@
 # Changelog
 
+## v2.156.0 (2023-05-17)
+
+### Features
+
+ * Partition support for DJLModel using SM Training job
+ * Update run-notebook-test to consider skips failures
+
+### Bug Fixes and Other Changes
+
+ * Update apache airflow and update test requirements
+ * Perform integrity checks for remote function execution
+ * Add p2 instances to integ tests
+ * Fix typo in logging message within ir mixin
+ * double Run create on load_run
+ * Update dtype logic for huggingface backend for new containers
+
+### Documentation Changes
+
+ * Update container version for SKLearn
+ * Add description for parameters in TransformInput
+
 ## v2.155.0 (2023-05-15)
 
 ### Features

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.155.1.dev0
+2.156.1.dev0

doc/frameworks/djl/using_djl.rst

Lines changed: 25 additions & 0 deletions
@@ -221,6 +221,31 @@ see the `DJL Serving Documentation on Python Mode. <https://docs.djl.ai/docs/ser
 
 For more information about DJL Serving, see the `DJL Serving documentation. <https://docs.djl.ai/docs/serving/index.html>`_
 
+**************************
+Ahead of time partitioning
+**************************
+
+To optimize the deployment of large models that do not fit in a single GPU, the model's tensor weights are partitioned at
+runtime and each partition is loaded on an individual GPU. Runtime partitioning, however, adds a significant amount of time
+and memory to model loading, so DJLModel offers an ahead-of-time partitioning capability for the DeepSpeed and
+FasterTransformer engines, which lets you partition your model weights and save them before deployment. The HuggingFace
+engine does not support tensor parallelism, so ahead-of-time partitioning cannot be done for it. In our experiment with
+the GPT-J model, loading the model from partitioned checkpoints reduced model loading time by 40%.
+
+The `partition` method invokes an Amazon SageMaker Training job to partition the model and upload the partitioned
+checkpoints to an S3 bucket. You can provide your desired S3 bucket for the partitioned checkpoints; otherwise, they are
+uploaded to the default SageMaker S3 bucket. Note that this S3 bucket is remembered for deployment: when you call the
+`deploy` method after `partition`, DJLServing downloads the partitioned model checkpoints directly from the uploaded
+S3 URL, if available.
+
+.. code::
+
+    # Partitions the model using an Amazon SageMaker Training job.
+    djl_model.partition("ml.g5.12xlarge")
+
+    predictor = djl_model.deploy("ml.g5.12xlarge",
+                                 initial_instance_count=1)
+
 ***********************
 SageMaker DJL Classes
 ***********************
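
A fuller sketch of the workflow this new doc section describes might look like the following. The model id, role, and the `number_of_partitions` and `s3_output_uri` arguments are illustrative assumptions, not taken from this diff:

    from sagemaker.djl_inference import DJLModel

    # Illustrative model id and IAM role; placeholders, not from this commit.
    djl_model = DJLModel(
        "EleutherAI/gpt-j-6B",
        "my-sagemaker-execution-role-arn",
        dtype="fp16",
        number_of_partitions=4,   # assumed tensor-parallel degree argument
    )

    # Run ahead-of-time partitioning as a SageMaker Training job;
    # s3_output_uri is an assumed keyword for choosing the upload bucket.
    djl_model.partition("ml.g5.12xlarge",
                        s3_output_uri="s3://my-bucket/partitioned-checkpoints")

    # deploy() reuses the remembered S3 location for the partitioned weights.
    predictor = djl_model.deploy("ml.g5.12xlarge", initial_instance_count=1)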

doc/frameworks/sklearn/using_sklearn.rst

Lines changed: 5 additions & 5 deletions
@@ -7,7 +7,7 @@ With Scikit-learn Estimators, you can train and host Scikit-learn models on Amaz
 For information about supported versions of Scikit-learn, see the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/sklearn.html>`__.
 We recommend that you use the latest supported version because that's where we focus most of our development efforts.
 
-For more information about the framework, see the `Sciket-Learn <https://github.com/scikit-learn/scikit-learn>`_ repository.
+For more information about the framework, see the `Scikit-Learn <https://github.com/scikit-learn/scikit-learn>`_ repository.
 For general information about using the SageMaker Python SDK, see :ref:`overview:Using the SageMaker Python SDK`.
 
 .. contents::
@@ -31,7 +31,7 @@ To train a Scikit-learn model by using the SageMaker Python SDK:
 Prepare a Scikit-learn Training Script
 ======================================
 
-Your Scikit-learn training script must be a Python 3.6 compatible source file.
+Your Scikit-learn training script must be a Python 3.7 compatible source file.
 
 The training script is similar to a training script you might run outside of SageMaker, but you
 can access useful properties about the training environment through various environment variables.
@@ -140,7 +140,7 @@ directories ('train' and 'test').
 
     sklearn_estimator = SKLearn('sklearn-train.py',
                                 instance_type='ml.m4.xlarge',
-                                framework_version='0.20.0',
+                                framework_version='1.0-1',
                                 hyperparameters = {'epochs': 20, 'batch-size': 64, 'learning-rate': 0.1})
     sklearn_estimator.fit({'train': 's3://my-data-bucket/path/to/my/training/data',
                            'test': 's3://my-data-bucket/path/to/my/test/data'})
@@ -204,7 +204,7 @@ operation.
     # Train my estimator
     sklearn_estimator = SKLearn(entry_point='train_and_deploy.py',
                                 instance_type='ml.m4.xlarge',
-                                framework_version='0.20.0')
+                                framework_version='1.0-1')
     sklearn_estimator.fit('s3://my_bucket/my_training_data/')
 
     # Deploy my estimator to a SageMaker Endpoint and get a Predictor
@@ -478,7 +478,7 @@ The following code sample shows how to do this, using the ``SKLearnModel`` class
     sklearn_model = SKLearnModel(model_data="s3://bucket/model.tar.gz",
                                  role="SageMakerRole",
                                  entry_point="transform_script.py",
-                                 framework_version="0.20.0")
+                                 framework_version="1.0-1")
 
     predictor = sklearn_model.deploy(instance_type="ml.c4.xlarge", initial_instance_count=1)
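
For readers updating their own pins, one way to check that a `framework_version` such as `1.0-1` resolves to a real container image is `sagemaker.image_uris.retrieve`; the region, Python version, and scope below are illustrative choices, not part of this diff:

    from sagemaker import image_uris

    # Resolve the Scikit-learn serving container for the version pinned above.
    # Region, py_version, and image_scope are illustrative assumptions.
    uri = image_uris.retrieve(
        framework="sklearn",
        region="us-east-1",
        version="1.0-1",
        py_version="py3",
        image_scope="inference",
        instance_type="ml.c4.xlarge",
    )
    print(uri)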

requirements/extras/test_requirements.txt

Lines changed: 6 additions & 2 deletions
@@ -12,9 +12,9 @@ awslogs==0.14.0
 black==22.3.0
 stopit==1.1.2
 # Update tox.ini to have correct version of airflow constraints file
-apache-airflow==2.5.1
+apache-airflow==2.6.0
 apache-airflow-providers-amazon==7.2.1
-attrs==22.1.0
+attrs>=23.1.0,<24
 fabric==2.6.0
 requests==2.27.1
 sagemaker-experiments==0.1.35
@@ -23,3 +23,7 @@ pyvis==0.2.1
 pandas>=1.3.5,<1.5
 scikit-learn==1.0.2
 cloudpickle==2.2.1
+scipy==1.7.3
+urllib3==1.26.8
+docker>=5.0.2,<7.0.0
+PyYAML==6.0

setup.py

Lines changed: 3 additions & 3 deletions
@@ -47,7 +47,7 @@ def read_requirements(filename):
 
 # Declare minimal set for installation
 required_packages = [
-    "attrs>=20.3.0,<23",
+    "attrs>=23.1.0,<24",
    "boto3>=1.26.131,<2.0",
     "cloudpickle==2.2.1",
     "google-pasta",
@@ -60,7 +60,7 @@ def read_requirements(filename):
     "pandas",
     "pathos",
     "schema",
-    "PyYAML==5.4.1",
+    "PyYAML==6.0",
     "jsonschema",
     "platformdirs",
     "tblib==1.7.0",
@@ -75,7 +75,7 @@ def read_requirements(filename):
 # Meta dependency groups
 extras["all"] = [item for group in extras.values() for item in group]
 # Tests specific dependencies (do not need to be included in 'all')
-extras["test"] = (extras["all"] + read_requirements("requirements/extras/test_requirements.txt"),)
+extras["test"] = (read_requirements("requirements/extras/test_requirements.txt"),)
 
 setup(
     name="sagemaker",
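
The extras bookkeeping changed above is plain Python; a self-contained sketch of the flattening pattern, with made-up dependency groups and a minimal stand-in for `read_requirements`, behaves like this:

    # Self-contained sketch; group names and file contents are made up.
    extras = {
        "local": ["urllib3>=1.26.8,<1.27.0"],
        "scipy": ["scipy==1.7.3"],
    }

    # The meta group concatenates every extras group into one flat list.
    extras["all"] = [item for group in extras.values() for item in group]

    def read_requirements(filename):
        # Minimal stand-in for setup.py's helper: one requirement per line.
        with open(filename) as f:
            return [line.strip() for line in f if line.strip()]

    # After this commit, the test extra is built from the requirements file
    # alone, instead of prepending extras["all"] to it:
    # extras["test"] = (read_requirements("requirements/extras/test_requirements.txt"),)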
