Note that the library does not support ``torch.compile`` in this release.
**New Features**

* Using sharded data parallelism with tensor parallelism together is now
  available for PyTorch 1.13.1. It allows you to train with smaller global batch
  sizes while scaling up to large clusters. For more information, see `Sharded
  data parallelism with tensor parallelism <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html#model-parallel-extended-features-pytorch-sharded-data-parallelism-with-tensor-parallelism>`_
  in the *Amazon SageMaker Developer Guide*. A launch-configuration sketch
  follows this list.

* Added support for saving and loading full model checkpoints when using sharded
  data parallelism. This is enabled by using the standard checkpointing API,
  ``smp.save_checkpoint`` with ``partial=False``. Before, full checkpoints needed
  to be created by merging partial checkpoint files. See the sketch after this
  list.
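The combined feature in the first item is driven entirely by the training job's
launch configuration. The following is a minimal sketch of a PyTorch estimator
with both sharded data parallelism and tensor parallelism turned on; the entry
point, instance settings, and degree values are illustrative assumptions, not
recommendations.

.. code:: python

    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(
        entry_point="train.py",            # assumed SMP-enabled training script
        role="my_sagemaker_role",
        instance_type="ml.p4d.24xlarge",   # assumed instance type (8 GPUs each)
        instance_count=2,
        framework_version="1.13.1",
        py_version="py39",
        distribution={
            "mpi": {"enabled": True, "processes_per_host": 8},
            "smdistributed": {
                "modelparallel": {
                    "enabled": True,
                    "parameters": {
                        "sharded_data_parallel_degree": 8,  # shard states across 8 ranks
                        "tensor_parallel_degree": 2,        # 2-way tensor parallelism
                        "ddp": True,
                    },
                }
            },
        },
    )
    estimator.fit("s3://my_bucket/my_training_data/")

Note that the two degrees multiply: 8 sharded-data-parallel ranks times 2-way
tensor parallelism consumes all 16 GPUs across the two assumed instances.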
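For the full-checkpoint support in the second item, here is a minimal sketch of
the call, assuming a training script where ``model`` has already been wrapped
with ``smp.DistributedModel``; the output path and tag are placeholders.

.. code:: python

    import smdistributed.modelparallel.torch as smp

    # Write a single merged checkpoint instead of per-rank partial shards.
    smp.save_checkpoint(
        path="/opt/ml/checkpoints",  # placeholder output location
        tag="final_model",           # placeholder checkpoint tag
        partial=False,               # request a full, merged model checkpoint
        model=model,
    )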
doc/frameworks/djl/using_djl.rst (6 additions, 6 deletions)
@@ -31,7 +31,7 @@ You can either deploy your model using DeepSpeed or HuggingFace Accelerate, or l
 djl_model = DJLModel(
     "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
     "my_sagemaker_role",
-    data_type="fp16",
+    dtype="fp16",
     task="text-generation",
     number_of_partitions=2 # number of gpus to partition the model across
 )
@@ -48,7 +48,7 @@ If you want to use a specific backend, then you can create an instance of the co
 deepspeed_model = DeepSpeedModel(
     "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
     "my_sagemaker_role",
-    data_type="bf16",
+    dtype="bf16",
     task="text-generation",
     tensor_parallel_degree=2, # number of gpus to partition the model across using tensor parallelism
 )
@@ -58,7 +58,7 @@ If you want to use a specific backend, then you can create an instance of the co
 hf_accelerate_model = HuggingFaceAccelerateModel(
     "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
     "my_sagemaker_role",
-    data_type="fp16",
+    dtype="fp16",
     task="text-generation",
     number_of_partitions=2, # number of gpus to partition the model across
 )
@@ -109,7 +109,7 @@ For example, you can deploy the EleutherAI gpt-j-6B model like this:
 model = DJLModel(
     "EleutherAI/gpt-j-6B",
     "my_sagemaker_role",
-    data_type="fp16",
+    dtype="fp16",
     number_of_partitions=2
 )

@@ -142,7 +142,7 @@ You would then pass "s3://my_bucket/gpt-j-6B" as ``model_id`` to the ``DJLModel`
 model = DJLModel(
     "s3://my_bucket/gpt-j-6B",
     "my_sagemaker_role",
-    data_type="fp16",
+    dtype="fp16",
     number_of_partitions=2
 )

@@ -213,7 +213,7 @@ For more information about DJL Serving, see the `DJL Serving documentation. <htt
 SageMaker DJL Classes
 ***********************

-For information about the different DJL Serving related classes in the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/sagemaker.djl_inference.html.
+For information about the different DJL Serving related classes in the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/frameworks/djl/sagemaker.djl_inference.html.
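The constructor examples in this diff stop before deployment. As a brief
illustration of the next step, the sketch below deploys one of the models and
runs a prediction; the instance type and request payload are assumptions for a
text-generation task, not fixed requirements.

.. code:: python

    # Deploy to a multi-GPU endpoint; the model is partitioned across the
    # instance's GPUs per number_of_partitions / tensor_parallel_degree.
    predictor = model.deploy("ml.g5.12xlarge")

    # Text-generation style request; the exact payload schema depends on the task.
    outputs = predictor.predict({"inputs": "Deep learning is"})
    print(outputs)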