Closed
Description
Describe the problem
sagemaker.session.Session has an upload_data method that allows users to upload a local file or directory to S3:
sagemaker-python-sdk/src/sagemaker/session.py
Lines 137 to 165 in 6f3cf42
    def upload_data(self, path, bucket=None, key_prefix="data", extra_args=None):
        """Upload local file or directory to S3.

        If a single file is specified for upload, the resulting S3 object key is
        ``{key_prefix}/{filename}`` (filename does not include the local path, if any specified).

        If a directory is specified for upload, the API uploads all content, recursively,
        preserving relative structure of subdirectories. The resulting object key names are:
        ``{key_prefix}/{relative_subdirectory_path}/filename``.

        Args:
            path (str): Path (absolute or relative) of local file or directory to upload.
            bucket (str): Name of the S3 Bucket to upload to (default: None). If not specified, the
                default bucket of the ``Session`` is used (if default bucket does not exist, the
                ``Session`` creates it).
            key_prefix (str): Optional S3 object key name prefix (default: 'data'). S3 uses the
                prefix to create a directory structure for the bucket content that it display in
                the S3 console.
            extra_args (dict): Optional extra arguments that may be passed to the upload operation.
                Similar to ExtraArgs parameter in S3 upload_file function. Please refer to the
                ExtraArgs parameter documentation here:
                https://boto3.amazonaws.com/v1/documentation/api/latest/guide/s3-uploading-files.html#the-extraargs-parameter

        Returns:
            str: The S3 URI of the uploaded file(s). If a file is specified in the path argument,
                the URI format is: ``s3://{bucket name}/{key_prefix}/{original_file_name}``.
                If a directory is specified in the path argument, the URI format is
                ``s3://{bucket name}/{key_prefix}``.
        """
But there is no corresponding way to download files from S3 into a local directory using Session. Training jobs put model artifacts in S3, transform jobs put batch transform output in S3, and any job (or even an Endpoint) may write to S3 during its execution. Right now, users have to fall back to boto3 instead of using their sagemaker_session.
Proposal
Add sagemaker.session.Session.download_data with the following signature and behavior:
    def download_data(self, s3_uri, local_path):
        """Downloads data under an S3 prefix from an S3 URI into a local path or directory.

        Args:
            s3_uri (str): An S3 path. All objects under this prefix will be downloaded into the
                current directory.
            local_path (str): A local path. "." means the current directory. Directories will be
                created as needed.
        """

Thoughts / feedback on this proposal are welcome. Thanks!
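To make the proposed behavior concrete, here is a minimal sketch of what such a download could look like. It is not the SDK's implementation: it is written as a standalone function rather than a `Session` method, and it assumes the caller passes in a boto3 S3 client (`s3_client` is a parameter introduced here for illustration). It lists every object under the prefix, strips the prefix to get a relative path, and creates local directories as needed.

```python
import os
from urllib.parse import urlparse


def download_data(s3_client, s3_uri, local_path="."):
    """Download all objects under an S3 URI prefix into local_path.

    Assumption: s3_client behaves like a boto3 S3 client, i.e. it provides
    get_paginator("list_objects_v2") and download_file(Bucket, Key, Filename).

    Returns the list of local file paths that were written.
    """
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("Expected an s3://bucket/prefix URI, got: %r" % s3_uri)
    bucket = parsed.netloc
    prefix = parsed.path.lstrip("/")

    downloaded = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from the response when nothing matches the prefix.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                # Skip zero-byte "folder" placeholder objects.
                continue
            # Path of the object relative to the prefix; if the URI names a
            # single object, fall back to its file name.
            relative = key[len(prefix):].lstrip("/") or os.path.basename(key)
            destination = os.path.join(local_path, *relative.split("/"))
            # Create intermediate directories as needed.
            os.makedirs(os.path.dirname(destination) or ".", exist_ok=True)
            s3_client.download_file(bucket, key, destination)
            downloaded.append(destination)
    return downloaded
```

With boto3 installed and credentials configured, this could be called as `download_data(boto3.client("s3"), "s3://my-bucket/output", ".")`, where the bucket and prefix are placeholders. A `Session` method would presumably reuse the session's own boto client instead of taking one as an argument.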