Description
Describe the bug
When using local mode on Windows, Estimator.fit() does not work.
To reproduce
import sagemaker
from sagemaker.local import LocalSession
from sagemaker.tensorflow import TensorFlow
sagemaker_session = LocalSession()
sagemaker_session.config = {'local': {'local_code': True}}
tf_estimator = TensorFlow(
# env
py_version='py37',
framework_version='2.4',
# instance
instance_count=1,
instance_type='local',
# job
base_job_name='test',
source_dir=r"C:/Users/me/project/",
entry_point=r"C:/Users/me/project/train.py",
role=role,
sagemaker_session=sagemaker_session,
script_mode=True,
)
tf_estimator.fit(r"file://C:/Users/me/project/data", wait=False)
The generated docker-compose.yaml file is incorrect, however. Its volumes entry contains:
- C:\Users\me\project\data:/opt/ml/input/data/training
- /Users/me/project/:/opt/ml/code
The last line is wrong: the drive letter C: is missing from the host path. I tried several forms of source_dir:
- source_dir="C:/Users/me/project/"
- source_dir="file://C:/Users/me/project/"
- source_dir="file:///C:/Users/me/project/"
- source_dir=r"file://C:/Users/me/project/"
- source_dir=r"C:/Users/me/project/"
None of them fixes the problem.
EDIT
The error seems to originate in sagemaker.local.image._prepare_training_volumes() and its use of the urllib.parse.urlparse() function:
training_dir
Out[2]: 'file://C:/Users/me/project/'
urlparse(training_dir)
Out[3]: ParseResult(scheme='file', netloc='C:', path='/Users/me/project/', params='', query='', fragment='')
Only the "path" component is then written to the docker-compose.yaml file, which leads to an error: Docker cannot find /Users/me/project/ because the C: drive prefix is missing.
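To illustrate the parsing behavior, here is a minimal sketch that runs urlparse() on three of the forms I tried. It uses only the standard library and does not touch the SDK. None of the three yields a "path" component that is directly usable as a Windows path.

```python
from urllib.parse import urlparse

# Three of the source_dir forms tried above.
candidates = [
    "C:/Users/me/project/",          # drive letter is read as a one-letter scheme "c"
    "file://C:/Users/me/project/",   # drive letter ends up in netloc ("C:")
    "file:///C:/Users/me/project/",  # path keeps the drive but gains a leading "/"
]

for uri in candidates:
    parsed = urlparse(uri)
    print(f"{uri!r}: scheme={parsed.scheme!r} "
          f"netloc={parsed.netloc!r} path={parsed.path!r}")
```

In the first two cases the path is '/Users/me/project/', which matches the broken volume line. In the third case it is '/C:/Users/me/project/'.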
Any workaround in mind? I fixed it for my use case by replacing
volumes.append(_Volume(parsed_uri.path, "/opt/ml/code")) with volumes.append(_Volume(parsed_uri.netloc+parsed_uri.path, "/opt/ml/code")), but I do not know what side effects this may have.
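On the side-effect question: for POSIX-style URIs such as file:///home/me/project, netloc is empty, so concatenating netloc and path changes nothing there. A slightly more defensive variant might look like the sketch below. Note that local_path_from_uri is a hypothetical helper name of my own, not part of the SageMaker SDK, and I have only reasoned about the input forms listed in this issue.

```python
import re
from urllib.parse import urlparse


def local_path_from_uri(uri):
    """Hypothetical helper (not an SDK function): recover a local
    filesystem path from a plain path or a file:// URI."""
    parsed = urlparse(uri)

    # No scheme, or a one-letter "scheme" that is really a Windows
    # drive letter (e.g. "C:/x"): the input is already a plain path.
    if len(parsed.scheme) <= 1:
        return uri

    if parsed.scheme != "file":
        raise ValueError(f"not a local path or file URI: {uri!r}")

    # file://C:/x -> netloc "C:", path "/x"
    if re.fullmatch(r"[A-Za-z]:", parsed.netloc):
        return parsed.netloc + parsed.path

    # file:///C:/x -> netloc "", path "/C:/x"
    if re.match(r"/[A-Za-z]:", parsed.path):
        return parsed.path[1:]

    # POSIX-style file URI, e.g. file:///home/me/project
    return parsed.path


print(local_path_from_uri("file://C:/Users/me/project/"))
print(local_path_from_uri("file:///home/me/project"))
```

With something like this, the volume would be built from local_path_from_uri(...) instead of parsed_uri.path. I have not checked how the rest of the local-mode code handles the result.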
Expected behavior
The docker-compose.yaml file should be written correctly, with the full Windows path including the drive letter, and the training job should launch.
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.59.1
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): TensorFlow
- Framework version: 2.4
- Python version: 3.7
- CPU or GPU: CPU
- Custom Docker image (Y/N): N
Additional context
Docker image: 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training (2.4-cpu-py37)
Thank you for your help!