Training a model in local_code mode does not work if source_dir="." #548

@tyrion

System Information

  • Python Version: 3.6
  • Python SDK Version: master
  • Are you using a custom image: yes

Describe the problem

I am trying to train a model using the undocumented local_code mode. If I don't specify source_dir, or if I set it to ".", training fails because the volumes are not mounted correctly.

I get:

Cannot create container for service algo-1-JFP46: create .: volume name is too short, names should be at least two alphanumeric characters
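
One possible reading of this error, sketched below: a volume source that is not an absolute path gets treated as a named volume, and "." is too short to be a valid name. The helper `classify_volume_source` is hypothetical and only approximates that behaviour based on the error message. It is not Docker's actual validation code.

```python
import posixpath


def classify_volume_source(source: str) -> str:
    """Approximate how the host side of a 'host:container' volume entry is read.

    Assumption (inferred from the error message, not from Docker's source):
    absolute paths are bind mounts, anything else is taken as a volume name,
    and a name needs at least two characters.
    """
    if posixpath.isabs(source):
        return "bind mount"
    if len(source) < 2:
        return "invalid volume name (too short)"
    return "named volume"


# "." and "" are the two values seen in this issue.
for src in ["/tmp/code", ".", "", "mydata"]:
    print(repr(src), "->", classify_volume_source(src))
```

Under that assumption, both "." and an empty source fall into the "too short" case, which would match the message above.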

I am reporting this even though local_code is not yet documented, in the hope that it is useful anyway.

Minimal repro / logs


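from sagemaker.local import LocalSession

# MyEstimator stands in for the estimator class being used;
# role is an IAM role ARN defined elsewhere.
session = LocalSession()
session.config = {'local': {'local_code': True}}

est = MyEstimator(
    entry_point='code.py',  # source_dir omitted; source_dir='.' fails the same way
    train_instance_type='local',
    train_instance_count=1,
    role=role,
    sagemaker_session=session,
)

est.fit()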

See the full traceback.

Here is the interesting part of the generated docker-compose.yaml:

networks:
  sagemaker-local:
    name: sagemaker-local
services:
    volumes:
    - /tmp/tmp_i5dhjtn/algo-1-JFP46/output/data:/opt/ml/output/data
    - /tmp/tmp_i5dhjtn/algo-1-JFP46/output:/opt/ml/output
    - /tmp/tmp_i5dhjtn/algo-1-JFP46/input:/opt/ml/input
    - /tmp/tmp_i5dhjtn/model:/opt/ml/model
    - :/opt/ml/code
    - /tmp/tmp_i5dhjtn/shared:/opt/ml/shared
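
The `- :/opt/ml/code` entry above has an empty host path. A minimal sketch of one way to avoid that, assuming the fix is simply to resolve source_dir to an absolute path before building the entry: the function `code_volume_entry` and the constant `CONTAINER_CODE_DIR` are hypothetical names for illustration and are not part of the SDK.

```python
import os

# Container path copied from the compose file in this issue.
CONTAINER_CODE_DIR = "/opt/ml/code"


def code_volume_entry(source_dir=None):
    """Build a 'host:container' volume entry with an absolute host path.

    A missing source_dir is treated like ".", i.e. the current directory.
    """
    host_dir = os.path.abspath(source_dir or ".")
    return "{}:{}".format(host_dir, CONTAINER_CODE_DIR)


print(code_volume_entry("."))   # current working directory, made absolute
print(code_volume_entry(None))  # same result when source_dir is omitted
```

By the same reasoning, passing an absolute path as source_dir to the estimator might work around the problem, though that is an untested assumption here.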
