
Cannot set custom image for PyTorch model from estimator #1344

@mattmcclean


Describe the bug
There seems to be a bug when attempting to set a custom image for the PyTorchModel object. The PyTorch estimator sets the image of the PyTorchModel used for inference to the training image name. This is a problem now that training and inference use different images. The create_model() method should check whether the parameter image is passed in and, if so, use it instead of the training image_name.

To reproduce

from sagemaker.pytorch import PyTorch

# `role` (an IAM role ARN) and `path` (the local training data directory)
# are assumed to be defined beforehand.
hyperparameters = {'epochs': 8}

estimator = PyTorch(source_dir='container/oxford-pets',
                    entry_point='oxford-pets.py',
                    role=role,
                    train_instance_count=1,
                    train_instance_type='local_gpu',
                    framework_version='1.3.1',
                    hyperparameters=hyperparameters,
                    # custom image used for training only
                    image_name='fastai2-oxford-pets-sm-example-training')

estimator.fit('file://' + str(path))

# Request a different (inference) image when deploying
predictor = estimator.deploy(1, 'local', image='fastai2-oxford-pets-sm-example-inference')

The code above will produce an error, because it attempts to launch a local Docker container from the image fastai2-oxford-pets-sm-example-training instead of fastai2-oxford-pets-sm-example-inference.

Inspecting the PyTorch model object the value for the image is set incorrectly to fastai2-oxford-pets-sm-example-training instead of fastai2-oxford-pets-sm-example-inference.

The problematic line of code is found here. Instead of always assigning the model image from the image_name variable, it should check whether the image parameter was passed in and use it if so.
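A minimal sketch of the proposed fallback logic follows. It uses hypothetical, simplified stand-in classes (`PyTorchModelSketch`, `PyTorchEstimatorSketch`), not the real `sagemaker.pytorch` implementation; only the image-selection rule is meant to carry over.

```python
# Hypothetical stand-ins for the SDK classes; these are NOT the real
# sagemaker.pytorch.PyTorch / PyTorchModel implementations.


class PyTorchModelSketch:
    """Minimal model object that only records which image it would deploy."""

    def __init__(self, image, **kwargs):
        self.image = image
        self.extra = kwargs


class PyTorchEstimatorSketch:
    """Minimal estimator that remembers the custom training image."""

    def __init__(self, image_name):
        self.image_name = image_name

    def create_model(self, image=None, **kwargs):
        # Proposed rule: prefer an explicitly passed inference image and
        # fall back to the training image only when none is given.
        chosen = image if image is not None else self.image_name
        return PyTorchModelSketch(image=chosen, **kwargs)


estimator = PyTorchEstimatorSketch("fastai2-oxford-pets-sm-example-training")

default_model = estimator.create_model()
custom_model = estimator.create_model(image="fastai2-oxford-pets-sm-example-inference")

print(default_model.image)  # fastai2-oxford-pets-sm-example-training
print(custom_model.image)   # fastai2-oxford-pets-sm-example-inference
```

In the actual SDK a further fallback to the default framework image would apply when neither image is set; that part is omitted here.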

The only workaround is to create the model from the estimator and then override its image attribute, for example:

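# Create the model explicitly from the trained estimator
model = estimator.create_model(role=role,
                               entry_point='oxford-pets.py',
                               source_dir='container/oxford-pets',
                               image='fastai2-oxford-pets-sm-example-inference')

# Per this report, the image argument above is not applied, so set the attribute directly
model.image = 'fastai2-oxford-pets-sm-example-inference'

predictor = model.deploy(1, 'local')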

Expected behavior

The SDK should launch a container from the image named fastai2-oxford-pets-sm-example-inference.


System information

  • SageMaker Python SDK version: 1.50.9.post0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
  • Framework version: 1.3.1
  • Python version: 3.6
  • CPU or GPU: Both
  • Custom Docker image (Y/N): Y

