
Bug: Cannot use spot instances with hyperparameter tuning job #1011

@adrian-chang



System Information

  • Python Version:
    3.6.9
  • Python SDK Version:
    1.38.3

Describe the problem

It is not possible to use spot instances with a hyperparameter tuning job.

Minimal repro / logs

https://github.com/aws/sagemaker-python-sdk/blob/9765de68ad8b776740d800148c861ca0e4794716/src/sagemaker/job.py

The code in job.py does not appear to copy over the estimator's train_use_spot_instances attribute, even though train_max_wait is passed through when set. This is problematic: you cannot use spot instances for hyperparameter tuning even though you can for individual training jobs, and if train_max_wait is set on the estimator, creating the tuning job fails with a ValidationException:

  File "/lib/python3.6/site-packages/sagemaker/tuner.py", line 362, in fit
    self.latest_tuning_job = _TuningJob.start_new(self, inputs)
  File "/lib/python3.6/site-packages/sagemaker/tuner.py", line 893, in start_new
    tuner.estimator.sagemaker_session.tune(**tuner_args)
  File "/lib/python3.6/site-packages/sagemaker/session.py", line 574, in tune
    self.sagemaker_client.create_hyper_parameter_tuning_job(**tune_request)
  File "/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreateHyperParameterTuningJob operation: Invalid MaxWaitTimeInSeconds. It is only supported when EnableManagedSpotTraining is set to true
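
To make the suspected mismatch concrete, here is a minimal, self-contained sketch. It is not the SDK's actual code: build_training_config and is_consistent are hypothetical stand-ins, and the estimator is a plain namespace rather than a real sagemaker Estimator. Only the attribute names (train_max_run, train_max_wait, train_use_spot_instances) and the request fields named in the error message are taken from the report above.

```python
from types import SimpleNamespace


def build_training_config(estimator, copy_spot_flag):
    """Hypothetical stand-in for the config-building step; not the real SDK code."""
    stop_condition = {"MaxRuntimeInSeconds": estimator.train_max_run}
    if estimator.train_max_wait is not None:
        # The wait limit is forwarded whenever it is set on the estimator.
        stop_condition["MaxWaitTimeInSeconds"] = estimator.train_max_wait

    config = {"StoppingCondition": stop_condition}
    if copy_spot_flag and estimator.train_use_spot_instances:
        # The step that appears to be missing on the tuning-job path.
        config["EnableManagedSpotTraining"] = True
    return config


def is_consistent(config):
    """Mirror the rule quoted in the ValidationException: a max wait time
    is only accepted when managed spot training is enabled."""
    has_wait = "MaxWaitTimeInSeconds" in config["StoppingCondition"]
    return config.get("EnableManagedSpotTraining", False) or not has_wait


# Stand-in estimator with spot training requested (values are illustrative).
estimator = SimpleNamespace(
    train_max_run=3600, train_max_wait=7200, train_use_spot_instances=True
)

broken = build_training_config(estimator, copy_spot_flag=False)
fixed = build_training_config(estimator, copy_spot_flag=True)

print(is_consistent(broken))
print(is_consistent(fixed))
```

Under these assumptions, the config built without the spot flag carries MaxWaitTimeInSeconds but no EnableManagedSpotTraining, which is the combination the service rejects. Copying the flag alongside the wait time would presumably resolve it.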
